claude
sonyashijin/RTL_verilog_claude_verified_to_simulate • Updated May 31, 2025 • 316 • 22 • 4
trentmkelly/USTaxCodeBench • Updated Jun 26, 2025 • 12k • 17
snorkelai/agent-finance-reasoning • Updated Aug 20, 2025 • 357 • 321 • 65
tencent/ArtifactsBenchmark • Updated Oct 15, 2025 • 1.83k • 190 • 13
audio-datasets-hindi
Audio-transcript pairs for Hindi/Hinglish
ujs/hinglish • Updated Jun 29, 2023 • 29k • 57 • 3
asahi417/seamless-align-enA-hiA • Updated May 30, 2024 • 178k • 169 • 1
TheAIchemist13/gramvaani_preprocessed_hi_train • Updated Sep 27, 2023 • 37.1k • 228
papers
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases • Paper • 2408.03910 • Published Aug 7, 2024 • 18
datasets to filter
CohereLabs/aya_dataset • Updated Apr 15, 2025 • 206k • 25.4k • 346
jamescalam/ai-arxiv2-semantic-chunks • Updated Apr 28, 2024 • 210k • 54 • 2
uspto
monology/pile-uncopyrighted • Updated Aug 31, 2023 • 177M • 104k • 169
nickypro/minipile-split • Updated Jul 18, 2024 • 394k • 636
Small models
Small models for experimentation
google/gemma-2-2b • Text Generation • Updated Aug 7, 2024 • 327k • 640
HuggingFaceTB/SmolLM-1.7B • Text Generation • 2B • Updated Oct 16, 2024 • 46.9k • 181
h2oai/h2o-danube3-500m-base • Text Generation • 0.5B • Updated Jul 18, 2024 • 846 • 33
Qwen/Qwen2-1.5B-Instruct • Text Generation • 2B • Updated Jun 6, 2024 • 3.92M • 162