-
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 137 -
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Paper • 2504.05118 • Published • 25 -
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Paper • 2504.08600 • Published • 30 -
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Paper • 2504.11343 • Published • 19
Sugato Ray PRO
-
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
Paper • 2503.00735 • Published • 22 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 114 -
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 27 -
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
Paper • 2503.05379 • Published • 39
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 22 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 13 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 70
-
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Paper • 2403.07816 • Published • 43 -
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper • 2402.01739 • Published • 29 -
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 54 -
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
Paper • 2403.03432 • Published • 1
-
meta-llama/Meta-Llama-3-8B
Text Generation • 8B • Updated • 382k • 6.26k -
meta-llama/Meta-Llama-3-8B-Instruct
Text Generation • 8B • Updated • 1.09M • 4.1k -
mlx-community/Meta-Llama-3-8B-Instruct-4bit
Text Generation • 2B • Updated • 5.95k • 78 -
mlabonne/Meta-Llama-3-120B-Instruct
Text Generation • 122B • Updated • 15 • 200
-
Running • 12
marimo app template
🍃 Template for deploying a marimo application to HF
-
Running • 2
Bulk
🍃 A bulk labelling interface for binary text classification
-
Running • 5
marimo server template
📝 A marimo Space to edit marimo notebooks
-
Running • 8
Fast-Bulk
🍃 A bulk labelling interface for binary text classification
-
madhurjindal/autonlp-Gibberish-Detector-492513457
Text Classification • 0.1B • Updated • 141k • 61 -
answerdotai/ModernBERT-base
Fill-Mask • 0.1B • Updated • 1.23M • 905 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 153 -
answerdotai/ModernBERT-large
Fill-Mask • 0.4B • Updated • 143k • 415
-
Runtime error • 4.55k
Chatbot Arena Leaderboard
🏆 Display chatbot performance leaderboard
-
Running on CPU Upgrade • 13.4k
Open LLM Leaderboard
🏆 Track, rank and evaluate open LLMs and chatbots
-
Running • 1.08k
OpenVoice
🤗 -
Running • 189
Yet Another LLM Leaderboard
🌖 Run a Streamlit web app
-
Runtime error • 4.55k
Chatbot Arena Leaderboard
🏆 Display chatbot performance leaderboard
-
Running on CPU Upgrade • 13.4k
Open LLM Leaderboard
🏆 Track, rank and evaluate open LLMs and chatbots
-
Running • 189
Yet Another LLM Leaderboard
🌖 Run a Streamlit web app
-
Running on CPU Upgrade • 143
Hallucinations Leaderboard
🔥 View and submit LLM evaluations
-
Latxa: An Open Language Model and Evaluation Suite for Basque
Paper • 2403.20266 • Published • 3 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 70 -
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper • 2405.01535 • Published • 124 -
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Paper • 2405.08707 • Published • 33
-
RoFormer: Enhanced Transformer with Rotary Position Embedding
Paper • 2104.09864 • Published • 13 -
Attention Is All You Need
Paper • 1706.03762 • Published • 73 -
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 62 -
Zero-Shot Tokenizer Transfer
Paper • 2405.07883 • Published • 5
-
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Paper • 2406.08587 • Published • 16 -
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Paper • 2406.09170 • Published • 28 -
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Paper • 2407.18901 • Published • 35 -
Benchmarking Agentic Workflow Generation
Paper • 2410.07869 • Published • 28