Active filters: grpo
gyung/lfm2-1.2b-koen-mt-v8-rl-10k-merged • Text Generation • 1B • Updated • 25 • 2
mradermacher/AlRazi0.1-Medical-Thinking-GGUF • 3B • Updated • 12 • 1
ericrisco/gemma-3-4b-reasoning • Any-to-Any • 4B • Updated • 27 • 4
Text Generation • 27B • Updated • 37 • 3
Text Generation • Updated • 23 • 3
alphadl/R1-Distill-0.6B-Qwen-GRPO • Text Generation • 0.6B • Updated • 8 • 1
aquiffoo/neo-3-1B-A90M-Instruct • Text Generation • Updated • 1
Nhaass/Qwen3-VL-2B-ChartQA-GRPO • Image-to-Text • 2B • Updated • 58 • 1
debashis/llama-1b-tool-router-grpo • Text Generation • 1B • Updated • 7 • 1
Text Generation • 0.1B • Updated • 7
8B • Updated • 4
sergiopaniego/Qwen2-0.5B-GRPO-test • Updated
Novaciano/ESP-NSFW-GRPO-1B-Sin_Censura-GGUF • 1B • Updated • 64 • 3
nbd22/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora • Updated
sergiopaniego/Qwen2-0.5B-GRPO • Updated
philschmid/qwen-2.5-3b-r1-countdown • Text Generation • 3B • Updated • 20 • 8
spinech/qwen-2.5-3b-r1-countdown • Text Generation • 3B • Updated • 6
Dongwei/Qwen2.5-1.5B-Open-R1-GRPO • Text Generation • 2B • Updated • 5 • 1
spinech/qwen2.5-3b-r1-rearc-stage1 • Text Generation • 3B • Updated • 5
Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO • Text Generation • 8B • Updated • 9 • 1
MasterControlAIML/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured • Text Generation • 2B • Updated • 7 • 5
mradermacher/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured-GGUF • 2B • Updated • 91 • 2
hyunw3/qwen-2.5-0.5b-r1-countdown • Text Generation • 0.5B • Updated • 5
hyunw3/qwen-2.5-0.5b-r1-countdown_lr1.0e-6 • Text Generation • 0.5B • Updated • 6
mgaimm/qwen-2.5-3b-r1-countdown • Text Generation • 3B • Updated • 3
MasterControlAIML/DeepSeek-R1-Qwen-2.5-1.5b-Latest-Unstructured-To-Structured • Text Generation • Updated • 29 • 5
tuyentx/qwen-2.5-3b-r1-countdown • Text Generation • 3B • Updated • 2
pablo-chocobar/qwen-2.5-3b-r1-countdown • Text Generation • 3B • Updated • 3