Active filters: trl
jack0503/code-usage-model • Text Generation • Updated • 32 • 1
ncgc/qwen-3.0B-sft • Text Generation • 3B • Updated • 11 • 1
belal212/therapist-gemma
lurf21/Qwen2.5-Coder-14B-NEP-new • Text Generation • 15B • Updated • 5 • 1
lurf21/Qwen2.5-Coder-3B-NEP-new • Text Generation • 3B • Updated • 7 • 1
mradermacher/Qwen2.5-Coder-14B-NEP-new-GGUF • 15B • Updated • 254 • 1
mradermacher/Qwen2.5-Coder-3B-NEP-new-GGUF • 3B • Updated • 236 • 1
6S-bobby/Llama-2-7b-chat-hf-distortion-1-aggressive • Text Generation • Updated • 13 • 1
trl-lib/Qwen3-4B-LoRA • Updated • 1
lewtun/dummy-trl-model • Reinforcement Learning • Updated • 4 • 1
ybelkada/gpt-neo-125m-detox • Reinforcement Learning • Updated • 233
ybelkada/gpt-neo-125m-detoxified-long-context • Reinforcement Learning • Updated • 4
dshin/flan-t5-ppo • Reinforcement Learning • Updated • 5
SummerSigh/T5-Base-Rule-Of-Thumb-RM • Reinforcement Learning • Updated • 4
dshin/flan-t5-ppo-testing • Reinforcement Learning • Updated • 3 • 1
SummerSigh/T5-Base-EvilPrompterRM • Reinforcement Learning • 0.2B • Updated • 8
dshin/flan-t5-ppo-testing-violation • Reinforcement Learning • Updated • 3
dshin/flan-t5-ppo-user-b • Reinforcement Learning • Updated • 3
dshin/flan-t5-ppo-user-h-use-violation • Reinforcement Learning • Updated • 3
dshin/flan-t5-ppo-user-f-use-violation • Reinforcement Learning • Updated • 4
dshin/flan-t5-ppo-user-e-use-violation • Reinforcement Learning • Updated • 3
dshin/flan-t5-ppo-user-a-use-violation • Reinforcement Learning • Updated • 3
dshin/flan-t5-ppo-user-h-batch-size-8-epoch-0 • Reinforcement Learning • Updated • 3
dshin/flan-t5-ppo-user-e-batch-size-8-epoch-0 • Reinforcement Learning • Updated • 3
dshin/flan-t5-ppo-user-h-batch-size-8-epoch-0-use-violation • Reinforcement Learning • Updated • 2
dshin/flan-t5-ppo-user-a-batch-size-8-epoch-0 • Reinforcement Learning • Updated • 3
dshin/flan-t5-ppo-user-f-batch-size-8-epoch-0 • Reinforcement Learning • Updated • 3
dshin/flan-t5-ppo-user-f-batch-size-8-epoch-0-use-violation • Reinforcement Learning • Updated • 3
dshin/flan-t5-ppo-user-e-batch-size-8-epoch-0-use-violation • Reinforcement Learning • Updated • 3
dshin/flan-t5-ppo-user-h-batch-size-8-epoch-1 • Reinforcement Learning • Updated • 3