Models: 18

Active filters: trl-internal-testing/descriptiveness-sentiment-trl-style

bikalnetomi/RLHF-PPO-PPOModel-LLama3-1B-v1.0

Text Generation • 1B • Updated Dec 2, 2024 • 1

bikalnetomi/RLHF-PPO-PPOModel-LLama3-1B-v1.1

Text Generation • 1B • Updated Dec 2, 2024

bikalnetomi/RLHF-PPO-PPOModel-LLama3-1B-v1.3

Text Generation • 1B • Updated Dec 2, 2024

bikalnetomi/RLHF-PPO-PPOModel-LLama3-1B-v1.4

Text Generation • 1B • Updated Dec 2, 2024

mradermacher/FedPPO-Collaborative-Pythia-70M-a0-GGUF

70.4M • Updated Dec 13, 2024 • 32

mradermacher/FedPPO-Confused-Pythia-70M-a1-GGUF

70.4M • Updated Dec 13, 2024 • 32

mradermacher/FedPPO-Collaborative-Pythia-70M-a1-GGUF

70.4M • Updated Dec 13, 2024 • 98

mradermacher/FedPPO-Isolated-Pythia-70M-a0-GGUF

70.4M • Updated Dec 13, 2024 • 40

mradermacher/FedPPO-Isolated-Pythia-70M-a1-GGUF

70.4M • Updated Dec 13, 2024 • 72

mradermacher/FedPPO-Pythia-70M-a1-GGUF

70.4M • Updated Dec 13, 2024 • 41

mradermacher/FedPPO-Confused-Pythia-70M-a0-GGUF

70.4M • Updated Dec 13, 2024 • 41

mradermacher/FedPPO-Pythia-70M-a0-GGUF

70.4M • Updated Dec 13, 2024 • 11

nologin/ppo

Text Generation • 0.2B • Updated Dec 13, 2024

nileshmalpeddi/ppo

Text Generation • 0.3B • Updated Mar 15 • 15

AMindToThink/ppo

Text Generation • 0.2B • Updated Apr 15

AMindToThink/ppo_push_main_13

Text Generation • 0.2B • Updated Apr 16

AMindToThink/ppo_with_value14

AMindToThink/ppo_with_value15