RLHFlow

university

AI & ML interests

Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

Recent Activity

baohao submitted a paper about 1 month ago

Self-Hinting Language Models Enhance Reinforcement Learning

baohao updated a collection 5 months ago

Papers

Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

RLHFlow's datasets 88

RLHFlow/iterative-prompt-v1-iter8-20K

Viewer • Updated Jun 12, 2024 • 20k • 12

RLHFlow/iterative-prompt-v1-iter7-20K

Viewer • Updated Jun 12, 2024 • 20k • 19

RLHFlow/iterative-prompt-v1-iter6-20K

Viewer • Updated Jun 12, 2024 • 20k • 7

RLHFlow/iterative-prompt-v1-iter5-20K

Viewer • Updated Jun 12, 2024 • 20k • 15

RLHFlow/iterative-prompt-v1-iter4-20K

Viewer • Updated Jun 12, 2024 • 20k • 25

RLHFlow/pair-preference-dataset-700K

Viewer • Updated May 26, 2024 • 699k • 5 • 3

RLHFlow/test_generation_2k

Viewer • Updated May 12, 2024 • 2k • 29

RLHFlow/SHP-standard

Viewer • Updated May 9, 2024 • 93.3k • 40

RLHFlow/HH-RLHF-Harmless-and-RedTeam-standard

Viewer • Updated May 8, 2024 • 42.3k • 39 • 4

RLHFlow/prompt-collection-v0.1

Viewer • Updated May 8, 2024 • 179k • 59 • 9

RLHFlow/pair-preference-dataset-mix1

Viewer • Updated May 6, 2024 • 548k • 15 • 3

RLHFlow/Prometheus2-preference-standard

Viewer • Updated May 5, 2024 • 200k • 26 • 2

RLHFlow/iterative-prompt-v1-iter3-20K

Viewer • Updated May 3, 2024 • 20k • 10 • 3

RLHFlow/iterative-prompt-v1-iter2-20K

Viewer • Updated May 3, 2024 • 20k • 11 • 3

RLHFlow/iterative-prompt-v1-iter1-20K

Viewer • Updated May 3, 2024 • 20k • 34 • 2

RLHFlow/Argilla-Math-DPO-standard

Viewer • Updated Apr 30, 2024 • 2.42k • 16 • 3

RLHFlow/PKU-SafeRLHF-30K-standard

Viewer • Updated Apr 29, 2024 • 26.9k • 15 • 3

RLHFlow/prm80k-phase2

Viewer • Updated Apr 28, 2024 • 79.5k • 21 • 4

RLHFlow/mix3

Preview • Updated Apr 28, 2024 • 5 • 1

RLHFlow/UltraInteract-filtered-standard

Viewer • Updated Apr 28, 2024 • 162k • 6 • 2

RLHFlow/Capybara-distibalel-Filter-standard

Viewer • Updated Apr 28, 2024 • 14.8k • 15

RLHFlow/Orca-distibalel-standard

Viewer • Updated Apr 28, 2024 • 6.93k • 26 • 1

RLHFlow/Helpsteer-preference-standard

Viewer • Updated Apr 27, 2024 • 37.1k • 29 • 6

RLHFlow/UltraFeedback-preference-standard

Viewer • Updated Apr 27, 2024 • 340k • 115 • 14

RLHFlow/HH-RLHF-Helpful-standard

Viewer • Updated Apr 27, 2024 • 115k • 979 • 4

RLHFlow/CodeUltraFeedback-standard

Viewer • Updated Apr 27, 2024 • 50.2k • 36 • 5

RLHFlow/SFT-OpenHermes-2.5-Standard

Viewer • Updated Apr 24, 2024 • 1M • 71 • 3

RLHFlow/pair_preference_model_dataset

Viewer • Updated Apr 20, 2024 • 699k • 39 • 6