nabeelshan/rlhf-gpt2-pipeline

Text Generation

reinforcement-learning

instruction-tuning

rlhf-gpt2-pipeline/reward_model_final/merges.txt

Nabeel Shan

Added SFT, Reward Model, and PPO-Aligned Model

46724ea about 2 months ago

456 kB

File too large to display.