This model was trained with OpenRLHF.

Looking to dive deeper into LLMs? Explore learning resources, tutorials, and guides at AI Roadmap, your go-to platform for practical LLM training and deployment knowledge.

  1. Initialized from the SmolLM2-135M-Instruct model.
  2. Uses the preference_dataset_mixture2_and_safe_pku dataset, shuffled with dataset.shuffle(seed=42). After shuffling (a split sketch follows this list):
    • 480k examples used for training
    • 4k examples used for evaluation
    • 40k examples used for final testing
  3. Trained on a single RTX 3090 for a total of 29 hours.
  4. Training command:
    deepspeed --module openrlhf.cli.train_rm \
      --max_len 2048 \
      --dataset ./dataset/train \
      --eval_dataset ./dataset/eval \
      --chosen_key chosen \
      --rejected_key rejected \
      --apply_chat_template \
      --train_batch_size 8 \
      --micro_train_batch_size 8 \
      --pretrain HuggingFaceTB/SmolLM2-135M-Instruct \
      --save_path ./checkpoint/SmolLM2-135M-Reward \
      --save_steps 1000 \
      --logging_steps 1 \
      --eval_steps 1000 \
      --zero_stage 0 \
      --max_epochs 1 \
      --bf16 \
      --learning_rate 9e-6 \
      --use_wandb your_40_digit_wandb_token \
      --wandb_project OpenRLHF_rm_train \
      --wandb_run_name qwen3-0.6B-SFT \
      --gradient_checkpointing
    
  5. To run the model, use the following (a scoring sketch follows this list):
    from transformers import AutoTokenizer
    from openrlhf.models import get_llm_for_sequence_regression

    pretrain = "AI-Roadmap/SmolLM2-135M-rm-60k"

    # Load the reward model together with its trained scalar value head.
    model = get_llm_for_sequence_regression(
      model_name_or_path=pretrain,
      model_type="reward",
      use_flash_attention_2=False,
      bf16=True,
      init_value_head=False,  # keep the trained value-head weights
    )

    tokenizer = AutoTokenizer.from_pretrained(
      pretrain,
      trust_remote_code=True,
    )

    model.eval().to("cuda")
    
  6. Test accuracy: 0.764425, chosen reward: -0.032832, rejected reward: -1.620852 (accuracy is pairwise; see the sketch after this list).
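
For reference, the split in step 2 can be reproduced along these lines. This is a minimal sketch using the datasets library; the file name and JSON-lines format are assumptions, not the card's actual paths:

    from datasets import load_dataset

    # Load the preference mixture; the path below is a placeholder.
    ds = load_dataset(
      "json",
      data_files="preference_dataset_mixture2_and_safe_pku.jsonl",
      split="train",
    )

    # Shuffle deterministically, matching dataset.shuffle(seed=42) in step 2.
    ds = ds.shuffle(seed=42)

    # Carve out the 480k / 4k / 40k train / eval / test splits.
    train_ds = ds.select(range(480_000))
    eval_ds = ds.select(range(480_000, 484_000))
    test_ds = ds.select(range(484_000, 524_000))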
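
To actually score a conversation with the model loaded in step 5, a sketch like the following should work. The messages are made-up examples, and the forward call assumes OpenRLHF's reward-model interface (input_ids plus attention_mask, returning one scalar per sequence), which may differ between versions:

    import torch

    # Build a single-turn conversation and apply the chat template,
    # mirroring the --apply_chat_template flag used during training.
    messages = [
      {"role": "user", "content": "How do I make a cup of tea?"},
      {"role": "assistant", "content": "Boil water, steep for 3-5 minutes, then serve."},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    inputs = tokenizer(text, return_tensors="pt").to("cuda")

    with torch.no_grad():
      # The value head emits one scalar reward per sequence.
      reward = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])

    print(reward.item())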
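
The test accuracy in step 6 is the standard pairwise metric: the fraction of preference pairs where the chosen response outscores the rejected one. Assuming chosen_rewards and rejected_rewards are aligned lists of scores produced as above, it reduces to:

    # Pairwise accuracy: how often the chosen response beats the rejected one.
    accuracy = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards)) / len(chosen_rewards)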

Want to learn how to build and fine-tune models like this? Visit AI Roadmap for more learning materials and LLM insights to supercharge your AI journey!
