Usage example

by tseyde - opened 5 days ago

5 days ago

Thank you very much for this great release!
Could you provide an example of how to load the reward model and score translations with Seed-X-RM-7B (HF/vLLM)?
A detailed example similar to the one for Seed-X-PPO-7B would be great. Thanks!

YuLu0713

ByteDance Seed org 4 days ago

@tseyde Thank you for your interest! We have updated a script to show how to run Seed-X-RM.

tseyde

4 days ago

Perfect, thank you so much!

tseyde changed discussion status to closed 4 days ago

tseyde

4 days ago

edited 4 days ago

A quick follow-up: is the additional <s> tag after the <zh> tag intentional? If I'm seeing this correctly, the tokenized prompt would be:
prompt = "<s> Translate the following English sentence into Chinese:\nMay the force be with you <zh><s> 愿原力与你同在 </s>"

tseyde changed discussion status to open 4 days ago

fringek

4 days ago

edited 4 days ago

You're right :) Both "prompt" and "chosen" have a BOS token, which is consistent with the training process.
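
For illustration, the layout discussed in this thread (one BOS token opening the prompt and a second one opening the candidate translation) can be sketched in plain Python. This is only a sketch under stated assumptions, not the official script: the helper name `build_rm_input` is hypothetical, and the spaces around the special tokens simply mirror the decoded string quoted earlier in the thread rather than confirmed tokenizer behavior.

```python
# Special-token strings as they appear in the prompt quoted in this thread.
BOS = "<s>"
EOS = "</s>"


def build_rm_input(prompt: str, chosen: str) -> str:
    """Hypothetical helper: join prompt and candidate, each with its own BOS."""
    prompt_part = f"{BOS} {prompt}"        # first BOS opens the prompt
    chosen_part = f"{BOS} {chosen} {EOS}"  # second BOS opens the candidate translation
    return prompt_part + chosen_part


prompt = (
    "Translate the following English sentence into Chinese:\n"
    "May the force be with you <zh>"
)
chosen = "愿原力与你同在"

text = build_rm_input(prompt, chosen)
print(text)
```

If the tokenizer in use also prepends a BOS token on its own, it is worth inspecting the resulting token IDs to make sure the sequence does not end up with a third one.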

tseyde

4 days ago

Amazing :) Thank you for the detailed instructions!

tseyde changed discussion status to closed 4 days ago
