
Usage example

#2
by tseyde - opened

Thank you very much for this great release!
Could you provide an example of how to load the reward model and score translations with Seed-X-RM-7B (HF/vLLM)?
A detailed example similar to the one for Seed-X-PPO-7B would be great. Thanks!

ByteDance Seed org

@tseyde Thank you for your interest! We have updated the scripts to show how to run Seed-X-RM.
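For readers arriving here before opening the scripts, this is a minimal sketch of the overall scoring flow, assuming you already have some way to turn a (prompt, response) pair into a scalar reward. The prompt template is copied from the tokenized prompt quoted later in this thread (English to Chinese, tag `<zh>`); `make_prompt`, `rank_translations` and the `score` callable are hypothetical names, and `score` stands in for the actual Seed-X-RM-7B call, which is not reproduced here.

```python
from typing import Callable, List, Tuple


def make_prompt(src: str, tgt_lang: str = "Chinese", tgt_tag: str = "zh") -> str:
    # Template taken from the decoded prompt quoted in this thread.
    # Using other languages/tags is an assumption, not confirmed by the authors.
    return f"Translate the following English sentence into {tgt_lang}:\n{src} <{tgt_tag}>"


def rank_translations(
    src: str,
    candidates: List[str],
    score: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each candidate with score(prompt, response) and sort best-first.

    `score` is a hypothetical placeholder for running the reward model
    (tokenize prompt and response, forward pass, read out a scalar reward).
    """
    prompt = make_prompt(src)
    scored = [(cand, float(score(prompt, cand))) for cand in candidates]
    # Higher reward means a better translation; sorted() keeps ties in input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Usage: `rank_translations("May the force be with you", candidates, score)` returns the candidates paired with their rewards, highest first. Keeping the model call behind a callable lets the same ranking code sit on top of either an HF or a vLLM backend.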

Perfect, thank you so much!

tseyde changed discussion status to closed

A quick follow-up: is the additional <s> tag after the <zh> tag intentional? If I'm seeing this correctly, the tokenized prompt would be:
prompt = "<s> Translate the following English sentence into Chinese:\nMay the force be with you <zh><s> ζ„ΏεŽŸεŠ›δΈŽδ½ εŒεœ¨ </s>"
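At the token level, the decoded string above corresponds to the layout `<s> prompt <s> response </s>`. With Llama/Mistral-style tokenizers, calling the tokenizer separately on the prompt and on the response typically prepends a BOS to each, which is how the second `<s>` appears after `<zh>`. The sketch below builds that layout explicitly from raw ids; `build_rm_input_ids` is a hypothetical helper, and the ids used in the usage line (BOS = 1, EOS = 2) are illustrative assumptions, not values read from the Seed-X tokenizer.

```python
from typing import List


def build_rm_input_ids(
    prompt_ids: List[int],
    response_ids: List[int],
    bos_id: int,
    eos_id: int,
) -> List[int]:
    """Lay out one reward-model input as <s> prompt <s> response </s>.

    Assumes prompt_ids and response_ids contain NO special tokens;
    the BOS in front of each segment and the final EOS are added here.
    """
    return [bos_id, *prompt_ids, bos_id, *response_ids, eos_id]
```

For example, `build_rm_input_ids([10, 11, 12], [20, 21], bos_id=1, eos_id=2)` gives `[1, 10, 11, 12, 1, 20, 21, 2]`. Adding the special tokens in one place makes it easy to check that no segment ends up with a missing or duplicated BOS.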

tseyde changed discussion status to open

You're right :) Both "prompt" and "chosen" have a BOS token, which is consistent with the training process.
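The mention of "chosen" suggests pairwise preference data. As a hedged illustration only, assuming the common pairwise (Bradley-Terry style) reward-model setup, which this thread does not confirm for Seed-X-RM, the sketch below formats a chosen and a rejected response with the same `<s> prompt <s> response </s>` layout and computes the usual loss `-log(sigmoid(r_chosen - r_rejected))`. `build_pair` and `pairwise_loss` are hypothetical names.

```python
import math
from typing import List, Tuple


def build_pair(
    prompt_ids: List[int],
    chosen_ids: List[int],
    rejected_ids: List[int],
    bos_id: int,
    eos_id: int,
) -> Tuple[List[int], List[int]]:
    """Format chosen and rejected identically, so both response segments start with BOS."""

    def one(resp: List[int]) -> List[int]:
        return [bos_id, *prompt_ids, bos_id, *resp, eos_id]

    return one(chosen_ids), one(rejected_ids)


def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)), computed as softplus(r_rejected - r_chosen)."""
    z = r_rejected - r_chosen
    # Stable softplus: log(1 + exp(z)) = max(z, 0) + log1p(exp(-|z|))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the two rewards are equal the loss is log 2; it approaches 0 as the chosen reward pulls ahead and grows roughly linearly when the rejected one wins. Whatever objective was actually used, the point from the reply above is that scoring at inference time should reuse exactly the training-time layout.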

Amazing :) Thank you for the detailed instructions!

tseyde changed discussion status to closed
