Usage example
#2, opened by tseyde
Thank you very much for this great release!
Could you provide an example of how to load the reward model and score translations with Seed-X-RM-7B (HF/vLLM)?
A detailed example similar to the one for Seed-X-PPO-7B would be great. Thanks!
Perfect, thank you so much!
tseyde changed discussion status to closed
A quick follow-up: is the additional <s> tag after the <zh> tag intentional? If I'm seeing this correctly, the tokenized prompt would be:
prompt = "<s> Translate the following English sentence into Chinese:\nMay the force be with you <zh><s> 愿原力与你同在 </s>"
tseyde changed discussion status to open
You're right :) Both "prompt" and "chosen" have a BOS token, which is consistent with the training process.
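For anyone reading along, here is a minimal sketch of how that string is put together, based only on the prompt quoted in this thread. It uses plain string formatting and no model or tokenizer calls. The helper name `build_rm_input` and its parameters are hypothetical, and the exact spacing around the tags is an assumption taken from the quoted example, so please check it against the official instructions.

```python
# Special tokens as they appear in the prompt quoted in this thread.
BOS = "<s>"
EOS = "</s>"


def build_rm_input(source: str, translation: str,
                   target_lang: str = "Chinese",
                   lang_tag: str = "<zh>") -> str:
    """Hypothetical helper: join "prompt" and "chosen", each with its own BOS."""
    # "prompt" part: BOS, the instruction, the source sentence, the language tag.
    prompt = (
        f"{BOS} Translate the following English sentence into {target_lang}:\n"
        f"{source} {lang_tag}"
    )
    # "chosen" part: starts with a second BOS and ends with EOS.
    chosen = f"{BOS} {translation} {EOS}"
    return prompt + chosen


text = build_rm_input("May the force be with you", "愿原力与你同在")
print(text)
```

The second `<s>` comes from the "chosen" part being built with its own BOS token before the two parts are concatenated, which is why it shows up right after `<zh>`.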
Amazing :) Thank you for the detailed instructions!
tseyde changed discussion status to closed