<s> token #65
opened by Muennighoff
The <s> token (bos token) is never used during pre-training, right? (@stas maybe?)
Afaik we only use </s> (eos token) sparingly, appended after documents.
Want to try using <s> as a sep token for fine-tuning, cc @TimeRobber
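To make the idea concrete, here is a minimal sketch of the two token conventions being discussed. The ids, function names, and packing scheme are all illustrative assumptions, not the actual training code: pre-training only ever appends </s> after each document, which leaves <s> free to repurpose as a prompt/target separator at fine-tuning time.

```python
# Hypothetical special-token ids, mirroring a BLOOM-style tokenizer
# (assumed values for illustration only).
BOS_ID = 1  # "<s>"  -- unused during pre-training
EOS_ID = 2  # "</s>" -- appended after each document

def pack_pretraining(docs):
    """Concatenate tokenized documents into one stream,
    appending </s> (eos) after each document. <s> never
    appears here, which is why it is free to repurpose later."""
    stream = []
    for doc in docs:
        stream.extend(doc)
        stream.append(EOS_ID)
    return stream

def build_finetuning_example(prompt_ids, target_ids):
    """Repurpose <s> as a separator between prompt and target
    for fine-tuning, with the usual </s> terminator."""
    return prompt_ids + [BOS_ID] + target_ids + [EOS_ID]

# Example: two pre-training documents, then one fine-tuning pair.
stream = pack_pretraining([[10, 11], [12]])
example = build_finetuning_example([10, 11], [12])
```

Since the pre-training stream never contains `BOS_ID`, assigning `<s>` a new role at fine-tuning time should not collide with anything the model saw earlier.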
Never is a strong word: if the pretraining dataset contains some literal <s> occurrences, they will be tokenized as <bos>. But I'd say there shouldn't be many tokens in the pretraining dataset that match that.
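One quick way to sanity-check that claim is to scan the raw pre-training text for the literal string before tokenization, since any such occurrence is what would end up mapped to the bos id. A trivial sketch (the helper name and corpus are made up):

```python
def count_literal_bos(texts, bos_token="<s>"):
    """Count literal occurrences of the bos token string in raw text,
    i.e. the spots a tokenizer would map to the <s> id."""
    return sum(t.count(bos_token) for t in texts)

# Hypothetical mini-corpus: one document happens to contain "<s>".
corpus = ["plain text document", "html-ish snippet with <s> inside"]
hits = count_literal_bos(corpus)
```

If `hits` stays near zero across the real corpus, repurposing <s> as a fine-tuning separator is safe in practice.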