long outputs
Could you comment on the issue of the long outputs generated by the latest reasoning models? Are they expected to produce thousands of tokens for each prompt?
Yes, these models are expected to think for many tokens before finalizing the answer. We recommend allowing up to 64K output tokens. It should be possible to make them more token-efficient, or even add a controllable token budget, with a separate round of RL, but we haven't done that yet.
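For reference, with a standard Hugging Face transformers setup the output budget can be raised along these lines (a minimal sketch; the model id is a placeholder, not a specific checkpoint):

```python
# Minimal sketch of allowing a long reasoning trace by raising the output budget.
# "your-org/your-reasoning-model" is a placeholder, not a released checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-reasoning-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Allow up to 64K new tokens; the model stops earlier once it emits its EOS token.
outputs = model.generate(inputs, max_new_tokens=65536)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```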
I see that you changed the eos_token. Will it affect this behaviour?
No, it should only matter if you create a finetuned version of the model. The current models' behavior should stay the same.
Could you help clarify what impact this has on generation?
If the model was originally trained to emit 151643 as the eos token, but the runtime now expects 151645, wouldn't that cause a mismatch, where generation might not stop unless the new token happens to be emitted?
Does the model actually emit 151645 under current weights, or was it trained to use 151643? (which EOS token is actually the "correct" one from the model’s point of view?)
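One way I imagine checking this empirically (a rough sketch, assuming a transformers setup and a placeholder model id) would be to print the EOS ids the configs declare, then generate while stopping only on `<|endoftext|>` and look at the tail of the output:

```python
# Rough sketch: inspect which EOS the configs declare and which token(s) the
# model actually emits at the end. The model id is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-reasoning-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

print("tokenizer.eos_token_id:", tokenizer.eos_token_id)
print("generation_config.eos_token_id:", model.generation_config.eos_token_id)

messages = [{"role": "user", "content": "Say hi."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stop only on <|endoftext|> (151643) so the natural ending is visible:
# if the model ends with <|im_end|>\n<|endoftext|>, the tail should read
# [..., 151645, 198, 151643].
out = model.generate(inputs, max_new_tokens=4096, eos_token_id=151643)
print("tail token ids:", out[0][-5:].tolist())
```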
Also, as I understand it, this change would require re-exporting the model to GGUF, since the llama.cpp converter reads these config files. So even though the model weights remain unchanged, a new GGUF would need to be generated to reflect the updated eos_token and its ID.
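If so, the re-export would presumably look roughly like this (a sketch; paths are placeholders, and the script name and flags assume a recent llama.cpp checkout where the converter is `convert_hf_to_gguf.py`):

```python
# Rough sketch of regenerating the GGUF so the updated eos_token_id from the
# HF config/tokenizer files is baked into the new file. Paths are placeholders,
# and the script name/flags assume a recent llama.cpp checkout.
import subprocess

subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py",
        "path/to/updated-hf-checkpoint",
        "--outfile", "model-f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)
```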
The model would always end with <|im_end|>\n<|endoftext|>, which corresponds to [151645, 198, 151643]. So the new change just stops it two tokens earlier, which shouldn't really matter in most situations. But if you finetune this model on new data without <|endoftext|>, it will not stop properly without the PR we merged.
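If you want to be extra safe on the inference side, you can also pass both ids as stop tokens, e.g. with transformers (a sketch with a placeholder model id):

```python
# Sketch: treat both <|im_end|> (151645) and <|endoftext|> (151643) as stop
# tokens, so generation halts at <|im_end|> and the trailing "\n<|endoftext|>"
# (ids 198, 151643) is simply never produced. The model id is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-reasoning-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Say hi."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=65536, eos_token_id=[151645, 151643])
print("last token id:", outputs[0][-1].item())  # expected 151645 (<|im_end|>)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```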
Thank you