Can't stop it from spewing garbage after about 1000 tokens...

#2
by SuperbEmphasis - opened

My vllm command:

```shell
vllm serve /models/Llama-3_3-Nemotron-Super-49B-v1-FP8 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.95 \
  --quantization modelopt \
  --trust-remote-code \
  --max-model-len 32786
```

I also tried to add in a jinja llama3 template:

--chat-template /models/templates/llama3.jinja
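For reference, the template I pointed it at is a minimal Llama 3 style one along these lines (a sketch, not necessarily byte-identical to the file; the real template should match what's in the model's `tokenizer_config.json`):

```jinja
{{- bos_token }}
{%- for message in messages %}
{{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' }}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}
```

If the template never emits `<|eot_id|>` during training-matched formatting, the model may never learn where to stop in this serving setup.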

But every response seems to stream forever and never stops correctly.
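As a workaround while debugging, the stop token can be forced on the client side. vLLM's OpenAI-compatible endpoint accepts a `stop_token_ids` field as an extension to the standard chat-completions schema; `128009` is Llama 3's `<|eot_id|>` id (an assumption here; verify against this model's `generation_config.json`). A minimal stdlib-only sketch:

```python
import json
import urllib.request

def build_payload(prompt: str, max_tokens: int = 1024) -> dict:
    """Chat-completions payload that forces an explicit stop token.

    128009 (<|eot_id|>) is assumed from the Llama 3 tokenizer; check the
    model's generation_config.json. max_tokens is a safety cap in case
    the stop token still never fires.
    """
    return {
        "model": "/models/Llama-3_3-Nemotron-Super-49B-v1-FP8",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stop_token_ids": [128009],  # vLLM extension field
    }

def send(payload: dict,
         url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    """POST the payload to the local vLLM server and return the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

If forcing `stop_token_ids` fixes it, the template or the model's EOS configuration is the culprit rather than the FP8 quantization.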
