Can't stop it from spewing garbage after about 1000 tokens...
#2 by SuperbEmphasis - opened
My vllm command:
vllm serve /models/Llama-3_3-Nemotron-Super-49B-v1-FP8 --tensor-parallel-size 1 --gpu-memory-utilization 0.95 --quantization modelopt --trust-remote-code --max-model-len 32786
I also tried adding a Llama 3 Jinja chat template:
--chat-template /models/templates/llama3.jinja
But every response seems to stream forever and never stops correctly.
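For reference, a minimal request that reproduces the behavior might look like this; this is a sketch assuming the server is on vLLM's default port 8000 and uses the standard OpenAI-compatible `/v1/chat/completions` endpoint that vLLM exposes:

```shell
# Minimal chat completion request against the vLLM OpenAI-compatible server
# (assumes the default port 8000; "model" must match the served model path)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/models/Llama-3_3-Nemotron-Super-49B-v1-FP8",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 2048
  }'
```

With `max_tokens` set, the stream at least terminates at the cap, but the model never emits an end-of-turn token on its own.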