Serve with vLLM
Can this model be served with vLLM on an RTX 3090? If so, please share the command and the values of the required parameters.
I may be mistaken, but I think NVFP4 is currently Blackwell-only, so if that's true the answer is no. You would need a 5090, PRO 6000, etc.
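One way to check is to query the card's compute capability; NVFP4 kernels are tied to Blackwell-class hardware (compute capability 10.0 and up, if I have that right), while the RTX 3090 is Ampere at 8.6. Assuming a driver recent enough to support the compute_cap query field:

nvidia-smi --query-gpu=name,compute_cap --format=csv

An RTX 3090 reports 8.6; Blackwell parts such as the 5090 and PRO 6000 should report 12.x (datacenter Blackwell reports 10.x).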
Regardless, some documentation on how to run this in vLLM would be really helpful.
GPT-OSS-20b is MXFP4 and works quite well on an RTX 3090. I was wondering if this one would too?
vllm serve openai/gpt-oss-20b --served-model-name gpt-oss-20b --dtype auto --max-model-len 32048 --max-num-seqs 8
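If the server does come up, vLLM exposes an OpenAI-compatible API (port 8000 by default), so a quick smoke test against the served model name from the command above would be:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-20b", "messages": [{"role": "user", "content": "Say hello"}]}'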
Honestly, I'm unsure if this model works; I can't even get the parameters to load. I'm working on it on my own and will post instructions when (if) I get it working.
Please share the command to run on vLLM with Blackwell GPUs.
vllm serve openai/gpt-oss-20b --served-model-name gpt-oss-20b --dtype auto --max-model-len 32048 --max-num-seqs 24
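In the absence of official docs, here is a minimal sketch of what an NVFP4 serve command on a Blackwell card might look like. The repo ID below is a placeholder (this thread doesn't name the model), and I'm assuming a recent vLLM build that auto-detects the quantization format from the checkpoint config, as it normally does for quantized checkpoints; treat the length/batch flags as starting points, not verified values:

vllm serve <nvfp4-model-repo-id> --max-model-len 32768 --max-num-seqs 8

Here <nvfp4-model-repo-id> is hypothetical; substitute the actual Hugging Face repo ID.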