Serve with vLLM

#1
by faheemraza1 - opened

Can this model be served with vLLM on an RTX 3090? If so, please share the command and the values of the required parameters.

I may be mistaken, but I think NVFP4 is currently Blackwell only, so that would be a no if true. Would need a 5090, PRO 6000, etc.
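If it helps, here's a minimal sketch of the capability check, under the assumptions that NVFP4 needs Blackwell-class hardware (compute capability 10.0+) while MXFP4 kernels also run on Ampere (8.0+); the thresholds are my understanding, not official vLLM documentation:

```python
# Hedged sketch: map NVIDIA compute capability to FP4 format support.
# Assumptions: NVFP4 requires Blackwell (SM >= 10.0); MXFP4 also runs
# on Ampere (SM >= 8.0), which is why gpt-oss-20b works on a 3090.

def supports_nvfp4(major: int, minor: int = 0) -> bool:
    """NVFP4 assumed to require Blackwell (compute capability >= 10.0)."""
    return (major, minor) >= (10, 0)

def supports_mxfp4(major: int, minor: int = 0) -> bool:
    """MXFP4 assumed usable on Ampere and newer (compute capability >= 8.0)."""
    return (major, minor) >= (8, 0)

# RTX 3090 is Ampere, compute capability 8.6:
print(supports_nvfp4(8, 6))  # False -> NVFP4 would not run
print(supports_mxfp4(8, 6))  # True  -> MXFP4 works
```

On a real machine you'd get the capability from `torch.cuda.get_device_capability()` or `nvidia-smi --query-gpu=compute_cap --format=csv`.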

Notwithstanding, some documentation would be really helpful on how to run this in vLLM.

GPT-OSS-20B is MXFP4 and works quite well on an RTX 3090. I was wondering if this model would too?

vllm serve openai/gpt-oss-20b --served-model-name gpt-oss-20b --dtype auto --max-model-len 32048 --max-num-seqs 8
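For anyone who does get a server up with a command like the one above: vLLM exposes an OpenAI-compatible API, so a request can be sketched like this (port 8000 is vLLM's default; the model name must match whatever you passed to `--served-model-name`):

```python
import json

# Sketch of a chat request to vLLM's OpenAI-compatible endpoint
# (default URL: http://localhost:8000/v1/chat/completions).
# "gpt-oss-20b" matches the --served-model-name flag above.
payload = {
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}
body = json.dumps(payload)
print(body)

# To actually send it against a running server:
# curl http://localhost:8000/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$BODY"
```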

Honestly, I'm unsure whether this model works; I can't even get the weights to load. I'm working on it myself and will include instructions when (if) I get it working.

https://huggingface.co/Bellesteck/Qwen3-30B-A3B-NVFP4-vLLM

Please share the command to run on vLLM with Blackwell GPUs.


vllm serve openai/gpt-oss-20b --served-model-name gpt-oss-20b --dtype auto --max-model-len 32048 --max-num-seqs 24
