Serve with vLLM
Can this model be served with vLLM on an RTX 3090? If so, please share the command and the values of the required parameters.
I may be mistaken, but I think NVFP4 is currently Blackwell-only, so if that's true the answer is no. You would need a 5090, PRO 6000, etc.
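One way to check is to query the card's compute capability; NVFP4 kernels are tied to Blackwell-class hardware (compute capability 10.0 and up, if I have that right), while the RTX 3090 is Ampere at 8.6. Assuming a driver recent enough to support the compute_cap query field:

nvidia-smi --query-gpu=name,compute_cap --format=csv

An RTX 3090 reports 8.6; Blackwell parts such as the 5090 and PRO 6000 should report 12.x (datacenter Blackwell reports 10.x).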
Regardless, some documentation on how to run this in vLLM would be really helpful.
GPT-OSS-20b is MXFP4 and works quite well on an RTX 3090. I was wondering if this one would too?
vllm serve openai/gpt-oss-20b --served-model-name gpt-oss-20b --dtype auto --max-model-len 32048 --max-num-seqs 8
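If the server does come up, vLLM exposes an OpenAI-compatible API (port 8000 by default), so a quick smoke test against the served model name from the command above would be:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-20b", "messages": [{"role": "user", "content": "Say hello"}]}'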
Honestly, I'm unsure if this model works; I can't even get the parameters to load. I'm working on it on my own and will post instructions when (if) I get it working.
Please share the command to run on vLLM with Blackwell GPUs.
vllm serve openai/gpt-oss-20b --served-model-name gpt-oss-20b --dtype auto --max-model-len 32048 --max-num-seqs 24
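In the absence of official docs, here is a minimal sketch of what an NVFP4 serve command on a Blackwell card might look like. The repo ID below is a placeholder (this thread doesn't name the model), and I'm assuming a recent vLLM build that auto-detects the quantization format from the checkpoint config, as it normally does for quantized checkpoints; treat the length/batch flags as starting points, not verified values:

vllm serve <nvfp4-model-repo-id> --max-model-len 32768 --max-num-seqs 8

Here <nvfp4-model-repo-id> is hypothetical; substitute the actual Hugging Face repo ID.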