Draft 0.6B model to pair it with in LMStudio?

#2
by ljupco - opened

Thanks for this one - using it, works well for me.
I'm wondering: LMStudio offers the possibility to pair a bigger "main" model with a smaller "draft" model. With the regular 30B-A3B as the main model, I can pair a 0.6B draft model in LMStudio. I'm not sure, but I think there is a speedup there.
Is there a smaller draft model that can be paired with the 30B-A6B-16-128K here? It might recover some of the speed lost by doubling the number of active experts to 16 (A6B) from 8 (A3B).
I'm not sure how LMStudio decides which draft models are compatible with which main models, tbh.
Thanks for your help. :-)

Owner

Qwen3 30B-A3B is a special case -> a draft model will likely reduce speed.
This is because it is already fast (only 3B/6B active parameters) and it is a MoE config with 128 experts.
Speculative decoding would be a net negative here because of the specific MoE structure and the size of the experts (350 million parameters or so).

This is the only exception -> you can use the 0.6B model(s) as a draft with any of the other Qwen3s to get a speedup.

Thanks! Now that you spelled it out: I tested the 0.6B draft on/off with a few MoE configs, on the same prompts, and indeed the MoEs run consistently slower with the draft. Previously I didn't test explicitly, assuming a 5x or 10x parameter difference (from 0.6B to the 3B or 6B active params of the MoE) would be enough to bring a speedup. But it seems it's not. So I'm not using a draft model for any MoEs going forward.
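
For anyone who wants to reason about this before benchmarking, here is a rough back-of-the-envelope sketch in Python. It is not how LMStudio computes anything; it is just the usual simplified speculative-decoding estimate. All names and numbers are hypothetical: `alpha` is an assumed per-token acceptance rate, `draft_cost` is the cost of one draft pass relative to one single-token pass of the main model, and `verify_cost` is the assumed cost of the main model checking all drafted tokens at once. The guess here is that `verify_cost` stays near 1 for a dense model but grows for a MoE, since the drafted tokens can route to different experts.

```python
def expected_speedup(alpha: float, k: int, draft_cost: float,
                     verify_cost: float = 1.0) -> float:
    """Rough speedup estimate for speculative decoding (simplified model).

    alpha       -- assumed probability that each drafted token is accepted
    k           -- number of tokens drafted per step
    draft_cost  -- cost of one draft pass, relative to one single-token
                   pass of the main model (hypothetical value)
    verify_cost -- cost of the main model verifying the k drafted tokens,
                   relative to one single-token pass (hypothetical value)
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Expected tokens produced per draft+verify step (geometric series).
    tokens_per_step = (1.0 - alpha ** (k + 1)) / (1.0 - alpha)
    # Cost of that step, in units of one plain main-model token.
    cost_per_step = k * draft_cost + verify_cost
    return tokens_per_step / cost_per_step


# Illustrative, made-up inputs (not measurements):
dense_like = expected_speedup(alpha=0.7, k=4, draft_cost=0.05, verify_cost=1.0)
moe_like = expected_speedup(alpha=0.7, k=4, draft_cost=0.2, verify_cost=3.0)
print(f"dense-like: {dense_like:.2f}x, MoE-like: {moe_like:.2f}x")
```

With these made-up inputs the dense-like case comes out above 1x and the MoE-like case below 1x, which matches the direction of what was observed above. The real values of `alpha` and `verify_cost` depend on the models, prompts and hardware, so an on/off test on the same prompts is still the only reliable check.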
