Qwen/Qwen3-Coder-480B-A35B-Instruct Pruning

#2 opened by tomasmcm

Do you think it would be possible to apply a similar recipe to Qwen/Qwen3-Coder-480B-A35B-Instruct? And maybe create a model with 8 experts specialised in frontend code, for example.
480B total parameters ÷ 160 experts ≈ 3B parameters per expert, so 8 × 3B ≈ 24B, plus shared components like the attention layers. That roughly matches the 35B active-parameter figure in the model name (8 active experts ≈ 24B, leaving ~11B shared), so the pruned model would land around 35B total, which would be a great size for running locally.
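The recipe I'm picturing: run a frontend-heavy calibration set through the model, count which experts the router actually selects, and keep the top 8 per layer. A minimal sketch of the profiling step, assuming the transformers Qwen MoE layout (`mlp.gate` as the router Linear, 160 experts per layer); the attribute names, shapes, and calibration data would all need checking against the real checkpoint:

```python
# Hypothetical sketch: profile which experts a frontend-code calibration set
# actually routes to, so the top-8 per layer can be kept and the rest pruned.
# Module paths (mlp.gate, mlp.experts) follow the Qwen MoE layout in
# transformers but should be verified against the actual checkpoint.
import torch
from collections import defaultdict
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen3-Coder-480B-A35B-Instruct"
NUM_EXPERTS = 160  # experts per MoE layer in this architecture
TOP_K = 8          # experts activated per token
KEEP = 8           # experts to keep per layer after pruning

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)

# expert_counts[layer_idx][expert_idx] = number of tokens routed there
expert_counts = defaultdict(lambda: torch.zeros(NUM_EXPERTS, dtype=torch.long))

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # The router gate is a Linear whose output is router logits,
        # assumed here to be [num_tokens, num_experts].
        topk = output.float().topk(TOP_K, dim=-1).indices.flatten()
        expert_counts[layer_idx] += torch.bincount(
            topk.cpu(), minlength=NUM_EXPERTS
        )
    return hook

handles = []
for i, layer in enumerate(model.model.layers):
    if hasattr(layer.mlp, "gate"):  # skip any dense (non-MoE) layers
        handles.append(layer.mlp.gate.register_forward_hook(make_hook(i)))

# Calibration corpus: frontend code snippets (placeholder examples)
calibration = [
    "const App = () => <div className='app'>Hello</div>;",
    ".navbar { display: flex; justify-content: space-between; }",
]
with torch.no_grad():
    for text in calibration:
        ids = tok(text, return_tensors="pt").to(model.device)
        model(**ids)

for h in handles:
    h.remove()

# The 8 most-used experts in each layer are the pruning candidates to keep
keep_per_layer = {i: c.topk(KEEP).indices.tolist() for i, c in expert_counts.items()}
print(keep_per_layer)
```

The counts would then drive which expert weights (and router rows) to keep when rewriting each layer.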

I roughly understand your needs. Let me run some tests to see the minimum number of experts required to avoid garbled output.
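Concretely, the test I have in mind is a sweep: prune to k experts per layer, then measure perplexity on held-out code, since garbled output shows up as a sharp perplexity blow-up. A rough sketch, where `prune_to_k_experts` is a hypothetical placeholder for the actual pruning step (it would have to slice the router weights and drop the unused expert modules), and `model`, `tok`, and `holdout_texts` are assumed from the profiling setup above:

```python
# Hypothetical sweep: for each candidate expert count k, evaluate perplexity
# on held-out code; a sharp jump marks where output turns garbled.
import math
import torch

def perplexity(model, tok, texts):
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            ids = tok(text, return_tensors="pt").to(model.device)
            out = model(**ids, labels=ids["input_ids"])
            n = ids["input_ids"].numel()
            # out.loss is mean NLL per token; weight by length (rough)
            total_nll += out.loss.item() * n
            total_tokens += n
    return math.exp(total_nll / total_tokens)

for k in (64, 32, 16, 8):
    pruned = prune_to_k_experts(model, keep=k)  # placeholder, not a real API
    ppl = perplexity(pruned, tok, holdout_texts)
    print(f"{k} experts/layer -> perplexity {ppl:.2f}")
```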
