Qwen/Qwen3-Coder-480B-A35B-Instruct Pruning
#2 · opened by tomasmcm
Do you think it would be possible to apply a similar recipe to Qwen/Qwen3-Coder-480B-A35B-Instruct? And maybe create a model with 8 experts specialised in frontend code, for example?
480B total parameters ÷ 160 experts ≈ 3B parameters per expert, so 8 × 3B ≈ 24B, plus shared components such as the attention layers. That would be a great size for running locally.
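To make that estimate a bit more concrete, here is a rough back-of-the-envelope calculation in Python. The config values (hidden size, per-expert FFN width, layer count) are illustrative assumptions and should be read from the model's config.json rather than trusted as-is.

```python
# Rough size estimate for a pruned MoE model: keep only `kept_experts` routed
# experts per layer plus all shared components (attention, embeddings, norms).
# NOTE: the config values below are assumptions for illustration only.

hidden_size = 6144            # assumed hidden dimension
moe_intermediate_size = 2560  # assumed per-expert FFN width
num_layers = 62               # assumed number of transformer layers
total_experts = 160           # routed experts per layer
kept_experts = 8              # experts we would keep after pruning

# Each expert is a SwiGLU-style FFN: gate, up and down projections.
params_per_expert_per_layer = 3 * hidden_size * moe_intermediate_size
params_per_expert = params_per_expert_per_layer * num_layers

total_expert_params = params_per_expert * total_experts
kept_expert_params = params_per_expert * kept_experts

# Everything that is not a routed expert (attention, embeddings, router, norms).
total_params = 480e9  # from the model name
shared_params = total_params - total_expert_params

print(f"per-expert params:     {params_per_expert / 1e9:.2f}B")
print(f"kept expert params:    {kept_expert_params / 1e9:.2f}B")
print(f"shared params:         {shared_params / 1e9:.2f}B")
print(f"estimated pruned size: {(kept_expert_params + shared_params) / 1e9:.2f}B")
```

With these illustrative numbers the 8-expert model lands around 35B total parameters, which is close to the A35B active-parameter figure in the model name, a reasonable sanity check on the arithmetic.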
I roughly understand what you need. Let me run a test to see the minimum number of experts required before the output becomes garbled.
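For reference, one way to run that kind of test is to count how often each routed expert is selected on a domain-specific corpus (frontend code in this case) and keep only the most frequently used ones per layer. The sketch below is an assumption-heavy outline, not the exact recipe used here: it assumes the checkpoint loads as a Mixtral-style MoE in transformers and exposes router logits via `output_router_logits`, and the calibration snippets are placeholders for a real corpus.

```python
import torch
from collections import Counter
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen3-Coder-480B-A35B-Instruct"  # very large; shown for illustration
TOP_K = 8  # experts activated per token (assumed for this architecture)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# Hypothetical frontend-oriented calibration snippets; in practice you would
# stream a real corpus (e.g. React/HTML/CSS files).
corpus = [
    "function App() { return <div className='app'>Hello</div>; }",
    ".container { display: flex; justify-content: center; }",
]

# Count, per layer, how often each expert lands in the per-token top-k.
expert_counts = {}  # layer index -> Counter over expert ids

with torch.no_grad():
    for text in corpus:
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        # output_router_logits works for Mixtral-style MoE models in transformers;
        # assumed to be supported by this architecture as well.
        out = model(**inputs, output_router_logits=True)
        for layer_idx, logits in enumerate(out.router_logits):
            if logits is None:  # skip any non-MoE layers
                continue
            # logits: (num_tokens, num_experts) for this layer
            top_experts = logits.topk(TOP_K, dim=-1).indices.flatten().tolist()
            expert_counts.setdefault(layer_idx, Counter()).update(top_experts)

# For each layer, the 8 most frequently selected experts would be the candidates to keep.
for layer_idx, counts in sorted(expert_counts.items()):
    keep = [expert for expert, _ in counts.most_common(8)]
    print(f"layer {layer_idx}: keep experts {keep}")
```

The open question the test has to answer is how small that kept set can be (8, 16, 32, ...) before the routing distribution is truncated so hard that generations degrade into garbled output.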