Qwen/Qwen3-Coder-480B-A35B-Instruct Pruning

#2 opened by tomasmcm

Do you think it would be possible to apply a similar recipe to Qwen/Qwen3-Coder-480B-A35B-Instruct? And maybe create a model with 8 experts specialised in frontend code, for example.
480B total parameters ÷ 160 experts ≈ 3B parameters per expert, so 8 × 3B ≈ 24B, plus shared components like the attention layers. That roughly matches the 35B active-parameter figure in the model name (8 active experts ≈ 24B, leaving ~11B shared), so the pruned model would land around 35B total, which would be a great size for running locally.
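The recipe I'm picturing: run a frontend-heavy calibration set through the model, count which experts the router actually selects, and keep the top 8 per layer. A minimal sketch of the profiling step, assuming the transformers Qwen MoE layout (`mlp.gate` as the router Linear, 160 experts per layer); the attribute names, shapes, and calibration data would all need checking against the real checkpoint:

```python
# Hypothetical sketch: profile which experts a frontend-code calibration set
# actually routes to, so the top-8 per layer can be kept and the rest pruned.
# Module paths (mlp.gate, mlp.experts) follow the Qwen MoE layout in
# transformers but should be verified against the actual checkpoint.
import torch
from collections import defaultdict
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen3-Coder-480B-A35B-Instruct"
NUM_EXPERTS = 160  # experts per MoE layer in this architecture
TOP_K = 8          # experts activated per token
KEEP = 8           # experts to keep per layer after pruning

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)

# expert_counts[layer_idx][expert_idx] = number of tokens routed there
expert_counts = defaultdict(lambda: torch.zeros(NUM_EXPERTS, dtype=torch.long))

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # The router gate is a Linear whose output is router logits,
        # assumed here to be [num_tokens, num_experts].
        topk = output.float().topk(TOP_K, dim=-1).indices.flatten()
        expert_counts[layer_idx] += torch.bincount(
            topk.cpu(), minlength=NUM_EXPERTS
        )
    return hook

handles = []
for i, layer in enumerate(model.model.layers):
    if hasattr(layer.mlp, "gate"):  # skip any dense (non-MoE) layers
        handles.append(layer.mlp.gate.register_forward_hook(make_hook(i)))

# Calibration corpus: frontend code snippets (placeholder examples)
calibration = [
    "const App = () => <div className='app'>Hello</div>;",
    ".navbar { display: flex; justify-content: space-between; }",
]
with torch.no_grad():
    for text in calibration:
        ids = tok(text, return_tensors="pt").to(model.device)
        model(**ids)

for h in handles:
    h.remove()

# The 8 most-used experts in each layer are the pruning candidates to keep
keep_per_layer = {i: c.topk(KEEP).indices.tolist() for i, c in expert_counts.items()}
print(keep_per_layer)
```

The counts would then drive which expert weights (and router rows) to keep when rewriting each layer.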

I roughly understand your needs. Let me run some tests to see the minimum number of experts required to avoid garbled output.
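Concretely, the test I have in mind is a sweep: prune to k experts per layer, then measure perplexity on held-out code, since garbled output shows up as a sharp perplexity blow-up. A rough sketch, where `prune_to_k_experts` is a hypothetical placeholder for the actual pruning step (it would have to slice the router weights and drop the unused expert modules), and `model`, `tok`, and `holdout_texts` are assumed from the profiling setup above:

```python
# Hypothetical sweep: for each candidate expert count k, evaluate perplexity
# on held-out code; a sharp jump marks where output turns garbled.
import math
import torch

def perplexity(model, tok, texts):
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            ids = tok(text, return_tensors="pt").to(model.device)
            out = model(**ids, labels=ids["input_ids"])
            n = ids["input_ids"].numel()
            # out.loss is mean NLL per token; weight by length (rough)
            total_nll += out.loss.item() * n
            total_tokens += n
    return math.exp(total_nll / total_tokens)

for k in (64, 32, 16, 8):
    pruned = prune_to_k_experts(model, keep=k)  # placeholder, not a real API
    ppl = perplexity(pruned, tok, holdout_texts)
    print(f"{k} experts/layer -> perplexity {ppl:.2f}")
```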
