ik_llama.cpp imatrix MLA Quantizations of DeepSeek-V3-0324

This is an IQ3_KS quant of DeepSeek-V3-0324, made with ubergarm's IQ3_KS recipe from ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF and the imatrix file from ubergarm/DeepSeek-V3-0324-GGUF.
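
For reference, the quantization step has roughly the shape below. This is a minimal sketch, not the exact command used: the real recipe assigns different quant types to different tensor groups (see ubergarm's model cards), and the file names here are placeholders.

```bash
# Minimal sketch of imatrix quantization with ik_llama.cpp's llama-quantize.
# File names are placeholders; the actual recipe applies per-tensor type
# overrides from ubergarm's IQ3_KS recipe rather than a single blanket type.
./build/bin/llama-quantize \
    --imatrix imatrix-DeepSeek-V3-0324.dat \
    DeepSeek-V3-0324-BF16.gguf \
    DeepSeek-V3-0324-IQ3_KS.gguf \
    IQ3_KS 24
```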

This quant collection REQUIRES the ik_llama.cpp fork, which supports advanced non-linear SotA quants and Multi-Head Latent Attention (MLA). Do not download these large files and expect them to run on mainline vanilla llama.cpp, Ollama, LM Studio, KoboldCpp, etc.!
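
Below is a minimal sketch of building ik_llama.cpp and serving this quant. The flag names and values are assumptions based on typical ik_llama.cpp usage (notably -mla for Multi-Head Latent Attention and -fmoe for fused MoE); check the fork's documentation and ubergarm's model cards for settings that fit your hardware.

```bash
# Build ik_llama.cpp (CUDA example; adjust for your backend).
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Serve the quant (point --model at the first split file if the quant is sharded).
# -mla enables MLA, -fa flash attention, -fmoe fused MoE ops; context size, GPU
# layers (-ngl), and offloading routed experts to CPU (-ot) depend on your hardware.
./build/bin/llama-server \
    --model /path/to/DeepSeek-V3-0324-IQ3_KS.gguf \
    --ctx-size 32768 \
    -mla 3 -fa -fmoe \
    -ngl 99 \
    -ot exps=CPU \
    --host 127.0.0.1 --port 8080
```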

See ubergarm/DeepSeek-V3-0324-GGUF for his other quants and more details about them.

I've uploaded the converted BF16 weights to gghfez/DeepSeek-V3-0324-256x21B-BF16 in case I, or anyone else, wants to create similar quants in the future.
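
For completeness, a rough sketch of how such a BF16 GGUF is typically produced with the convert_hf_to_gguf.py script that ships with llama.cpp/ik_llama.cpp. Paths are placeholders, and since DeepSeek-V3-0324 is released in FP8, the weights generally need to be cast to BF16 safetensors before this step; this is not the exact command used here.

```bash
# Minimal sketch: convert a BF16 safetensors checkpoint to a BF16 GGUF.
# Assumes the HF weights have already been cast from FP8 to BF16 safetensors.
python convert_hf_to_gguf.py \
    --outtype bf16 \
    --outfile DeepSeek-V3-0324-256x21B-BF16.gguf \
    /path/to/DeepSeek-V3-0324-bf16
```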

Model size: 672B params
Architecture: deepseek2