ik_llama.cpp imatrix MLA Quantizations of DeepSeek-V3-0324

This is an IQ3_KS quant of DeepSeek-V3-0324, made with ubergarm's IQ3_KS recipe from ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF and the imatrix file from ubergarm/DeepSeek-V3-0324-GGUF.
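
For reference, the quantization step has roughly the shape below. This is a minimal sketch, not the exact command used: the real recipe assigns different quant types to different tensor groups (see ubergarm's model cards), and the file names here are placeholders.

```bash
# Minimal sketch of imatrix quantization with ik_llama.cpp's llama-quantize.
# File names are placeholders; the actual recipe applies per-tensor type
# overrides from ubergarm's IQ3_KS recipe rather than a single blanket type.
./build/bin/llama-quantize \
    --imatrix imatrix-DeepSeek-V3-0324.dat \
    DeepSeek-V3-0324-BF16.gguf \
    DeepSeek-V3-0324-IQ3_KS.gguf \
    IQ3_KS 24
```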

This quant collection REQUIRES the ik_llama.cpp fork, which supports advanced non-linear SotA quants and Multi-Head Latent Attention (MLA). Do not download these large files and expect them to run on mainline vanilla llama.cpp, Ollama, LM Studio, KoboldCpp, etc.!
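
Below is a minimal sketch of building ik_llama.cpp and serving this quant. The flag names and values are assumptions based on typical ik_llama.cpp usage (notably -mla for Multi-Head Latent Attention and -fmoe for fused MoE); check the fork's documentation and ubergarm's model cards for settings that fit your hardware.

```bash
# Build ik_llama.cpp (CUDA example; adjust for your backend).
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Serve the quant (point --model at the first split file if the quant is sharded).
# -mla enables MLA, -fa flash attention, -fmoe fused MoE ops; context size, GPU
# layers (-ngl), and offloading routed experts to CPU (-ot) depend on your hardware.
./build/bin/llama-server \
    --model /path/to/DeepSeek-V3-0324-IQ3_KS.gguf \
    --ctx-size 32768 \
    -mla 3 -fa -fmoe \
    -ngl 99 \
    -ot exps=CPU \
    --host 127.0.0.1 --port 8080
```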

See ubergarm/DeepSeek-V3-0324-GGUF for his other quants and more details about them.

I've uploaded the converted BF16 weights to gghfez/DeepSeek-V3-0324-256x21B-BF16 in case I, or anyone else, wants to create similar quants in the future.
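
For completeness, a rough sketch of how such a BF16 GGUF is typically produced with the convert_hf_to_gguf.py script that ships with llama.cpp/ik_llama.cpp. Paths are placeholders, and since DeepSeek-V3-0324 is released in FP8, the weights generally need to be cast to BF16 safetensors before this step; this is not the exact command used here.

```bash
# Minimal sketch: convert a BF16 safetensors checkpoint to a BF16 GGUF.
# Assumes the HF weights have already been cast from FP8 to BF16 safetensors.
python convert_hf_to_gguf.py \
    --outtype bf16 \
    --outfile DeepSeek-V3-0324-256x21B-BF16.gguf \
    /path/to/DeepSeek-V3-0324-bf16
```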

Model size: 672B params
Architecture: deepseek2