ik_llama.cpp imatrix MLA Quantizations of DeepSeek-V3-0324
This is an IQ2_KS quant of DeepSeek-V3-0324, made with ubergarm's IQ2_KS recipe from ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF and the imatrix file from ubergarm/DeepSeek-V3-0324-GGUF.
This quant collection REQUIRES the ik_llama.cpp fork, which supports the advanced non-linear SotA quants and Multi-Head Latent Attention (MLA) used here. Do not download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc!
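As a rough illustration of what launching this quant under ik_llama.cpp might look like, here is a small Python sketch that assembles a `llama-server` command line. Everything in it is an assumption rather than a tested recipe for this upload: the flag names (`-mla`, `-fa`, `-fmoe`, `--override-tensor exps=CPU`) follow the style of the examples in ubergarm's model cards as I recall them, the binary path assumes a default local build, and the model path is a hypothetical placeholder. Check `llama-server --help` in your own ik_llama.cpp build before relying on any of it.

```python
import shlex


def build_server_command(model_path, ctx_size=32768, gpu_layers=63, threads=16,
                         server_bin="./build/bin/llama-server"):
    """Assemble an ik_llama.cpp llama-server invocation as an argv list.

    All defaults are illustrative assumptions; tune them for your hardware.
    """
    return [
        server_bin,                        # assumed location of the built binary
        "--model", model_path,             # first GGUF shard of the quant
        "--ctx-size", str(ctx_size),
        "-mla", "3",                       # assumed MLA mode setting
        "-fa",                             # flash attention
        "-fmoe",                           # fused MoE path
        "--n-gpu-layers", str(gpu_layers),
        "--override-tensor", "exps=CPU",   # keep routed expert tensors in system RAM
        "--threads", str(threads),
        "--host", "127.0.0.1",
        "--port", "8080",
    ]


if __name__ == "__main__":
    # Hypothetical placeholder path; substitute the real first shard you downloaded.
    cmd = build_server_command("/path/to/model-00001-of-0000N.gguf")
    print(shlex.join(cmd))
```

Building the command as a list and printing it with `shlex.join` keeps quoting correct if the model path contains spaces, and the same list can be passed directly to `subprocess.run`.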
See ubergarm/DeepSeek-V3-0324-GGUF for his other quants and more details about them.
I've uploaded the converted BF16 weights to gghfez/DeepSeek-V3-0324-256x21B-BF16 in case I or anyone else wants to create similar quants in the future.
TODO: fix links, etc in the model card.
Quantization: 2-bit
Base model: deepseek-ai/DeepSeek-V3-0324