Xoron-Dev-MultiMoe-GGUF?

#1
by Rebis - opened

Hi,
Is a GGUF version planned for the near future?
Thank you in advance.

Yes, I would love to add full GGUF support. However, the model cannot support it yet because of its architectural complexity. GGUF engines and tools like llama.cpp, LM Studio, or Ollama would not fully support my architecture or the modifications I have made.
Can I ask why you chose GGUF? Is it for quantization, or for general model use with a server, API, or Ollama?

I want to use it with Ollama. Quantization is not a priority since the model seems rather small. I would also like to know if you have considered using Qwen3-VL with mergekit. I saw that it is now supported.
