Has anyone ever gotten this to work?
Do I just have bad luck? I've tried a bunch of repos (most recently THUDM/SWE-Dev-9B) and have always had it error out at some point.
Well, I reported exactly when the error happens and also noted that it used to work, here:
https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/158
But people keep opening new discussions or leaving new comments instead of upvoting it, so this place has become a mess. You, for example, don't even say what your error is, so I have to guess it's the same one I already reported.
I guess the project is abandoned if it hasn't been fixed by now.
For those who need features like local Windows support, lower-bit IQ quants, and a download-before-upload workflow, I've created an enhanced fork of this script.
You can find it here: https://huggingface.co/spaces/Fentible/gguf-repo-suite
Clone the repo to your own HF Space or locally using the Quick Start guides.
I could not get it to work on free HF Spaces, but it might be possible on a rented Space. I tested on Windows 10 and made some quants of mlabonne's abliterated Gemma 3.
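If you'd rather script the local setup than clone by hand, something like this should pull the Space's files down for you (just a sketch using huggingface_hub; the local_dir is only an example):

```python
# Sketch: download the gguf-repo-suite Space files for local use.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="Fentible/gguf-repo-suite",
    repo_type="space",           # Space repos need repo_type="space"
    local_dir="gguf-repo-suite", # example target directory
)
print("Space files downloaded to:", local_path)
```

From there, follow the Quick Start guide in the repo for the Windows/local workflow.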
The bug: ggml-rpc.dll is very finicky, and you may need to compile your own build of llama-imatrix to fix it.
Offline (local) mode is needed for 27B+ models.
Worked fine for me, I now have a Q8_0 copy of Pixtral 12B Lumimaid.
From https://huggingface.co/mrcuddle/Lumimaid-v0.2-12B-Pixtral to https://huggingface.co/Koitenshin/Lumimaid-v0.2-12B-Pixtral-Q8_0-GGUF
Did every quant option available using this space in just a couple of minutes; they're now available at https://huggingface.co/Koitenshin/Lumimaid_VISION-v0.2-12B-Pixtral-GGUF. No mucking about with setting up my own environment, compiling llama.cpp, etc.
Another attempt, another failure...
Error converting to fp16: INFO:hf-to-gguf:Loading model: granite-vision-3.3-2b-embedding
WARNING:hf-to-gguf:Failed to load model config from downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding: The repository downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding contains custom code which must be executed to correctly load the model. You can inspect the repository content at /home/user/app/downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding .
You can inspect the repository content at https://hf.co/downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: GraniteForCausalLM
WARNING:hf-to-gguf:Failed to load model config from downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding: The repository downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding contains custom code which must be executed to correctly load the model. You can inspect the repository content at /home/user/app/downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding .
You can inspect the repository content at https://hf.co/downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00003.safetensors'
Traceback (most recent call last):
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 8595, in <module>
main()
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 8589, in main
model_instance.write()
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 410, in write
self.prepare_tensors()
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 2126, in prepare_tensors
super().prepare_tensors()
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 277, in prepare_tensors
for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 2036, in modify_tensors
n_head = self.hparams["num_attention_heads"]
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'num_attention_heads'
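For what it's worth, that KeyError just means the config.json the converter falls back to has no num_attention_heads at the top level (the custom-code config it would prefer can't be loaded on the Space). You can confirm that before burning another run with a quick check like this (a sketch; the repo id is my guess at what was used, swap in yours):

```python
# Sketch: check whether the converter's expected key exists in config.json.
# pip install huggingface_hub
import json
from huggingface_hub import hf_hub_download

repo_id = "ibm-granite/granite-vision-3.3-2b-embedding"  # assumption: replace with the repo you tried
cfg_path = hf_hub_download(repo_id, "config.json")

with open(cfg_path) as f:
    cfg = json.load(f)

print("top-level keys:", sorted(cfg.keys()))
print("has num_attention_heads:", "num_attention_heads" in cfg)
```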
Tried again... Failed again.
Error converting to fp16: INFO:hf-to-gguf:Loading model: MiniCPM-V-4
WARNING:hf-to-gguf:Failed to load model config from downloads/tmp9t9m7d0a/MiniCPM-V-4: The repository downloads/tmp9t9m7d0a/MiniCPM-V-4 contains custom code which must be executed to correctly load the model. You can inspect the repository content at /home/user/app/downloads/tmp9t9m7d0a/MiniCPM-V-4 .
You can inspect the repository content at https://hf.co/downloads/tmp9t9m7d0a/MiniCPM-V-4.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: MiniCPMV
ERROR:hf-to-gguf:Model MiniCPMV is not supported
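That one at least fails fast: the architecture string in config.json has no handler in convert_hf_to_gguf.py. A quick pre-check (nothing official, just a sketch) is to read that field yourself and search the converter script for it before queueing a run:

```python
# Sketch: read the "architectures" field so you can check it against
# the model classes handled by llama.cpp's convert_hf_to_gguf.py.
import json
from huggingface_hub import hf_hub_download

repo_id = "openbmb/MiniCPM-V-4"  # assumption: the repo I think was used here
cfg = json.load(open(hf_hub_download(repo_id, "config.json")))
print(cfg.get("architectures"))  # e.g. ['MiniCPMV'] -> search convert_hf_to_gguf.py for that name
```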
Another try, and it fails once again... I've never gotten it to work.
Error converting to fp16: INFO:hf-to-gguf:Loading model: Ovis2.5-9B
WARNING:hf-to-gguf:Failed to load model config from downloads/tmp6nkpckoz/Ovis2.5-9B: The repository downloads/tmp6nkpckoz/Ovis2.5-9B contains custom code which must be executed to correctly load the model. You can inspect the repository content at /home/user/app/downloads/tmp6nkpckoz/Ovis2.5-9B .
You can inspect the repository content at https://hf.co/downloads/tmp6nkpckoz/Ovis2.5-9B.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: Qwen3ForCausalLM
WARNING:hf-to-gguf:Failed to load model config from downloads/tmp6nkpckoz/Ovis2.5-9B: The repository downloads/tmp6nkpckoz/Ovis2.5-9B contains custom code which must be executed to correctly load the model. You can inspect the repository content at /home/user/app/downloads/tmp6nkpckoz/Ovis2.5-9B .
You can inspect the repository content at https://hf.co/downloads/tmp6nkpckoz/Ovis2.5-9B.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00004.safetensors'
Traceback (most recent call last):
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 8788, in <module>
main()
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 8782, in main
model_instance.write()
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 425, in write
self.prepare_tensors()
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 292, in prepare_tensors
for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 2923, in modify_tensors
yield from super().modify_tensors(data_torch, name, bid)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 260, in modify_tensors
return [(self.map_tensor_name(name), data_torch)]
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 251, in map_tensor_name
raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'llm.lm_head.weight'
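The "llm." prefix in that tensor name is the giveaway: the checkpoint wraps the language model under an extra namespace that the converter's tensor map doesn't know about. You can see it straight from the safetensors index without downloading any weights (sketch; the repo id is a placeholder for whichever one you tried):

```python
# Sketch: list a few tensor names from the safetensors index to spot
# prefixes (like "llm.") that convert_hf_to_gguf.py cannot map.
import json
from huggingface_hub import hf_hub_download

repo_id = "AIDC-AI/Ovis2.5-9B"  # assumption: replace with the repo you tried
idx = json.load(open(hf_hub_download(repo_id, "model.safetensors.index.json")))
names = sorted(idx["weight_map"].keys())
print(names[:10])  # expect names like "llm.lm_head.weight", "llm.model.layers.0..."
```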
You're testing it on models that llama.cpp doesn't support yet; of course it's not going to work.
It's interesting that those "unsupported" models have GGUF quants available, though. This space literally says pick a repo and it will convert it to GGUF. What am I missing? Maybe they should specify which models work and which don't, so I stop wasting my time.
Those quants are from people converting and quantizing the models on their own machines, most likely in a sandboxed environment because of the custom remote code.
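If you do go the local route for these, the manual path is roughly: convert to an f16 GGUF with llama.cpp's converter, then quantize. Something like the sketch below (not a recipe; paths are placeholders, and a model still has to be supported by the converter for step 1 to succeed):

```python
# Sketch: local convert + quantize with llama.cpp tools (paths are placeholders).
import subprocess

model_dir = "path/to/local-hf-model"  # a local snapshot you trust / run sandboxed
f16_gguf = "model-f16.gguf"

# 1) HF -> GGUF at f16 (convert_hf_to_gguf.py lives in the llama.cpp checkout)
subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", model_dir,
     "--outtype", "f16", "--outfile", f16_gguf],
    check=True,
)

# 2) f16 GGUF -> quantized GGUF (llama-quantize is built from llama.cpp)
subprocess.run(
    ["llama.cpp/build/bin/llama-quantize", f16_gguf, "model-Q8_0.gguf", "Q8_0"],
    check=True,
)
```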