Has anyone ever gotten this to work?

#178
by cob05 - opened

Do I just have bad luck? I've tried a bunch of repos (most recently THUDM/SWE-Dev-9B) and have always had it error out at some point.

Well, I reported exactly where the error happens and also noted that it used to work:
https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/158

But people keep opening new discussions or posting new comments instead of upvoting it, so this place has become a mess. You, for example, don't even say what your error is, so I have to guess it's the same one I already reported.

I guess the project is abandoned if it hasn't been fixed by now.

For those who need features like local Windows support, lower-bit IQ quants, and a download-before-upload workflow, I've created an enhanced fork of this script.

You can find it here: https://huggingface.co/spaces/Fentible/gguf-repo-suite

Clone the repo to your own HF Space or locally using the Quick Start guides.

I could not get it to work on free HF Spaces, but it might be possible with a paid Space. I tested on Windows 10 and made some quants for Gemma 3 abliterated by mlabonne.

The bug: ggml-rpc.dll is very finicky, and you may need to compile your own build of llama-imatrix to work around it.

Offline mode is needed for 27B+ models.

Worked fine for me, I now have a Q8_0 copy of Pixtral 12B Lumimaid.

From https://huggingface.co/mrcuddle/Lumimaid-v0.2-12B-Pixtral to https://huggingface.co/Koitenshin/Lumimaid-v0.2-12B-Pixtral-Q8_0-GGUF

Did every quant option available using this space in just a couple of minutes. Now available at https://huggingface.co/Koitenshin/Lumimaid_VISION-v0.2-12B-Pixtral-GGUF

No mucking about with setting up my own environments, compiling llama.cpp, etc.

Another attempt, another failure...

Error converting to fp16: INFO:hf-to-gguf:Loading model: granite-vision-3.3-2b-embedding
WARNING:hf-to-gguf:Failed to load model config from downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding: The repository downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding contains custom code which must be executed to correctly load the model. You can inspect the repository content at /home/user/app/downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding .
 You can inspect the repository content at https://hf.co/downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: GraniteForCausalLM
WARNING:hf-to-gguf:Failed to load model config from downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding: The repository downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding contains custom code which must be executed to correctly load the model. You can inspect the repository content at /home/user/app/downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding .
 You can inspect the repository content at https://hf.co/downloads/tmpp4dy37n8/granite-vision-3.3-2b-embedding.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00003.safetensors'
Traceback (most recent call last):
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 8595, in <module>
    main()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 8589, in main
    model_instance.write()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 410, in write
    self.prepare_tensors()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 2126, in prepare_tensors
    super().prepare_tensors()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 277, in prepare_tensors
    for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 2036, in modify_tensors
    n_head = self.hparams["num_attention_heads"]
             ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'num_attention_heads'
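The KeyError above suggests the embedding variant's config.json doesn't expose `num_attention_heads` at the top level; vision/embedding checkpoints often nest the text hyperparameters under a `text_config` block. A hypothetical defensive lookup (illustrative only, not the converter's actual code):

```python
def find_hparam(config: dict, key: str):
    """Look up a hyperparameter at the top level of config.json,
    falling back to a nested text_config block if present.
    Illustrative sketch of the fallback a converter would need
    for configs shaped like this model's."""
    if key in config:
        return config[key]
    text_cfg = config.get("text_config", {})
    if key in text_cfg:
        return text_cfg[key]
    raise KeyError(key)

# A flat config resolves directly; a nested one needs the fallback.
flat = {"num_attention_heads": 32}
nested = {"text_config": {"num_attention_heads": 20}}
print(find_hparam(flat, "num_attention_heads"))    # 32
print(find_hparam(nested, "num_attention_heads"))  # 20
```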

Tried again... Failed again.

Error converting to fp16: INFO:hf-to-gguf:Loading model: MiniCPM-V-4
WARNING:hf-to-gguf:Failed to load model config from downloads/tmp9t9m7d0a/MiniCPM-V-4: The repository downloads/tmp9t9m7d0a/MiniCPM-V-4 contains custom code which must be executed to correctly load the model. You can inspect the repository content at /home/user/app/downloads/tmp9t9m7d0a/MiniCPM-V-4 .
 You can inspect the repository content at https://hf.co/downloads/tmp9t9m7d0a/MiniCPM-V-4.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: MiniCPMV
ERROR:hf-to-gguf:Model MiniCPMV is not supported

Another try, and it fails once again... Never gotten it to work.

Error converting to fp16: INFO:hf-to-gguf:Loading model: Ovis2.5-9B
WARNING:hf-to-gguf:Failed to load model config from downloads/tmp6nkpckoz/Ovis2.5-9B: The repository downloads/tmp6nkpckoz/Ovis2.5-9B contains custom code which must be executed to correctly load the model. You can inspect the repository content at /home/user/app/downloads/tmp6nkpckoz/Ovis2.5-9B .
 You can inspect the repository content at https://hf.co/downloads/tmp6nkpckoz/Ovis2.5-9B.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: Qwen3ForCausalLM
WARNING:hf-to-gguf:Failed to load model config from downloads/tmp6nkpckoz/Ovis2.5-9B: The repository downloads/tmp6nkpckoz/Ovis2.5-9B contains custom code which must be executed to correctly load the model. You can inspect the repository content at /home/user/app/downloads/tmp6nkpckoz/Ovis2.5-9B .
 You can inspect the repository content at https://hf.co/downloads/tmp6nkpckoz/Ovis2.5-9B.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00004.safetensors'
Traceback (most recent call last):
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 8788, in <module>
    main()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 8782, in main
    model_instance.write()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 425, in write
    self.prepare_tensors()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 292, in prepare_tensors
    for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 2923, in modify_tensors
    yield from super().modify_tensors(data_torch, name, bid)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 260, in modify_tensors
    return [(self.map_tensor_name(name), data_torch)]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 251, in map_tensor_name
    raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'llm.lm_head.weight'
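This failure mode is different from the earlier two: the checkpoint stores its language-model weights under an `llm.` prefix (`llm.lm_head.weight`) that the converter's tensor map doesn't recognize, which is typical of multimodal wrappers. A hypothetical helper for spotting such prefixes before converting (the prefix list is an assumption for illustration):

```python
def wrapper_prefix(tensor_names):
    """Return a common wrapper prefix (e.g. 'llm.') if every tensor
    name starts with it, else None. Illustrative helper for spotting
    multimodal checkpoints whose text weights a text-only converter
    will refuse to map."""
    for prefix in ("llm.", "language_model.", "model.text_model."):
        if tensor_names and all(n.startswith(prefix) for n in tensor_names):
            return prefix
    return None

names = ["llm.lm_head.weight", "llm.model.embed_tokens.weight"]
print(wrapper_prefix(names))  # llm.
```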

@cob05

You're intentionally testing it on models that Llama.cpp doesn't support yet, of course it's not going to work.

It's interesting that those 'unsupported' models have GGUF quants available though. This space literally says pick a repo and it will convert it to GGUF. What am I missing? Maybe they need to specify which models work and which don't so I stop wasting my time.

cob05 changed discussion status to closed

Those quants are from people quantizing it on their own machines, most likely in a sandboxed environment due to the remote code.
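To avoid wasting runs on unsupported models, you can check the `architectures` field of a repo's config.json before submitting it. A minimal sketch; the supported set below is a small illustrative sample, not the real list, which varies by llama.cpp version (see the model classes in convert_hf_to_gguf.py):

```python
import json

# Illustrative subset only -- the authoritative list lives in
# llama.cpp's convert_hf_to_gguf.py and changes between releases.
SUPPORTED = {"LlamaForCausalLM", "Qwen2ForCausalLM", "GemmaForCausalLM"}

def is_convertible(config_json: str) -> bool:
    """Read the 'architectures' field from a model's config.json text
    and report whether any declared architecture is in the set."""
    config = json.loads(config_json)
    return any(a in SUPPORTED for a in config.get("architectures", []))

print(is_convertible('{"architectures": ["LlamaForCausalLM"]}'))  # True
print(is_convertible('{"architectures": ["MiniCPMV"]}'))          # False
```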