llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma3n'

#3
by ljupco - opened

Thanks! Great - trying this now. Anyone else getting

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma3n'

? AFAICS I'm building from scratch, pure CPU only - no CUDA, nothing else.

ljubomir@gigul2(422663.llama.cpp:0):~/llama.cpp$ git pull
Already up-to-date.

ljubomir@gigul2(422663.llama.cpp:0):~/llama.cpp$ gilg

  • git log -C --name-status '--pretty=%h %ae %ai : %s'
    1bbe3e0b ljubomir@gigul2 2025-06-26 18:01:38 +0100 : Merge branch 'master' of https://github.com/ggerganov/llama.cpp
    a01047b0 [email protected] 2025-06-26 13:46:53 -0300 : cmake: regen vulkan shaders when shaders-gen sources change (#14398)

M ggml/src/ggml-vulkan/CMakeLists.txt

so then

cmake . -B ./build
cmake --build build --config Release -j

all built fine
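
A quick way to confirm which commit the binary was actually built from (IIRC the llama.cpp tools accept a --version flag that prints the build commit):

build/bin/llama-server --version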

ljubomir@gigul2(422663.llama.cpp:0):~/llama.cpp$ l build/bin/llama-server
-rwx------ 1 ljubomir ljubomir 4.8M Jun 26 18:04 build/bin/llama-server

then downloaded the model from

https://huggingface.co/unsloth/gemma-3n-E4B-it-GGUF/tree/main
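
For reference, one way to fetch just that file (assuming the huggingface_hub CLI is installed):

huggingface-cli download unsloth/gemma-3n-E4B-it-GGUF gemma-3n-E4B-it-UD-Q8_K_XL.gguf --local-dir models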

then

build/bin/llama-server --model models/gemma-3n-E4B-it-UD-Q8_K_XL.gguf --temp 1.0 --top_k 64 --top_p 0.95 --min_p 0 --ctx-size 32768 &
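
Once the model loads, the server should answer OpenAI-style requests on its default port 8080 (a minimal smoke test; adjust if you pass --port):

curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Hello"}]}'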

Instead, startup ends in

....................................................................
llama_model_loader: - type bf16: 108 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 9.36 GiB (11.71 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma3n'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'models/gemma-3n-E4B-it-UD-Q8_K_XL.gguf'
srv load_model: failed to load model, 'models/gemma-3n-E4B-it-UD-Q8_K_XL.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
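
For what it's worth, the file itself really does declare that architecture - a quick check with the gguf package from llama.cpp's gguf-py (a sketch; for a string field the raw bytes sit in the last part, the same trick gguf_dump uses):

python3 <<'EOF'
from gguf import GGUFReader

reader = GGUFReader("models/gemma-3n-E4B-it-UD-Q8_K_XL.gguf")
field = reader.fields["general.architecture"]
# for a simple string field, the value bytes are in the last part
print(bytes(field.parts[-1]).decode("utf-8"))  # prints: gemma3n
EOF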

What am I doing wrong? Anyone else seeing this? Thanks.

Unsloth AI org

Did you update llama.cpp?

Thanks. I did, up to

a01047b0 [email protected] 2025-06-26 13:46:53 -0300 : cmake: regen vulkan shaders when shaders-gen sources change (#14398)

but now I pulled again and saw an update:

8846aace [email protected] 2025-06-26 19:34:02 +0200 : model : gemma3n text-only (#14400)

and the log now shows:

ljubomir@gigul2(422663.llama.cpp:0):~/llama.cpp$ gilg

M convert_hf_to_gguf.py
M gguf-py/gguf/constants.py
M gguf-py/gguf/gguf_writer.py
M gguf-py/gguf/tensor_mapping.py
M src/llama-arch.cpp
M src/llama-arch.h
M src/llama-graph.cpp
M src/llama-graph.h
M src/llama-hparams.h
M src/llama-kv-cache-unified.cpp
M src/llama-model.cpp
M src/llama-model.h
M src/llama-quant.cpp
1bbe3e0b ljubomir@gigul2 2025-06-26 18:01:38 +0100 : Merge branch 'master' of https://github.com/ggerganov/llama.cpp
a01047b0 [email protected] 2025-06-26 13:46:53 -0300 : cmake: regen vulkan shaders when shaders-gen sources change (#14398)
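
then rebuilt so the new gemma3n support actually gets compiled in (same commands as before):

cmake . -B ./build
cmake --build build --config Release -j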

OK, all good - it works now, thanks! On a 10-year-old, 10-core Xeon, CPU only, I got 4 tps - yay! :-)

   eval time =  120907.76 ms /   483 tokens (  250.33 ms per token,     3.99 tokens per second)
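
The arithmetic checks out: 483 tokens / 120.9 s ≈ 4.0 tokens/second.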

OK, time to try for real on another box. Thanks for your help - excellent stuff. :-)

ljupco changed discussion status to closed
