llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma3n'
Thanks! Great - trying this now. Anyone else getting
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma3n'
? AFAICS I'm building completely from scratch, and pure CPU only - no CUDA, nothing else.
ljubomir@gigul2(422663.llama.cpp:0):~/llama.cpp$ git pull
Already up-to-date.
ljubomir@gigul2(422663.llama.cpp:0):~/llama.cpp$ gilg
- git log -C --name-status '--pretty=%h %ae %ai : %s'
1bbe3e0b ljubomir@gigul2 2025-06-26 18:01:38 +0100 : Merge branch 'master' of https://github.com/ggerganov/llama.cpp
a01047b0 [email protected] 2025-06-26 13:46:53 -0300 : cmake: regen vulkan shaders when shaders-gen sources change (#14398)
M ggml/src/ggml-vulkan/CMakeLists.txt
so then
cmake . -B ./build
cmake --build build --config Release -j
all built fine
ljubomir@gigul2(422663.llama.cpp:0):~/llama.cpp$ l build/bin/llama-server
-rwx------ 1 ljubomir ljubomir 4.8M Jun 26 18:04 build/bin/llama-server
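(If you want to double-check that the binary really matches your checkout, llama.cpp embeds the build commit - assuming the usual common-args handling, something like
build/bin/llama-server --version
should print the build number and commit hash.)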
then downloaded the model from
https://huggingface.co/unsloth/gemma-3n-E4B-it-GGUF/tree/main
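(fetched e.g. with the Hugging Face CLI, assuming huggingface-cli is installed:
huggingface-cli download unsloth/gemma-3n-E4B-it-GGUF gemma-3n-E4B-it-UD-Q8_K_XL.gguf --local-dir models
though a plain browser download works just as well)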
then
build/bin/llama-server --model models/gemma-3n-E4B-it-UD-Q8_K_XL.gguf --temp 1.0 --top_k 64 --top_p 0.95 --min_p 0 --ctx-size 32768 &
ends in
....................................................................
llama_model_loader: - type bf16: 108 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 9.36 GiB (11.71 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma3n'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'models/gemma-3n-E4B-it-UD-Q8_K_XL.gguf'
srv load_model: failed to load model, 'models/gemma-3n-E4B-it-UD-Q8_K_XL.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
What am I doing wrong? Anyone else seeing this? Thanks.
Did you update llama.cpp?
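(As a sanity check on the file itself, the dump script that ships with llama.cpp can show what architecture the GGUF declares - the path may differ between checkouts:
python3 gguf-py/gguf/scripts/gguf_dump.py --no-tensors models/gemma-3n-E4B-it-UD-Q8_K_XL.gguf | grep general.architecture
If that prints 'gemma3n', the file is fine and it's the llama.cpp build that is too old to know the architecture.)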
Thanks. I did, up to
a01047b0 [email protected] 2025-06-26 13:46:53 -0300 : cmake: regen vulkan shaders when shaders-gen sources change (#14398)
but now I pulled again and saw an update:
8846aace [email protected] 2025-06-26 19:34:02 +0200 : model : gemma3n text-only (#14400)
The log now shows -
ljubomir@gigul2(422663.llama.cpp:0):~/llama.cpp$ gilg
- git log -C --name-status '--pretty=%h %ae %ai : %s'
68049232 ljubomir@gigul2 2025-06-26 18:37:18 +0100 : Merge branch 'master' of https://github.com/ggerganov/llama.cpp
8846aace [email protected] 2025-06-26 19:34:02 +0200 : model : gemma3n text-only (#14400)
M convert_hf_to_gguf.py
M gguf-py/gguf/constants.py
M gguf-py/gguf/gguf_writer.py
M gguf-py/gguf/tensor_mapping.py
M src/llama-arch.cpp
M src/llama-arch.h
M src/llama-graph.cpp
M src/llama-graph.h
M src/llama-hparams.h
M src/llama-kv-cache-unified.cpp
M src/llama-model.cpp
M src/llama-model.h
M src/llama-quant.cpp
1bbe3e0b ljubomir@gigul2 2025-06-26 18:01:38 +0100 : Merge branch 'master' of https://github.com/ggerganov/llama.cpp
a01047b0 [email protected] 2025-06-26 13:46:53 -0300 : cmake: regen vulkan shaders when shaders-gen sources change (#14398)
Ok, all good, it works now - thanks! On a 10-year-old, 10-core Xeon, CPU only, I got 4 tps - yay! :-)
eval time = 120907.76 ms / 483 tokens ( 250.33 ms per token, 3.99 tokens per second)
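(Sanity-checking the numbers: 483 tokens / 120.9 s ≈ 4.0 tokens per second, and 1000 ms / 250.33 ms per token ≈ 3.99 - consistent.)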
Ok, time to try it for real on another box. Thanks for your help - excellent stuff. :-)