GroupQueryAttention operation is not supported
#1, opened by DarrenChen
I tried loading the MobileLLM-R1-140M-ONNX model using ONNX Runtime version 1.19.2 on an Android device, but it reported an error stating that the GroupQueryAttention operation is not supported. How can I resolve this issue?
Update: I built ONNX Runtime from source and can now load MobileLLM-R1-140M-ONNX, but text generation still has a problem. My self-implemented tokenizer consistently fails to output Chinese text correctly.
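One possible cause of this symptom, if the model uses a byte-level BPE vocabulary (an assumption, not confirmed in this thread): a single Chinese character takes three bytes in UTF-8, and those bytes can be split across two tokens. Decoding each token to text on its own then yields replacement characters instead of the intended character. The sketch below illustrates the difference in Python. The names `decode_naive` and `decode_streaming` are hypothetical, and the token pieces are assumed to already be raw bytes (any vocabulary-specific byte-to-character mapping has been undone).

```python
import codecs


def decode_naive(pieces):
    """Decode each token's bytes independently (breaks on split characters)."""
    return "".join(p.decode("utf-8", errors="replace") for p in pieces)


def decode_streaming(pieces):
    """Decode token bytes incrementally, buffering incomplete UTF-8 sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    out = [decoder.decode(p) for p in pieces]
    # Flush whatever is still buffered once generation has finished.
    out.append(decoder.decode(b"", final=True))
    return "".join(out)


# Hypothetical token pieces: the six UTF-8 bytes of "你好" split so that
# each character straddles a token boundary.
raw = "你好".encode("utf-8")
pieces = [raw[:2], raw[2:4], raw[4:]]

print(repr(decode_naive(pieces)))      # contains U+FFFD replacement characters
print(repr(decode_streaming(pieces)))  # '你好'
```

On Android the same idea applies: collect the bytes of the generated tokens and convert them to a string only once a complete UTF-8 sequence is available, rather than converting token by token.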