GroupQueryAttention operation is not supported

#1
by DarrenChen - opened

I tried loading the MobileLLM-R1-140M-ONNX model with ONNX Runtime 1.19.2 on an Android device, but loading failed with an error saying that the GroupQueryAttention operation is not supported. How can I resolve this?

I compiled ONNX Runtime myself and can now load MobileLLM-R1-140M-ONNX, but I'm running into problems with text generation: my self-implemented tokenizer consistently fails to output Chinese text correctly.
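One possible cause, offered as a guess rather than a diagnosis: if the tokenizer is byte-level, a single Chinese character (3 bytes in UTF-8) can be split across two or more tokens, and decoding each token to text on its own then produces replacement characters. The sketch below is in Python for brevity and assumes your decoder has already mapped each generated token back to its raw bytes. The class name `StreamingByteDecoder` and the helper `decode_stream` are hypothetical names for illustration, not part of ONNX Runtime or the model's tooling.

```python
import codecs


class StreamingByteDecoder:
    """Buffers raw token bytes and emits only complete UTF-8 characters.

    Hypothetical helper: assumes each generated token has already been
    converted to the raw bytes it stands for.
    """

    def __init__(self):
        # The incremental decoder keeps an incomplete multi-byte sequence
        # in its internal buffer until the remaining bytes arrive.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def feed(self, token_bytes: bytes) -> str:
        """Add one token's bytes; return whatever text is now complete."""
        return self._decoder.decode(token_bytes, final=False)

    def flush(self) -> str:
        """Call once at end of generation to emit anything left over."""
        return self._decoder.decode(b"", final=True)


def decode_stream(token_byte_chunks):
    """Decode a sequence of per-token byte chunks into one string."""
    decoder = StreamingByteDecoder()
    pieces = [decoder.feed(chunk) for chunk in token_byte_chunks]
    pieces.append(decoder.flush())
    return "".join(pieces)


if __name__ == "__main__":
    # "中文" is e4 b8 ad e6 96 87 in UTF-8; the split below cuts both
    # characters in the middle, as a byte-level tokenizer might.
    chunks = [b"\xe4\xb8", b"\xad\xe6", b"\x96\x87"]

    # Decoding each chunk independently loses the characters.
    naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)
    print("per-token decode:", naive)

    # Buffering across chunks recovers them.
    print("buffered decode: ", decode_stream(chunks))
```

If the output is still wrong after buffering like this, the problem is more likely in the token-to-bytes mapping or in the vocabulary file being used, which this sketch does not cover.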
