ONNX model output
The ONNX output differs from the output of Transformers and SentenceTransformers.
I inspected the ONNX model using netron.app; its outputs `text_embeds` and `13049` correspond to the `sequence_output` and `pooled_output` of the XLMRobertaModel. However, when I compared them with the outputs of the model loaded from model.safetensors, the results were different.
Hi @Riddler2024 , have you tried running inference the way we demonstrate in the README? If there's a slight difference, it could be because ONNX uses fp32, while ST or HF may use bf16 when running in a GPU environment. If this doesn't resolve the issue, please share a code snippet so I can reproduce the behavior.
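To rule out precision as the cause, a quick tolerance check might help (a minimal sketch, assuming `onnx_out` and `hf_out` are the two embedding arrays being compared, as NumPy floats):

```python
import numpy as np

# Cast both sides to fp32 so dtype itself isn't part of the diff.
onnx_out = onnx_out.astype(np.float32)
hf_out = hf_out.astype(np.float32)

print(np.max(np.abs(onnx_out - hf_out)))         # magnitude of the gap
print(np.allclose(onnx_out, hf_out, atol=1e-3))  # True if it's only precision noise
```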
I am using the example code from the README. I extracted the ONNX model output `13049` and compared it with the output of `XLMRobertaModel.forward` in the xlm-roberta-flash-implementation repository, and none of the outputs were normalized.
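For reference, this is roughly how I extract the raw output (a sketch; the model paths and the input names `input_ids`/`attention_mask` are assumptions based on typical XLM-RoBERTa ONNX exports):

```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Paths below are placeholders for the actual model files.
tokenizer = AutoTokenizer.from_pretrained("path/to/model")
session = ort.InferenceSession("path/to/model.onnx")

inputs = tokenizer("example sentence", return_tensors="np")
# Request the pooled output by the graph name shown in netron.app ("13049").
(pooled,) = session.run(
    ["13049"],
    {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]},
)
print(pooled.shape)
```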
@Riddler2024, ok, I see the issue. The ONNX model mimics the `forward` function, which doesn't apply any normalization by itself; however, both HF `encode` and SentenceTransformers include a normalization step. This is why the outputs differ. I'd suggest applying the normalization yourself after running inference. Would that be convenient for your application?
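For example, a minimal sketch of the post-hoc normalization (assuming `pooled` is the raw (batch, dim) array taken from the ONNX model's output):

```python
import numpy as np

def l2_normalize(embeddings: np.ndarray) -> np.ndarray:
    # Divide each embedding by its L2 norm, matching the normalization
    # step applied by HF encode / SentenceTransformers.
    return embeddings / np.linalg.norm(embeddings, axis=-1, keepdims=True)

normalized = l2_normalize(pooled)
```

After this step, the normalized vectors should match the SentenceTransformers output up to precision differences.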