Add OlmOCRBench evaluation results

#43
by staghado - opened

This PR ensures your model shows up at https://huggingface.co/datasets/allenai/olmOCR-bench
The evaluation was done through the official SDK.

Z.ai org

@staghado Thanks for running the evaluation and submitting this PR! We appreciate you taking the time to benchmark the model.

However, the reported metrics look a bit unusual to us. We’re planning to rerun the evaluation on our side using the official SDK to double-check the results. We’ll follow up once we’ve reproduced and verified the numbers.

Thanks again for the effort!

Z.ai org

Also, could you confirm the inference setup you used? For example, did you run inference via the MaaS API, or through the SDK provided in our GitHub repo (https://github.com/zai-org/GLM-OCR)? Knowing the exact setup would help us reproduce the evaluation more accurately.

Z.ai org

It seems the evaluation was run using the ZAI API for inference. We’ll try reproducing the results with the same setup on our side. Thanks!

Thanks for looking into this! Here's what I did:

I used the ZAI Python SDK (zai-sdk==0.2.2) with the layout_parsing.create endpoint. The olmOCR-bench PDFs were pre-rendered to PNG at 200 DPI with a max side length of 1540px (aspect ratio preserved; native resolution kept if smaller). Each image was processed 3 times and test pass rates were averaged across repeats. I then ran the official olmocr.bench.benchmark evaluation script with the standard test JSONL files.
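The pre-rendering step described above (cap the longest side at 1540px, keep aspect ratio, leave smaller images untouched) can be sketched roughly as follows. This is an illustrative reconstruction, not the exact script used; the function name and the use of Pillow are my assumptions.

```python
from PIL import Image

MAX_SIDE = 1540  # cap on the longest side, per the setup described above

def resize_for_ocr(img: Image.Image, max_side: int = MAX_SIDE) -> Image.Image:
    """Downscale so the longest side is at most max_side, preserving aspect
    ratio. Images already within the limit keep their native resolution."""
    w, h = img.size
    longest = max(w, h)
    if longest <= max_side:
        return img  # native resolution kept if smaller
    scale = max_side / longest
    return img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
```

The actual gist linked below presumably also handles the 200 DPI PDF-to-PNG rasterization and the 3-repeat averaging; the snippet only covers the resize rule.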

For context, I had previously run GLM-OCR standalone using vLLM with just the "Text Recognition:" prompt (no layout detection), which scored 67.5% overall (excl. headers & footers). The per-category scores largely match between the two setups, except for tables (42.5% → 77.6%), which makes sense since the API includes layout detection that routes table regions to the "Table Recognition:" prompt. The other categories show only minor differences, which suggests the evaluation is consistent.

| Category | vLLM (w/o layout) | ZAI API (with layout) |
|---|---|---|
| arxiv_math | 80.4% | 80.7% |
| multi_column | 79.9% | 76.7% |
| old_scans_math | 74.9% | 68.3% |
| old_scans | 39.9% | 37.6% |
| long_tiny_text | 87.6% | 86.9% |
| table_tests | 42.5% | 77.6% |
| **Overall (excl. h&f)** | **67.5%** | **75.2%** |

The full extraction script is available as a gist.

Hope this helps reproduce!

