PureDocBench shows document parsing is far from solved, with top models at ~74/100, small specialists competing with large VLMs, and ranking reversals under real degradation.
arXiv preprint arXiv:2603.13398 , year=
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4roles
baseline 2polarities
baseline 2representative citing papers
GlotOCR Bench shows that OCR models perform well on fewer than 10 scripts and fail to generalize beyond about 30, with results tracking pretraining coverage and models hallucinating from known scripts on unfamiliar ones.
CC-OCR V2 reveals that state-of-the-art large multimodal models substantially underperform on challenging real-world document processing tasks.
citing papers explorer
-
How Far Is Document Parsing from Solved? PureDocBench: A Source-TraceableBenchmark across Clean, Degraded, and Real-World Settings
PureDocBench shows document parsing is far from solved, with top models at ~74/100, small specialists competing with large VLMs, and ranking reversals under real degradation.
-
GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts
GlotOCR Bench shows that OCR models perform well on fewer than 10 scripts and fail to generalize beyond about 30, with results tracking pretraining coverage and models hallucinating from known scripts on unfamiliar ones.
-
CC-OCR V2: Benchmarking Large Multimodal Models for Literacy in Real-world Document Processing
CC-OCR V2 reveals that state-of-the-art large multimodal models substantially underperform on challenging real-world document processing tasks.
- MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing