MMLongBench-Doc: Benchmarking long-context document understanding with visualizations. Advances in Neural Information Processing Systems, 37:95963–96010
2 Pith papers cite this work.
Citing papers by year: 2026 (2)
Citing papers
- CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence
CiteVQA requires models to cite specific document regions with bounding boxes alongside answers and finds that even the strongest MLLMs frequently cite the wrong region, with top SAA scores of only 76.0 for closed models and 22.5 for open-source ones.
- MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing
MPDocBench-Parse provides 433 annotated multi-page documents and an evaluation protocol covering text/table/formula extraction, merging, figure extraction, reading order, and heading hierarchy for realistic document parsing.