OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations, 2025

Linke Ouyang, Yuan Qu, Hongbin Zhou, Jiawei Zhu, Rui Zhang, Qunshu Lin, Bin Wang, Zhiyuan Zhao, Man Jiang, Xiaomeng Zhao, Jin Shi, Fan Wu, Pei Chu, Minghao Liu, Zhenxiang Li, Chao Xu, Bo Zhang, Botian Shi, Zhongying Tu, Conghui He · 2025 · arXiv 2412.07626

12 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background: 2 · dataset: 2

citation-polarity summary

background: 2 · use dataset: 2

representative citing papers

The Stanford EDGAR Filings Dataset: Reconstructing U.S. Corporate and Financial Disclosures into Layout-Faithful and Token-Efficient Pretraining Data

cs.AI · 2026-06-16 · unverdicted · novelty 7.0

SEFD reconstructs SEC filings into MultiMarkdown to create a 152B-token financial pretraining corpus with low overlap to existing data and introduces EDGAR-Forecast and EDGAR-OCR benchmarks.

RealDocBench: A Benchmark for Field-Level QA and Layout Understanding on Real-World Regulated Documents

cs.CV · 2026-06-05 · unverdicted · novelty 7.0

RealDocBench supplies 1,356 field-level QA questions over 581 real documents and 1,500 annotated pages, evaluating 18 systems on per-field accuracy, cost, and latency.

Structured Layout Priors for Robust Out-of-Distribution Visual Document Understanding

cs.CV · 2026-05-19 · conditional · novelty 7.0

Injecting pre-computed layout priors from RT-DETR into VLM prompts raises markdown F1 from 0.37 to 0.92 on a 10k-page OOD benchmark and cuts infinite-loop failures across domains.

MedStruct-S: A Benchmark for Key Discovery, Key-Conditioned QA and Semi-Structured Extraction from OCR Clinical Reports

cs.CL · 2026-05-04 · unverdicted · novelty 7.0

MedStruct-S benchmark shows encoder-only models outperform larger decoder-only ones on key-conditioned QA from noisy OCR clinical reports, with fine-tuned large models winning only when scale is ignored.

FinCriticalED: A Visual Benchmark for Financial Fact-Level OCR

cs.CV · 2025-11-19 · unverdicted · novelty 7.0

FinCriticalED benchmark reveals that OCR and MLLM systems frequently fail to preserve critical financial facts such as numbers and monetary units even when lexical accuracy is high.

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

cs.CV · 2024-12-31 · accept · novelty 7.0

OCRBench v2 is a new benchmark with four times more tasks than prior versions that reveals most large multimodal models score below 50 out of 100 on visual text tasks and share five specific weaknesses.

RT-DocLayout: Real-Time End-to-End Document Layout Analysis with Reading Order in the Wild

cs.CV · 2026-06-22 · unverdicted · novelty 5.0

Presents RT-DocLayout, a 33M-parameter end-to-end model extending RT-DETR that unifies layout classification, detection, segmentation, and reading-order prediction at 132.1 FPS with claimed SOTA results on public benchmarks.

ABot-OCR Technical Report

cs.CV · 2026-05-27 · unverdicted · novelty 5.0

ABot-OCR is a new end-to-end VLM for direct image-to-Markdown transcription using a custom data engine and structure-constrained RL optimization, reporting SOTA scores of 92.81/93.30 on OmniDocBench v1.5/v1.6.

Kimi K2.5: Visual Agentic Intelligence

cs.CL · 2026-02-02 · unverdicted · novelty 5.0

Kimi K2.5 combines joint text-vision training with an Agent Swarm parallel orchestration framework to reach claimed state-of-the-art results on coding, vision, reasoning, and agent tasks while cutting latency up to 4.5 times.

Qwen2.5-VL Technical Report

cs.CV · 2025-02-19 · unverdicted · novelty 5.0

Qwen2.5-VL reports a vision-language model family using native dynamic-resolution ViT and absolute time encoding that matches GPT-4o on document and diagram tasks while supporting hour-long videos with second-level localization.

From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

cs.AI · 2025-04-28 · accept · novelty 4.0

A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

cs.MM · 2024-10-28 · unverdicted · novelty 3.0

Survey proposing a taxonomy for document parsing into pipeline-based systems and VLM-driven unified models, reviewing components, metrics, benchmarks, and challenges.

citing papers explorer

Showing 9 of 9 citing papers after filters.

The Stanford EDGAR Filings Dataset: Reconstructing U.S. Corporate and Financial Disclosures into Layout-Faithful and Token-Efficient Pretraining Data · cs.AI · 2026-06-16 · unverdicted · none · ref 18
RealDocBench: A Benchmark for Field-Level QA and Layout Understanding on Real-World Regulated Documents · cs.CV · 2026-06-05 · unverdicted · none · ref 18
MedStruct-S: A Benchmark for Key Discovery, Key-Conditioned QA and Semi-Structured Extraction from OCR Clinical Reports · cs.CL · 2026-05-04 · unverdicted · none · ref 16
FinCriticalED: A Visual Benchmark for Financial Fact-Level OCR · cs.CV · 2025-11-19 · unverdicted · none · ref 25
RT-DocLayout: Real-Time End-to-End Document Layout Analysis with Reading Order in the Wild · cs.CV · 2026-06-22 · unverdicted · none · ref 25
ABot-OCR Technical Report · cs.CV · 2026-05-27 · unverdicted · none · ref 37
Kimi K2.5: Visual Agentic Intelligence · cs.CL · 2026-02-02 · unverdicted · none · ref 45
Qwen2.5-VL Technical Report · cs.CV · 2025-02-19 · unverdicted · none · ref 20
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction · cs.MM · 2024-10-28 · unverdicted · none · ref 174
