hub Mixed citations

Monkeyocr: Document parsing with a structure-recognition-relation triplet paradigm

MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm , author= · 2025 · arXiv 2506.05218

Mixed citation behavior. Most common role is background (62%).

16 Pith papers citing it

Background 62% of classified citations

read on arXiv browse 16 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 5 baseline 3

citation-polarity summary

background 5 baseline 3

representative citing papers

How Far Is Document Parsing from Solved? PureDocBench: A Source-TraceableBenchmark across Clean, Degraded, and Real-World Settings

cs.CV · 2026-05-08 · conditional · novelty 8.0

PureDocBench shows document parsing is far from solved, with top models at ~74/100, small specialists competing with large VLMs, and ranking reversals under real degradation.

MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing

cs.AI · 2026-05-21 · unverdicted · novelty 7.0

MPDocBench-Parse provides a 3,246-page benchmark and evaluation protocol for multi-page document parsing that tests text/table/formula extraction, merging, figure handling, reading order, and heading hierarchy.

GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts

cs.CL · 2026-04-14 · unverdicted · novelty 7.0

GlotOCR Bench shows that OCR models perform well on fewer than 10 scripts and fail to generalize beyond about 30, with results tracking pretraining coverage and models hallucinating from known scripts on unfamiliar ones.

The Character Error Vector: Decomposable errors for page-level OCR evaluation

cs.CV · 2026-04-07 · conditional · novelty 7.0

The Character Error Vector is a decomposable bag-of-characters evaluator for page-level OCR that remains defined under parsing errors and bridges parsing metrics with local CER.

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

A fixed 1.2B model trained via diversity-aware sampling, cross-model verification, annotation refinement, and progressive stages achieves new state-of-the-art document parsing accuracy of 95.69 on OmniDocBench v1.6.

DocAtlas: Multilingual Document Understanding Across 80+ Languages

cs.CL · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

DocAtlas introduces model-free rendering pipelines to create DocTag-annotated datasets across 82 languages and shows DPO adaptation improves multilingual performance without base-language degradation.

InstructTable: Improving Table Structure Recognition Through Instructions

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

InstructTable combines instruction-guided pre-training on structural patterns with visual fine-tuning and a template-free synthetic data generator (TME) to reach state-of-the-art table structure recognition on public benchmarks and a new complex-table test set.

Parser-Oriented Structural Refinement for a Stable Layout Interface in Document Parsing

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

A parser-oriented refinement stage performs set-level reasoning on detector hypotheses to jointly decide instance retention, refine boxes, and set parser input order, cutting reading order errors to 0.024 on OmniDocBench.

Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing

cs.CV · 2026-03-25 · conditional · novelty 6.0

PaddleOCR-VL uses a Valid Region Focus Module to select key visual tokens and a 0.9B model for guided recognition, delivering SOTA document parsing with far fewer tokens and parameters.

Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training

cs.CV · 2026-03-25 · unverdicted · novelty 6.0

A realistic scene synthesis strategy and document-aware training recipe enable a 1B-parameter MLLM to achieve superior accuracy and robustness in end-to-end parsing of real-world captured documents.

Logics-Parsing-Omni Technical Report

cs.AI · 2026-03-10 · unverdicted · novelty 6.0

Omni Parsing framework converts complex multimodal signals into locatable, enumerable, and traceable structured knowledge via hierarchical detection, recognition, and interpreting with strict evidence alignment.

Thinking with Drafting: Optical Decompression via Logical Reconstruction

cs.CL · 2026-02-12 · unverdicted · novelty 6.0

Thinking with Drafting reconceptualizes visual reasoning as optical decompression by forcing models to draft mental models into executable DSL code for deterministic self-verification on the VisAlg benchmark.

DeepSeek-OCR: Contexts Optical Compression

cs.CV · 2025-10-21 · unverdicted · novelty 6.0

DeepSeek-OCR compresses text contexts up to 20x via 2D optical mapping while achieving 97% OCR accuracy below 10x and 60% at 20x, outperforming prior OCR tools with fewer vision tokens.

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

cs.CV · 2025-09-26 · unverdicted · novelty 6.0

MinerU2.5 uses a two-stage decoupled vision-language architecture to achieve state-of-the-art document parsing accuracy with lower computational overhead than existing general and domain-specific models.

DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding

cs.AI · 2026-04-14 · unverdicted · novelty 5.0 · 2 refs

DocSeeker improves long-document understanding in MLLMs via a two-stage training process that combines supervised fine-tuning from distilled data with evidence-aware group relative policy optimization and memory-efficient resolution allocation.

PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing

cs.CV · 2026-01-29 · unverdicted · novelty 5.0

PaddleOCR-VL-1.5 is a 0.9B VLM achieving 94.5% SOTA accuracy on OmniDocBench v1.5, with added robustness to physical distortions and support for seal recognition plus text spotting.

citing papers explorer

Showing 16 of 16 citing papers.

How Far Is Document Parsing from Solved? PureDocBench: A Source-TraceableBenchmark across Clean, Degraded, and Real-World Settings cs.CV · 2026-05-08 · conditional · none · ref 27
PureDocBench shows document parsing is far from solved, with top models at ~74/100, small specialists competing with large VLMs, and ranking reversals under real degradation.
MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing cs.AI · 2026-05-21 · unverdicted · none · ref 33
MPDocBench-Parse provides a 3,246-page benchmark and evaluation protocol for multi-page document parsing that tests text/table/formula extraction, merging, figure handling, reading order, and heading hierarchy.
GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts cs.CL · 2026-04-14 · unverdicted · none · ref 33
GlotOCR Bench shows that OCR models perform well on fewer than 10 scripts and fail to generalize beyond about 30, with results tracking pretraining coverage and models hallucinating from known scripts on unfamiliar ones.
The Character Error Vector: Decomposable errors for page-level OCR evaluation cs.CV · 2026-04-07 · conditional · none · ref 25
The Character Error Vector is a decomposable bag-of-characters evaluator for page-level OCR that remains defined under parsing errors and bridges parsing metrics with local CER.
MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale cs.CV · 2026-04-06 · unverdicted · none · ref 19
A fixed 1.2B model trained via diversity-aware sampling, cross-model verification, annotation refinement, and progressive stages achieves new state-of-the-art document parsing accuracy of 95.69 on OmniDocBench v1.6.
DocAtlas: Multilingual Document Understanding Across 80+ Languages cs.CL · 2026-05-12 · unverdicted · none · ref 3 · 2 links
DocAtlas introduces model-free rendering pipelines to create DocTag-annotated datasets across 82 languages and shows DPO adaptation improves multilingual performance without base-language degradation.
InstructTable: Improving Table Structure Recognition Through Instructions cs.CV · 2026-04-03 · unverdicted · none · ref 17
InstructTable combines instruction-guided pre-training on structural patterns with visual fine-tuning and a template-free synthetic data generator (TME) to reach state-of-the-art table structure recognition on public benchmarks and a new complex-table test set.
Parser-Oriented Structural Refinement for a Stable Layout Interface in Document Parsing cs.CV · 2026-04-03 · unverdicted · none · ref 14
A parser-oriented refinement stage performs set-level reasoning on detector hypotheses to jointly decide instance retention, refine boxes, and set parser input order, cutting reading order errors to 0.024 on OmniDocBench.
Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing cs.CV · 2026-03-25 · conditional · none · ref 23
PaddleOCR-VL uses a Valid Region Focus Module to select key visual tokens and a 0.9B model for guided recognition, delivering SOTA document parsing with far fewer tokens and parameters.
Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training cs.CV · 2026-03-25 · unverdicted · none · ref 20
A realistic scene synthesis strategy and document-aware training recipe enable a 1B-parameter MLLM to achieve superior accuracy and robustness in end-to-end parsing of real-world captured documents.
Logics-Parsing-Omni Technical Report cs.AI · 2026-03-10 · unverdicted · none · ref 12
Omni Parsing framework converts complex multimodal signals into locatable, enumerable, and traceable structured knowledge via hierarchical detection, recognition, and interpreting with strict evidence alignment.
Thinking with Drafting: Optical Decompression via Logical Reconstruction cs.CL · 2026-02-12 · unverdicted · none · ref 23
Thinking with Drafting reconceptualizes visual reasoning as optical decompression by forcing models to draft mental models into executable DSL code for deterministic self-verification on the VisAlg benchmark.
DeepSeek-OCR: Contexts Optical Compression cs.CV · 2025-10-21 · unverdicted · none · ref 18
DeepSeek-OCR compresses text contexts up to 20x via 2D optical mapping while achieving 97% OCR accuracy below 10x and 60% at 20x, outperforming prior OCR tools with fewer vision tokens.
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing cs.CV · 2025-09-26 · unverdicted · none · ref 17
MinerU2.5 uses a two-stage decoupled vision-language architecture to achieve state-of-the-art document parsing accuracy with lower computational overhead than existing general and domain-specific models.
DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding cs.AI · 2026-04-14 · unverdicted · none · ref 6 · 2 links
DocSeeker improves long-document understanding in MLLMs via a two-stage training process that combines supervised fine-tuning from distilled data with evidence-aware group relative policy optimization and memory-efficient resolution allocation.
PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing cs.CV · 2026-01-29 · unverdicted · none · ref 1
PaddleOCR-VL-1.5 is a 0.9B VLM achieving 94.5% SOTA accuracy on OmniDocBench v1.5, with added robustness to physical distortions and support for seal recognition plus text spotting.

Monkeyocr: Document parsing with a structure-recognition-relation triplet paradigm

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer