PureDocBench shows document parsing is far from solved, with top models at ~74/100, small specialists competing with large VLMs, and ranking reversals under real degradation.
In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5
Mixed citation behavior. Most common role is background (33%).
Years: 2026 (20)
Representative citing papers
RealDocBench supplies 1,356 field-level QA questions over 581 real documents and 1,500 annotated pages, evaluating 18 systems on per-field accuracy, cost, and latency.
Starling, a multi-agent LLM system, extracts ~6.3 million nuanced structured records from PubMed across six tasks with reported error rates of 0.6-7.7%, lower than several curated databases.
AEGIS is a benchmark with 7 academic categories, 39 subtypes, 4 forgery strategies, and multi-dimensional tests, showing that leading models such as GPT-5.1 achieve only 48.80% overall forensic accuracy on AI-generated academic images.
GlotOCR Bench shows that OCR models perform well on fewer than 10 scripts and fail to generalize beyond about 30, with results tracking pretraining coverage and models hallucinating from known scripts on unfamiliar ones.
ParseBench is a new benchmark for document parsing in AI agents that reveals fragmented performance across five semantic dimensions, with LlamaParse Agentic scoring highest at 84.9%.
The Character Error Vector is a decomposable bag-of-characters evaluator for page-level OCR that remains defined under parsing errors and bridges parsing metrics with local CER.
A fixed 1.2B model trained via diversity-aware sampling, cross-model verification, annotation refinement, and progressive stages achieves new state-of-the-art document parsing accuracy of 95.69 on OmniDocBench v1.6.
StrucTab achieves SOTA table parsing performance by unifying structural subtasks through sequential reasoning and using decomposed RL rewards in Uni-TabRL; the work also contributes a new TableVerse-5K benchmark.
CAPRA is a multi-agent LLM system with evidence anchoring and consistency checking that analyzes software architecture deliverables and meets 88.8% of an eight-criterion evaluation on 10 student reports.
POTATR extends TATR into a 29M-parameter image-to-graph model for contextual page-level table extraction, reporting 0.964 GriTS_Con on PubTables-v2 Single Pages while running 130x faster and 300x cheaper than tested alternatives including MLLMs.
MPDocBench-Parse provides 433 annotated multi-page documents and an evaluation protocol covering text/table/formula extraction, merging, figure extraction, reading order, and heading hierarchy for realistic document parsing.
A parser-oriented refinement stage performs set-level reasoning on detector hypotheses to jointly decide instance retention, refine boxes, and set parser input order, cutting reading order errors to 0.024 on OmniDocBench.
A realistic scene synthesis strategy and document-aware training recipe enable a 1B-parameter MLLM to achieve superior accuracy and robustness in end-to-end parsing of real-world captured documents.
RT-DocLayout is a 33M-parameter end-to-end model extending RT-DETR that unifies layout classification, detection, segmentation, and reading-order prediction at 132.1 FPS, with claimed SOTA results on public benchmarks.
ABot-OCR is a new end-to-end VLM for direct image-to-Markdown transcription using a custom data engine and structure-constrained RL optimization, reporting SOTA scores of 92.81/93.30 on OmniDocBench v1.5/v1.6.
FastOCR dynamically selects a small subset of visual tokens per decoding step using focal-guided pruning and cross-step reuse, retaining 98% accuracy on Qwen2.5-VL while attending to only 5% of tokens and cutting attention latency by 3x.
RTPrune introduces a reading-twice-inspired two-stage pruning technique for DeepSeek-OCR that retains 84.25% of tokens while delivering 99.47% accuracy and 1.23x faster prefill on OmniDocBench.
Frontier multimodal LLMs achieve ~85% accuracy and ~90% weighted F1 on digitizing complex handwritten medical forms, with Gemini 3.1 strongest overall and prompt optimization lifting macro metrics over 60%.
PaddleOCR-VL-1.6 improves on PaddleOCR-VL-1.5 via region-aware data optimization and progressive post-training to reach 96.33% on OmniDocBench v1.6.