
In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5

Yumeng Li, Guang Yang, Hao Liu, Bowen Wang, Colin Zhang · 2025 · arXiv 2512.02498

Mixed citation behavior. Most common role is background (33%).

20 Pith papers citing it

Background 33% of classified citations

citation-role summary

background 2 · baseline 2 · method 2

citation-polarity summary

background 2 baseline 2 use method 2

representative citing papers

How Far Is Document Parsing from Solved? PureDocBench: A Source-Traceable Benchmark across Clean, Degraded, and Real-World Settings

cs.CV · 2026-05-08 · conditional · novelty 8.0

PureDocBench shows document parsing is far from solved, with top models at ~74/100, small specialists competing with large VLMs, and ranking reversals under real degradation.

RealDocBench: A Benchmark for Field-Level QA and Layout Understanding on Real-World Regulated Documents

cs.CV · 2026-06-05 · unverdicted · novelty 7.0

RealDocBench supplies 1,356 field-level QA questions over 581 real documents and 1,500 annotated pages, evaluating 18 systems on per-field accuracy, cost, and latency.

Self-Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale

cs.LG · 2026-05-07 · conditional · novelty 7.0 · 3 refs

Starling, a multi-agent LLM system, extracts ~6.3 million nuanced structured records from PubMed across six tasks with reported error rates of 0.6-7.7%, lower than several curated databases.

AEGIS: A Holistic Benchmark for Evaluating Forensic Analysis of AI-Generated Academic Images

cs.CV · 2026-04-30 · unverdicted · novelty 7.0 · 2 refs

AEGIS is a benchmark with 7 academic categories, 39 subtypes, 4 forgery strategies, and multi-dimensional tests showing that leading models like GPT-5.1 achieve only 48.80% overall forensic accuracy on AI-generated academic images.

GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts

cs.CL · 2026-04-14 · unverdicted · novelty 7.0

GlotOCR Bench shows that OCR models perform well on fewer than 10 scripts and fail to generalize beyond about 30, with results tracking pretraining coverage and models hallucinating from known scripts on unfamiliar ones.

ParseBench: A Document Parsing Benchmark for AI Agents

cs.CV · 2026-04-09 · accept · novelty 7.0

ParseBench is a new benchmark for document parsing in AI agents that reveals fragmented performance across five semantic dimensions with LlamaParse Agentic scoring highest at 84.9%.

The Character Error Vector: Decomposable errors for page-level OCR evaluation

cs.CV · 2026-04-07 · conditional · novelty 7.0

The Character Error Vector is a decomposable bag-of-characters evaluator for page-level OCR that remains defined under parsing errors and bridges parsing metrics with local CER.

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

A fixed 1.2B model trained via diversity-aware sampling, cross-model verification, annotation refinement, and progressive stages achieves new state-of-the-art document parsing accuracy of 95.69 on OmniDocBench v1.6.

StrucTab: A Structured Optimization Framework for Table Parsing

cs.CV · 2026-06-29 · unverdicted · novelty 6.0

StrucTab achieves SOTA table parsing performance by unifying structural subtasks through sequential reasoning and using decomposed RL rewards in Uni-TabRL, plus a new TableVerse-5K benchmark.

CAPRA: Scaling Feedback on Software Architecture Deliverables with a Multi-Agent LLM System

cs.SE · 2026-06-17 · unverdicted · novelty 6.0

CAPRA is a multi-agent LLM system with evidence anchoring and consistency checking that analyzes software architecture deliverables and meets 88.8% of an eight-criterion evaluation on 10 student reports.

POTATR: A Lightweight Image-to-Graph Model for Page-Level Table Extraction

cs.CV · 2026-06-08 · unverdicted · novelty 6.0

POTATR extends TATR into a 29M-parameter image-to-graph model for contextual page-level table extraction, reporting 0.964 GriTS_Con on PubTables-v2 Single Pages while running 130x faster and 300x cheaper than tested alternatives including MLLMs.

MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing

cs.AI · 2026-05-21 · unverdicted · novelty 6.0 · 2 refs

MPDocBench-Parse provides 433 annotated multi-page documents and an evaluation protocol covering text/table/formula extraction, merging, figure extraction, reading order, and heading hierarchy for realistic document parsing.

Parser-Oriented Structural Refinement for a Stable Layout Interface in Document Parsing

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

A parser-oriented refinement stage performs set-level reasoning on detector hypotheses to jointly decide instance retention, refine boxes, and set parser input order, cutting reading order errors to 0.024 on OmniDocBench.

Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training

cs.CV · 2026-03-25 · unverdicted · novelty 6.0

A realistic scene synthesis strategy and document-aware training recipe enable a 1B-parameter MLLM to achieve superior accuracy and robustness in end-to-end parsing of real-world captured documents.

RT-DocLayout: Real-Time End-to-End Document Layout Analysis with Reading Order in the Wild

cs.CV · 2026-06-22 · unverdicted · novelty 5.0

Presents RT-DocLayout, a 33M-parameter end-to-end model extending RT-DETR that unifies layout classification, detection, segmentation, and reading-order prediction at 132.1 FPS with claimed SOTA results on public benchmarks.

ABot-OCR Technical Report

cs.CV · 2026-05-27 · unverdicted · novelty 5.0

ABot-OCR is a new end-to-end VLM for direct image-to-Markdown transcription using a custom data engine and structure-constrained RL optimization, reporting SOTA scores of 92.81/93.30 on OmniDocBench v1.5/v1.6.

FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing

cs.CV · 2026-05-17 · unverdicted · novelty 5.0

FastOCR dynamically selects a small subset of visual tokens per decoding step using focal-guided pruning and cross-step reuse, retaining 98% accuracy on Qwen2.5-VL while attending to only 5% of tokens and cutting attention latency by 3x.

RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference

cs.CV · 2026-05-01 · unverdicted · novelty 5.0

RTPrune introduces a reading-twice inspired two-stage pruning technique for DeepSeek-OCR that retains 84.25% of tokens while delivering 99.47% accuracy and 1.23x faster prefill on OmniDocBench.

From Handwriting to Structured Data: Benchmarking AI Digitisation of Handwritten Forms

cs.CV · 2026-04-14 · unverdicted · novelty 4.0

Frontier multimodal LLMs achieve ~85% accuracy and ~90% weighted F1 on digitizing complex handwritten medical forms, with Gemini 3.1 strongest overall and prompt optimization lifting macro metrics over 60%.

PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

cs.CV · 2026-06-02 · unverdicted · novelty 3.0

PaddleOCR-VL-1.6 improves on PaddleOCR-VL-1.5 via region-aware data optimization and progressive post-training to reach 96.33% on OmniDocBench v1.6.
