How Do Document Parsers Break? Auditing Structural Vulnerability in Document Intelligence

Keze Wang; Yihao Wang; Yue Chen; Ziyi Tang

arxiv: 2605.19309 · v1 · pith:65WMGWVHnew · submitted 2026-05-19 · 💻 cs.CL

Pith reviewed 2026-05-20 05:58 UTC · model grok-4.3

classification 💻 cs.CL

keywords: document layout analysis · robustness evaluation · structural vulnerability · OCR instability · auditing framework · footprint bias · block-level structural loss rate

The pith

Document parsers are more vulnerable to small structurally targeted probes than large area changes, as area size poorly predicts OCR instability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Document layout analysis systems underpin many retrieval and question-answering applications, yet their robustness is usually judged by how much page area a perturbation covers. The paper argues this area-centric view creates a Footprint Bias that misses how changes interact with actual layout blocks. It introduces an output-level auditing approach that measures Block-level Structural Loss Rate instead, showing this tracks perturbation-driven OCR failures far more closely than area does across two parsers and a thousand pages. Exposure descriptors in the framework also distinguish occlusion-driven failures from topology-driven ones. Small probes aimed at structure produce downstream QA and retrieval drops comparable to much larger perturbations.

Core claim

We identify this Footprint Bias and propose a lightweight output-level auditing framework that decouples probe construction, policy-driven targeting, and structure-aware diagnosis. The framework combines Block-level Structural Loss Rate (B-SLR), granularity-aware exposure descriptors, and pathway attribution to analyze where perturbations interact with layout structure and how failures propagate. Across MinerU and PP-StructureV3 on 1,000 pages, affected area weakly tracks perturbation-induced OCR instability (R^2=0.384/0.110), whereas B-SLR aligns much more closely with it (R^2=0.727/0.916). Exposure descriptors further separate occlusion- and topology-dominant pathways, and small structurally targeted probes cause downstream QA/retrieval degradation comparable to larger-footprint perturbations.

What carries the argument

Block-level Structural Loss Rate (B-SLR), an output-level measure of structural disruption at the block level that better tracks how layout changes drive OCR instability than simple affected area.
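
A minimal sketch of how a B-SLR-style score could be computed, assuming the form in the passage this page quotes from the paper: the share of blocks x in E whose matched block m(x) is not equivalent to x, where equivalence requires IoU ≥ τ_iou and TextSim ≥ τ_text. Everything else here is an illustrative assumption rather than the paper's implementation: E is taken to be the blocks from the unperturbed parse, m(x) is the perturbed-run block with the highest IoU, TextSim is a `difflib` ratio, and the `Block` type and both threshold values are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Block:
    # Axis-aligned box (x0, y0, x1, y1) plus the text the parser emitted.
    box: tuple
    text: str


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def text_sim(a, b):
    """Stand-in TextSim in [0, 1]; the paper's similarity may differ."""
    return SequenceMatcher(None, a, b).ratio()


def b_slr(clean, perturbed, tau_iou=0.5, tau_text=0.8):
    """Share of clean-run blocks with no equivalent block after perturbation."""
    if not clean:
        return 0.0
    lost = 0
    for x in clean:
        # Assumed matching rule m(x): the perturbed block with the highest IoU.
        m_x = max(perturbed, key=lambda y: iou(x.box, y.box), default=None)
        equivalent = (
            m_x is not None
            and iou(x.box, m_x.box) >= tau_iou
            and text_sim(x.text, m_x.text) >= tau_text
        )
        lost += not equivalent
    return lost / len(clean)


# Toy page: two blocks in the clean parse, one of which disappears.
clean = [Block((0, 0, 10, 2), "Title"), Block((0, 3, 10, 8), "Body text here")]
perturbed = [Block((0, 0, 10, 2), "Title")]
score = b_slr(clean, perturbed)
print(f"B-SLR on the toy page: {score:.2f}")
```

Because the score only reads parser output, the same function could in principle be run against any parser that emits boxes and text, which is what makes an output-level audit lightweight.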

If this is right

B-SLR provides a tighter link to actual OCR instability than area-based metrics across tested parsers.
Granularity-aware exposure descriptors distinguish occlusion pathways from topology pathways.
Structurally targeted small probes degrade QA and retrieval performance at rates similar to large-footprint changes.
Robustness evaluation of document intelligence systems should move from footprint stress tests to structure-aware audits.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be applied to additional downstream tasks such as summarization or information extraction to test consistency of the structural signal.
System builders might integrate B-SLR-style checks into continuous evaluation pipelines for document parsers to catch layout-specific weaknesses early.
The separation of pathways opens the possibility of targeted defenses that address occlusion versus topology failures differently.

Load-bearing premise

The chosen perturbations and 1,000-page test set are representative of real-world structural vulnerabilities, and B-SLR plus pathway attribution capture the relevant failure modes without post-hoc selection or unstated modeling assumptions.

What would settle it

A new test set or perturbation family in which affected area correlates with OCR instability at least as strongly as B-SLR does would falsify the claim that structure-aware auditing is required.
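
One way such a check could be run is sketched below: fit instability on each predictor separately and compare the resulting R² values. For a single-predictor least-squares fit, R² equals the squared Pearson correlation, which is what `r_squared` computes. The numbers are made-up toy values, not data from the paper, and the function names are hypothetical.

```python
def r_squared(x, y):
    """R^2 of a one-predictor least-squares fit, i.e. squared Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series explains nothing
    return sxy * sxy / (sxx * syy)


def area_settles_it(area, b_slr_values, instability):
    """True if affected area tracks instability at least as well as B-SLR."""
    return r_squared(area, instability) >= r_squared(b_slr_values, instability)


# Toy, made-up values for five perturbation configurations (not paper data).
area = [0.05, 0.10, 0.30, 0.40, 0.60]
b_slr_values = [0.10, 0.50, 0.20, 0.70, 0.40]
cer = [0.12, 0.48, 0.22, 0.69, 0.41]

print(f"R^2 area:  {r_squared(area, cer):.3f}")
print(f"R^2 B-SLR: {r_squared(b_slr_values, cer):.3f}")
print("claim falsified on this data:", area_settles_it(area, b_slr_values, cer))
```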

Figures

Figures reproduced from arXiv: 2605.19309 by Keze Wang, Yihao Wang, Yue Chen, Ziyi Tang.

**Figure 1.** Footprint Bias in DLA robustness evaluation: (a) a large-area perturbation may cause limited error, while (b) a small structural probe can trigger greater parsing failure. Despite progress on clean benchmarks and realistic evaluation settings, DLA robustness is still commonly assessed through aggregate degradation under corruption. Existing protocols often parameterize perturbation severity by global cor…

**Figure 2.** Overview of the proposed tripartite vulnerability auditing framework, linking controlled perturbation…

**Figure 3.** Tests whether affected area is a reliable severity proxy for perturbation-induced OCR instability. It is not: TOR explains CER only weakly on MinerU (R²=0.384) and almost not at all on PP-StructureV3 (R²=0.110). Even within the matched-TOR region, configurations with comparable footprint exhibit a CER spread of roughly 2.7×, showing that footprint alone cannot deter… (Plot axis: TOR, ticks 0.0 to 0.7.)

**Figure 4.** Phase 1 pathway decomposition. Bars decompose B-SLR into SLRmiss and SLRtopo; higher bars indicate greater structural loss, and configuration identifiers are decoded in Appendix C.1.

**Figure 5.** Representative visual examples of the probe families used in the controlled perturbation space.

**Figure 6.** Phase 2 pathway composition of mean per-image structural loss for each policy (…

**Figure 7.** Abridged prompt templates for the prompt-based policy variants. All prompts emit the same output…

Original abstract

Document Layout Analysis (DLA) pipelines provide structured page representations for retrieval-augmented generation, long-document question answering, and other document intelligence systems, yet their robustness evaluation remains largely area-centric. We identify this Footprint Bias and propose a lightweight output-level auditing framework that decouples probe construction, policy-driven targeting, and structure-aware diagnosis. The framework combines Block-level Structural Loss Rate (B-SLR), granularity-aware exposure descriptors, and pathway attribution to analyze where perturbations interact with layout structure and how failures propagate. Across MinerU and PP-StructureV3 on 1,000 pages, affected area weakly tracks perturbation-induced OCR instability (R^2=0.384/0.110), whereas B-SLR aligns much more closely with it (R^2=0.727/0.916). Exposure descriptors further separate occlusion- and topology-dominant pathways, and small structurally targeted probes cause downstream QA/retrieval degradation comparable to larger-footprint perturbations. These results shift DLA robustness evaluation from footprint-based stress testing toward structure-aware vulnerability auditing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows that area-based metrics miss a lot of parser failures while a block-level structural measure tracks OCR instability much better, but the test corpus and perturbation details need more scrutiny to support broad claims.

The main thing here is that standard area footprints do a poor job predicting how document parsers actually fail under small changes, while the authors' block-level measure lines up much more closely with real OCR problems and downstream task drops. They name this Footprint Bias and build a simple auditing setup that separates probe design from diagnosis using B-SLR, exposure descriptors, and pathway attribution. On MinerU and PP-StructureV3 across 1,000 pages the R-squared values rise from 0.384/0.110 for area to 0.727/0.916 for B-SLR, and targeted small probes hurt QA and retrieval about as much as bigger ones. Exposure descriptors also split occlusion-heavy from topology-heavy failure routes. That comparison and the pathway split are the concrete new pieces; the framework itself is lightweight and directly usable for people running layout analysis in retrieval pipelines. The quantitative head-to-head is the part that lands cleanly.

The soft spots sit in the experimental base. The 1,000-page set and the exact perturbation generation rules are not described enough in the abstract, so it is hard to judge whether the pages cover enough layout variety or whether the probes were chosen in a way that favors block-level signals. If the corpus leans toward scientific papers with clean blocks, the reported gap could shrink on forms or tables. The stress-test note on sampling and post-hoc selection is on target here. The paper would be tighter with explicit stratification, a clear perturbation recipe, and some error bars or significance checks on the correlations. Still, the central claim that structure-aware auditing beats pure footprint testing holds up on the numbers given.

This work is for teams that build or evaluate document parsers for RAG and long-document QA. Readers who need practical ways to find structural weak spots will get usable ideas and a clear baseline comparison.

It deserves a serious referee because the idea is grounded in measurable differences and the framework is reproducible in principle. I would send it for review but ask the authors to expand the methods section on corpus construction and perturbation details before acceptance.

Referee Report

3 major / 2 minor

Summary. The paper identifies 'Footprint Bias' in current DLA robustness evaluation, which relies on affected area as the primary metric. It introduces a lightweight auditing framework using Block-level Structural Loss Rate (B-SLR), granularity-aware exposure descriptors, and pathway attribution to analyze how perturbations interact with layout structure. On MinerU and PP-StructureV3 across 1,000 pages, the work reports that affected area weakly correlates with perturbation-induced OCR instability (R²=0.384/0.110) while B-SLR correlates substantially better (R²=0.727/0.916); exposure descriptors separate occlusion- and topology-dominant failure pathways, and small targeted probes produce downstream QA/retrieval degradation comparable to larger perturbations.

Significance. If the central correlations and pathway attributions hold under broader testing, the work provides a concrete shift from area-centric to structure-aware robustness auditing for document intelligence systems. The quantitative R² comparisons offer measurable support for preferring B-SLR over footprint metrics, and the downstream task results highlight practical implications for RAG and QA pipelines. The framework's decoupling of probe construction from diagnosis is a useful methodological contribution.

major comments (3)

§4 (Experimental Results): The reported R² values (0.727/0.916 for B-SLR vs. 0.384/0.110 for affected area) are presented without error bars, bootstrap confidence intervals, or statistical tests for the difference in correlations. This is load-bearing for the central claim that B-SLR 'aligns much more closely,' as the practical superiority cannot be assessed without quantifying uncertainty or significance.
§3.2 (Dataset and Perturbation Generation): No information is provided on corpus provenance, sampling strategy, document-type stratification (e.g., scientific papers vs. forms vs. tables), or the precise procedure for generating occlusion and topology perturbations. This directly affects the generalizability of the headline result that B-SLR superiority and pathway separation reflect structural vulnerabilities rather than properties of the chosen 1,000-page distribution.
§4.3 (Downstream Evaluation): The claim that small structurally targeted probes cause 'comparable' QA/retrieval degradation to larger-footprint perturbations lacks quantitative details on effect sizes, variance across runs, or controls that isolate structural targeting from raw perturbation size. This comparison is central to arguing for structure-aware auditing over footprint-based testing.
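
The uncertainty estimate requested in the first major comment could be approached with a percentile bootstrap, sketched below under stated assumptions. Rows are resampled jointly so that both predictors are evaluated on the same resampled configurations, and the interval is taken on the gap R²(B-SLR) − R²(area). The data are synthetic and generated only so the example runs; the function names and the 1,000-resample default are illustrative choices, and this is a stand-in for, not a reproduction of, any test the authors may use.

```python
import random


def r_squared(x, y):
    """R^2 of a one-predictor least-squares fit, i.e. squared Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def bootstrap_r2_gap(area, b_slr_values, cer, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for R^2(B-SLR, CER) - R^2(area, CER)."""
    rng = random.Random(seed)
    n = len(cer)
    gaps = []
    for _ in range(n_resamples):
        # Resample row indices with replacement; reuse them for all three series.
        idx = [rng.randrange(n) for _ in range(n)]
        a = [area[i] for i in idx]
        b = [b_slr_values[i] for i in idx]
        c = [cer[i] for i in idx]
        gaps.append(r_squared(b, c) - r_squared(a, c))
    gaps.sort()
    lo = gaps[int((alpha / 2) * n_resamples)]
    hi = gaps[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


# Synthetic data: instability depends on B-SLR, area is unrelated noise.
data_rng = random.Random(1)
b_slr_values = [data_rng.random() for _ in range(200)]
area = [data_rng.random() for _ in range(200)]
cer = [0.8 * s + 0.1 * data_rng.gauss(0, 1) for s in b_slr_values]

lo, hi = bootstrap_r2_gap(area, b_slr_values, cer)
print(f"95% CI for the R^2 gap: [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes zero would support the claim that B-SLR tracks instability more closely than area on the sampled pages; it would not by itself address how representative those pages are.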

minor comments (2)

Abstract: The term 'granularity-aware exposure descriptors' is introduced without a one-sentence definition, reducing accessibility for readers outside the immediate subfield.
§3.1 (Framework Definition), notation: B-SLR is defined as a new metric but its exact formula (e.g., how block-level losses are aggregated and normalized) should be stated explicitly in the main text rather than deferred entirely to an appendix.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help strengthen the statistical rigor, transparency, and quantitative support in our work. We address each major comment below and will incorporate revisions to improve the manuscript.

Referee: The reported R² values (0.727/0.916 for B-SLR vs. 0.384/0.110 for affected area) are presented without error bars, bootstrap confidence intervals, or statistical tests for the difference in correlations. This is load-bearing for the central claim that B-SLR 'aligns much more closely,' as the practical superiority cannot be assessed without quantifying uncertainty or significance.

Authors: We agree that uncertainty quantification and significance testing are necessary to substantiate the superiority of B-SLR. In the revised manuscript, we will add bootstrap confidence intervals (via 1,000 resamples) for all reported R² values and apply Steiger's Z-test to assess whether the correlation differences are statistically significant. These results will appear in §4 with corresponding discussion of practical implications.
Revision: yes

Referee: No information is provided on corpus provenance, sampling strategy, document-type stratification (e.g., scientific papers vs. forms vs. tables), or the precise procedure for generating occlusion and topology perturbations. This directly affects the generalizability of the headline result that B-SLR superiority and pathway separation reflect structural vulnerabilities rather than properties of the chosen 1,000-page distribution.

Authors: We acknowledge that greater detail on the corpus and perturbation procedures is required for assessing generalizability. The revision will expand §3.2 to specify: corpus sources and provenance; stratified sampling by document category (scientific papers, forms, tables, etc.) and complexity metrics; and the exact generation procedures, including masking ratios for occlusion and structural edit rules for topology perturbations.
Revision: yes

Referee: The claim that small structurally targeted probes cause 'comparable' QA/retrieval degradation to larger-footprint perturbations lacks quantitative details on effect sizes, variance across runs, or controls that isolate structural targeting from raw perturbation size. This comparison is central to arguing for structure-aware auditing over footprint-based testing.

Authors: We agree that the downstream results need explicit quantitative backing and controls. In the revised §4.3 we will report effect sizes (e.g., absolute and relative drops in QA F1 and retrieval nDCG), standard deviations across repeated runs, and ablation controls that hold perturbation area constant while varying structural targeting. This will isolate the contribution of layout-aware probes.
Revision: yes
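
The effect-size reporting described in the last response might look like the following sketch: for each condition, the absolute and relative drop from the clean score plus the standard deviation across repeated runs. The scores are made-up placeholders, the two condition names are hypothetical, and both conditions are simply assumed to share the same perturbation footprint so that only targeting differs.

```python
from statistics import mean, stdev


def degradation_summary(clean_score, perturbed_runs):
    """Absolute and relative drop versus the clean score, with run-to-run spread."""
    avg = mean(perturbed_runs)
    return {
        "abs_drop": clean_score - avg,
        "rel_drop": (clean_score - avg) / clean_score if clean_score else 0.0,
        "std": stdev(perturbed_runs) if len(perturbed_runs) > 1 else 0.0,
    }


# Made-up QA F1 scores; not results from the paper.
clean_f1 = 0.80
runs = {
    "random placement": [0.74, 0.76, 0.75],
    "structure-targeted": [0.61, 0.59, 0.60],
}

summaries = {name: degradation_summary(clean_f1, scores) for name, scores in runs.items()}
for name, s in summaries.items():
    print(f"{name}: abs drop {s['abs_drop']:.3f}, rel drop {s['rel_drop']:.1%}, sd {s['std']:.3f}")
```

Holding the footprint fixed across conditions is what lets a difference in the drops be attributed to where the probe lands rather than to how much of the page it covers.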

Circularity Check

0 steps flagged

No significant circularity; empirical correlations are independent measurements

The paper defines B-SLR and exposure descriptors directly from parsed output structure, then reports empirical R² correlations between these metrics and perturbation-induced OCR instability on a fixed 1,000-page test set, contrasting them with the affected-area baseline. These R² values are computed post-experiment from observed data and do not reduce by construction to any fitted parameter or self-referential definition within the same dataset. No self-citations, uniqueness theorems, or ansatzes from prior author work appear in the abstract or summary to support the core claims. The derivation chain—metric definition, perturbation application, outcome measurement, and correlation reporting—is self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The paper introduces new evaluation concepts and metrics without listing explicit free parameters; it relies on domain assumptions about perturbation realism and metric validity.

axioms (1)

Domain assumption: Perturbations can be constructed to target layout structure while keeping overall footprint small.
Invoked to demonstrate that small probes produce comparable downstream degradation.

invented entities (2)

Footprint Bias · no independent evidence
Purpose: Label for the area-centric bias in existing DLA robustness evaluation.
Newly named concept used to motivate the framework.
Block-level Structural Loss Rate (B-SLR) · no independent evidence
Purpose: Output-level metric for structural vulnerability.
Core new measurement proposed and validated via R² comparison.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: Block-level Structural Loss Rate (B-SLR) ... |{x∈E:¬(x∼m(x))}| / |E| ... IoU≥τ_iou ∧ TextSim≥τ_text

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
Paper passage: TOR explains CER only weakly ... R²=0.384/0.110; B-SLR ... R²=0.727/0.916

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
