How Do Document Parsers Break? Auditing Structural Vulnerability in Document Intelligence
Pith reviewed 2026-05-20 05:58 UTC · model grok-4.3
The pith
Small structurally targeted probes can disrupt document parsers about as much as large-area changes, because affected area poorly predicts OCR instability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We identify this Footprint Bias and propose a lightweight output-level auditing framework that decouples probe construction, policy-driven targeting, and structure-aware diagnosis. The framework combines Block-level Structural Loss Rate (B-SLR), granularity-aware exposure descriptors, and pathway attribution to analyze where perturbations interact with layout structure and how failures propagate. Across MinerU and PP-StructureV3 on 1,000 pages, affected area weakly tracks perturbation-induced OCR instability (R^2=0.384/0.110), whereas B-SLR aligns much more closely with it (R^2=0.727/0.916). Exposure descriptors further separate occlusion- and topology-dominant pathways, and small structurally targeted probes cause downstream QA/retrieval degradation comparable to larger-footprint perturbations.
What carries the argument
Block-level Structural Loss Rate (B-SLR), an output-level measure of structural disruption at the block level that better tracks how layout changes drive OCR instability than simple affected area.
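A minimal sketch of how a B-SLR-style score could be computed, based only on the passage quoted further down this page: the share of reference blocks x in E whose matched block m(x) fails the test IoU ≥ τ_iou ∧ TextSim ≥ τ_text. Several pieces here are assumptions rather than the paper's method: the `Block` type, matching each reference block to its highest-IoU counterpart, using `difflib.SequenceMatcher` as the text similarity, and the threshold defaults 0.5 and 0.8.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Block:
    # Hypothetical block record: axis-aligned box plus the parsed text.
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def iou(a: Block, b: Block) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = min(a.x1, b.x1) - max(a.x0, b.x0)
    ih = min(a.y1, b.y1) - max(a.y0, b.y0)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a.x1 - a.x0) * (a.y1 - a.y0) + (b.x1 - b.x0) * (b.y1 - b.y0) - inter
    return inter / union if union > 0 else 0.0


def text_sim(a: str, b: str) -> float:
    # Assumption: character-level similarity ratio stands in for TextSim.
    return SequenceMatcher(None, a, b).ratio()


def b_slr(reference, perturbed, tau_iou=0.5, tau_text=0.8):
    """Fraction of reference blocks that are not preserved after perturbation.

    The threshold defaults are illustrative, not values from the paper.
    """
    if not reference:
        return 0.0
    lost = 0
    for x in reference:
        # Assumption: m(x) is the perturbed block with the highest IoU.
        match = max(perturbed, key=lambda y: iou(x, y), default=None)
        preserved = (
            match is not None
            and iou(x, match) >= tau_iou
            and text_sim(x.text, match.text) >= tau_text
        )
        if not preserved:
            lost += 1
    return lost / len(reference)


clean = [Block(0, 0, 10, 10, "Title"), Block(0, 20, 10, 30, "Body text here")]
# The second block shrinks and loses text, so it no longer matches.
probed = [Block(0, 0, 10, 10, "Title"), Block(0, 20, 10, 24, "Body")]
print(b_slr(clean, probed))
```

Because the score depends only on parser outputs before and after a probe, it needs no access to model internals, which is consistent with the "output-level" framing above.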
If this is right
- B-SLR provides a tighter link to actual OCR instability than area-based metrics across tested parsers.
- Granularity-aware exposure descriptors distinguish occlusion pathways from topology pathways.
- Structurally targeted small probes degrade QA and retrieval performance at rates similar to large-footprint changes.
- Robustness evaluation of document intelligence systems should move from footprint stress tests to structure-aware audits.
Where Pith is reading between the lines
- The framework could be applied to additional downstream tasks such as summarization or information extraction to test consistency of the structural signal.
- System builders might integrate B-SLR-style checks into continuous evaluation pipelines for document parsers to catch layout-specific weaknesses early.
- The separation of pathways opens the possibility of targeted defenses that address occlusion versus topology failures differently.
Load-bearing premise
The chosen perturbations and 1,000-page test set are representative of real-world structural vulnerabilities, and B-SLR plus pathway attribution capture the relevant failure modes without post-hoc selection or unstated modeling assumptions.
What would settle it
A new test set or perturbation family in which affected area correlates with OCR instability at least as strongly as B-SLR does would falsify the claim that structure-aware auditing is required.
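One way to run that check with some uncertainty attached is a paired bootstrap over pages on the gap R²(B-SLR) − R²(area). The sketch below is illustrative only: the function names, the percentile interval, and the default of 1,000 resamples are assumptions, and R² is taken as the squared Pearson correlation of a single-predictor linear fit.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-variable linear fit."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear signal
    return (sxy * sxy) / (sxx * syy)


def bootstrap_r2_gap(area, bslr, instability, n_resamples=1000, seed=0):
    """95% percentile interval for R^2(B-SLR) - R^2(area), resampling pages."""
    rng = random.Random(seed)
    n = len(instability)
    gaps = []
    for _ in range(n_resamples):
        # Resample page indices with replacement, keeping triples paired.
        idx = [rng.randrange(n) for _ in range(n)]
        a = [area[i] for i in idx]
        b = [bslr[i] for i in idx]
        y = [instability[i] for i in idx]
        gaps.append(r_squared(b, y) - r_squared(a, y))
    gaps.sort()
    lo = gaps[int(0.025 * n_resamples)]
    hi = gaps[int(0.975 * n_resamples) - 1]
    return lo, hi
```

Under these assumptions, an interval that sits entirely above zero supports the paper's ordering on that test set, while an interval that includes or falls below zero would be the kind of evidence described above.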
Original abstract
Document Layout Analysis (DLA) pipelines provide structured page representations for retrieval-augmented generation, long-document question answering, and other document intelligence systems, yet their robustness evaluation remains largely area-centric. We identify this Footprint Bias and propose a lightweight output-level auditing framework that decouples probe construction, policy-driven targeting, and structure-aware diagnosis. The framework combines Block-level Structural Loss Rate (B-SLR), granularity-aware exposure descriptors, and pathway attribution to analyze where perturbations interact with layout structure and how failures propagate. Across MinerU and PP-StructureV3 on 1,000 pages, affected area weakly tracks perturbation-induced OCR instability (R^2=0.384/0.110), whereas B-SLR aligns much more closely with it (R^2=0.727/0.916). Exposure descriptors further separate occlusion- and topology-dominant pathways, and small structurally targeted probes cause downstream QA/retrieval degradation comparable to larger-footprint perturbations. These results shift DLA robustness evaluation from footprint-based stress testing toward structure-aware vulnerability auditing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper identifies 'Footprint Bias' in current DLA robustness evaluation, which relies on affected area as the primary metric. It introduces a lightweight auditing framework using Block-level Structural Loss Rate (B-SLR), granularity-aware exposure descriptors, and pathway attribution to analyze how perturbations interact with layout structure. On MinerU and PP-StructureV3 across 1,000 pages, the work reports that affected area weakly correlates with perturbation-induced OCR instability (R²=0.384/0.110) while B-SLR correlates substantially better (R²=0.727/0.916); exposure descriptors separate occlusion- and topology-dominant failure pathways, and small targeted probes produce downstream QA/retrieval degradation comparable to larger perturbations.
Significance. If the central correlations and pathway attributions hold under broader testing, the work provides a concrete shift from area-centric to structure-aware robustness auditing for document intelligence systems. The quantitative R² comparisons offer measurable support for preferring B-SLR over footprint metrics, and the downstream task results highlight practical implications for RAG and QA pipelines. The framework's decoupling of probe construction from diagnosis is a useful methodological contribution.
major comments (3)
- [§4 (Experimental Results)] The reported R² values (0.727/0.916 for B-SLR vs. 0.384/0.110 for affected area) are presented without error bars, bootstrap confidence intervals, or statistical tests for the difference in correlations. This is load-bearing for the central claim that B-SLR 'aligns much more closely,' as the practical superiority cannot be assessed without quantifying uncertainty or significance.
- [§3.2 (Dataset and Perturbation Generation)] No information is provided on corpus provenance, sampling strategy, document-type stratification (e.g., scientific papers vs. forms vs. tables), or the precise procedure for generating occlusion and topology perturbations. This directly affects the generalizability of the headline result that B-SLR superiority and pathway separation reflect structural vulnerabilities rather than properties of the chosen 1,000-page distribution.
- [§4.3 (Downstream Evaluation)] The claim that small structurally targeted probes cause 'comparable' QA/retrieval degradation to larger-footprint perturbations lacks quantitative details on effect sizes, variance across runs, or controls that isolate structural targeting from raw perturbation size. This comparison is central to arguing for structure-aware auditing over footprint-based testing.
minor comments (2)
- [Abstract] The term 'granularity-aware exposure descriptors' is introduced without a one-sentence definition, reducing accessibility for readers outside the immediate subfield.
- [§3.1 (Framework Definition)] Notation: B-SLR is defined as a new metric but its exact formula (e.g., how block-level losses are aggregated and normalized) should be stated explicitly in the main text rather than deferred entirely to an appendix.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help strengthen the statistical rigor, transparency, and quantitative support in our work. We address each major comment below and will incorporate revisions to improve the manuscript.
- Referee: The reported R² values (0.727/0.916 for B-SLR vs. 0.384/0.110 for affected area) are presented without error bars, bootstrap confidence intervals, or statistical tests for the difference in correlations. This is load-bearing for the central claim that B-SLR 'aligns much more closely,' as the practical superiority cannot be assessed without quantifying uncertainty or significance.
  Authors: We agree that uncertainty quantification and significance testing are necessary to substantiate the superiority of B-SLR. In the revised manuscript, we will add bootstrap confidence intervals (via 1,000 resamples) for all reported R² values and apply Steiger's Z-test to assess whether the correlation differences are statistically significant. These results will appear in §4 with corresponding discussion of practical implications.
  Revision: yes
- Referee: No information is provided on corpus provenance, sampling strategy, document-type stratification (e.g., scientific papers vs. forms vs. tables), or the precise procedure for generating occlusion and topology perturbations. This directly affects the generalizability of the headline result that B-SLR superiority and pathway separation reflect structural vulnerabilities rather than properties of the chosen 1,000-page distribution.
  Authors: We acknowledge that greater detail on the corpus and perturbation procedures is required for assessing generalizability. The revision will expand §3.2 to specify: corpus sources and provenance; stratified sampling by document category (scientific papers, forms, tables, etc.) and complexity metrics; and the exact generation procedures, including masking ratios for occlusion and structural edit rules for topology perturbations.
  Revision: yes
- Referee: The claim that small structurally targeted probes cause 'comparable' QA/retrieval degradation to larger-footprint perturbations lacks quantitative details on effect sizes, variance across runs, or controls that isolate structural targeting from raw perturbation size. This comparison is central to arguing for structure-aware auditing over footprint-based testing.
  Authors: We agree that the downstream results need explicit quantitative backing and controls. In the revised §4.3 we will report effect sizes (e.g., absolute and relative drops in QA F1 and retrieval nDCG), standard deviations across repeated runs, and ablation controls that hold perturbation area constant while varying structural targeting. This will isolate the contribution of layout-aware probes.
  Revision: yes
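The area-matched control promised in the last response could look roughly like the sketch below: runs are grouped into bins of similar affected area, and within each bin the mean downstream drop of structurally targeted probes is compared with that of untargeted ones. The record layout `(area_fraction, targeted, drop)`, the bin width, and the function name are hypothetical, and `drop` stands for any absolute degradation such as a fall in QA F1.

```python
from collections import defaultdict
from statistics import mean


def area_matched_gap(runs, bin_width=0.05):
    """Mean within-bin difference in drop: targeted minus untargeted probes.

    runs: iterable of (area_fraction, targeted, drop) tuples (hypothetical
    format). Returns None when no bin contains both kinds of probe.
    """
    bins = defaultdict(lambda: {True: [], False: []})
    for area, targeted, drop in runs:
        # Probes with similar affected area land in the same bin.
        bins[int(area // bin_width)][bool(targeted)].append(drop)
    gaps = []
    for groups in bins.values():
        # Only bins holding both probe types give a like-for-like contrast.
        if groups[True] and groups[False]:
            gaps.append(mean(groups[True]) - mean(groups[False]))
    return mean(gaps) if gaps else None


runs = [
    (0.02, True, 0.30), (0.03, False, 0.10),   # small-footprint bin
    (0.12, True, 0.40), (0.13, False, 0.30),   # larger-footprint bin
]
print(area_matched_gap(runs))
```

A gap near zero would suggest that targeting adds little once area is held fixed, whereas a clearly positive gap would point to structural targeting itself as the driver. Reporting the per-bin standard deviation alongside the mean would cover the variance the referee asks for.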
Circularity Check
No significant circularity; empirical correlations are independent measurements
The paper defines B-SLR and exposure descriptors directly from parsed output structure, then reports empirical R² correlations between these metrics and perturbation-induced OCR instability on a fixed 1,000-page test set, contrasting them with the affected-area baseline. These R² values are computed post-experiment from observed data and do not reduce by construction to any fitted parameter or self-referential definition within the same dataset. No self-citations, uniqueness theorems, or ansatzes from prior author work appear in the abstract or summary to support the core claims. The derivation chain—metric definition, perturbation application, outcome measurement, and correlation reporting—is self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Perturbations can be constructed to target layout structure while keeping overall footprint small.
invented entities (2)
- Footprint Bias: no independent evidence
- Block-level Structural Loss Rate (B-SLR): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Block-level Structural Loss Rate (B-SLR) ... |{x∈E:¬(x∼m(x))}| / |E| ... IoU≥τ_iou ∧ TextSim≥τ_text
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: TOR explains CER only weakly ... R²=0.384/0.110; B-SLR ... R²=0.727/0.916
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.