DocQT: Improving Document Forgery Localization Robustness via Diverse JPEG Quantization Tables
Pith reviewed 2026-05-20 06:07 UTC · model grok-4.3
The pith
Training on real-world JPEG quantization tables improves forgery localization and cuts false positives when models explicitly read the quantization table as input.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that training under operationally calibrated quantization tables sampled from real document workflows yields substantial localization gains on DocTamper and significantly reduces the pixel-level false positive rate on authentic operational documents. This improvement materializes only for architectures that explicitly ingest the quantization table as input. In contrast, training under standard quality factor augmentation does not adequately proxy the compression diversity encountered in operational settings.
What carries the argument
The DocQT quantization-table bank combined with Real-QT training for architectures that condition on the JPEG quantization table, contrasted against Standard-QT augmentation.
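To make the contrast between the two training regimes concrete, here is a minimal sketch under stated assumptions. The Standard-QT branch uses the example luminance table from Annex K of the JPEG standard and libjpeg's quality-scaling rule. The Real-QT branch draws a table from a bank. The function names, the 50 to 100 quality range, and the idea of a bank as a plain list of 64-entry tables are illustrative assumptions. None of them is taken from the paper or from the DocQT release.

```python
import random

# Example luminance table from Annex K of the JPEG standard (row-major order).
BASE_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def standard_qt(quality):
    """Scale the base table the way libjpeg does for a quality factor in 1..100."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Round to nearest, then clamp to the 8-bit baseline range [1, 255].
    return [max(1, min(255, (q * scale + 50) // 100)) for q in BASE_LUMA]


def sample_standard_qt(rng, low=50, high=100):
    """Standard-QT style augmentation: draw a quality factor, derive its table.

    The [low, high] range is an assumption made for this sketch.
    """
    return standard_qt(rng.randint(low, high))


def sample_real_qt(bank, rng):
    """Real-QT style augmentation: draw a table observed in real files.

    `bank` is a hypothetical list of 64-entry tables, not the DocQT format.
    """
    return list(rng.choice(bank))
```

Every table the first sampler can produce lies on a one-parameter family indexed by the quality factor, whereas a bank can hold tables from scanners or vendor encoders that lie off that family. To recompress an image with a chosen table, Pillow's JPEG writer accepts a `qtables` argument in `Image.save`. The coefficient ordering it expects should be checked against the Pillow documentation before use.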
If this is right
- Localization accuracy rises on standard manipulated-document benchmarks when models train on diverse real quantization tables.
- Pixel-level false positive rates drop on genuine operational documents for models that receive the quantization table as input.
- Standard quality-factor augmentation alone fails to deliver equivalent operational robustness.
- Explicit conditioning on the quantization table supplies a measurable robustness advantage for deployment.
- The released quantization-table dataset enables further calibration of detection systems to real compression profiles.
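The pixel-level false positive rate mentioned above can be sketched as follows. This is an illustrative definition, FP / (FP + TN) over pixels. The paper's exact thresholding and aggregation (per image or pooled over the corpus) are not stated on this page, so both are assumptions here. The helper names are hypothetical.

```python
def pixel_fpr(pred_mask, gt_mask):
    """Pixel-level false positive rate: FP / (FP + TN).

    Both masks are equally sized nested lists of 0/1, where 1 means "tampered".
    """
    fp = tn = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for pred, truth in zip(pred_row, gt_row):
            if truth == 0:  # only authentic pixels can be false positives
                if pred:
                    fp += 1
                else:
                    tn += 1
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


def authentic_fpr(pred_mask):
    """FPR on an authentic document, where every ground-truth pixel is 0."""
    gt_mask = [[0] * len(row) for row in pred_mask]
    return pixel_fpr(pred_mask, gt_mask)
```

On an authentic document the metric reduces to the fraction of pixels flagged as tampered, which is why it is a natural measure of spurious alarms in an operational pipeline.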
Where Pith is reading between the lines
- Other image-forensics tasks that rely on compressed inputs could incorporate quantization-table conditioning to improve generalization beyond laboratory conditions.
- Document-processing pipelines should collect representative compression statistics from their own workflows when developing or updating detectors.
- Model architectures may benefit from treating quantization-table input as a default design choice rather than an optional feature.
- Repeating the controlled comparison on additional operational corpora would test whether the observed gains transfer across different document sources and compression environments.
Load-bearing premise
That the sampled quantization tables from the operational document corpus represent the full variety of compression profiles that appear in real-world document workflows and that the factorial study isolates this factor from other variables.
What would settle it
Retraining the same architectures on a new, independent collection of quantization tables from a different operational document source and finding no localization gains or no reduction in false positive rate on the original test sets would falsify the central claim.
Original abstract
Document manipulation localization models achieve strong performance on public benchmarks yet fail to generalize to operational document workflows. We identify a critical and overlooked source of this gap: the mismatch between the narrow distribution of JPEG quantization tables used during training (restricted to standard libjpeg quality factors) and the heterogeneous compression profiles encountered in real-world insurance document pipelines. To isolate this factor, we conduct a controlled factorial study comparing two architectures with contrasting levels of quantization table awareness, FFDN [2] and Mesorch [20], each trained under either standard quality factor augmentation (Standard-QT) or operationally calibrated quantization tables sampled from DocQT, a quantization-table bank derived from a MAIF operational image corpus (Real-QT), and evaluated under three recompression conditions. Training under Real-QT yields substantial localization gains on DocTamper [15] and significantly reduces the pixel-level false positive rate on authentic operational documents, but only for architectures that explicitly ingest the quantization table as input. The released DocQT quantization-table dataset and compression-reproduction material are directly available at https://github.com/Kyliroco/Improving-Document-Forgery-Localization-Robustness-via-Diverse-JPEG-Quantization-Tables. These results demonstrate that standard quality factor augmentation does not adequately proxy operational compression diversity, and that architectural choices explicitly conditioning on the quantization table provide a meaningful robustness advantage for real-world deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript identifies a mismatch between the narrow JPEG quantization tables (standard libjpeg quality factors) used in training document forgery localization models and the heterogeneous compression profiles in real-world insurance document pipelines. It conducts a factorial study comparing FFDN and Mesorch architectures trained under Standard-QT versus Real-QT (sampled from the DocQT bank derived from the MAIF operational corpus) and evaluates under three recompression conditions. Results show substantial localization gains on DocTamper and reduced pixel-level false positive rates on authentic operational documents, but only for architectures that explicitly ingest the quantization table as input. The work releases the DocQT dataset and reproduction code.
Significance. If the results hold, the work has clear practical significance for improving robustness of forgery localization models in operational settings such as insurance document workflows. It demonstrates that standard quality-factor augmentation is insufficient and that explicit QT conditioning combined with diverse real-world QT training provides a measurable advantage. The release of the DocQT quantization-table bank and code is a notable strength for reproducibility and future work.
major comments (2)
- [§4] §4 (Experimental Setup and Results): The factorial comparison isolates the benefit of Real-QT within the MAIF-sampled distribution, but the manuscript provides no quantitative comparison of the MAIF QT distribution (e.g., histogram overlap, coverage metrics, or statistical tests) against other operational document corpora or broader insurance pipelines. This leaves the external-validity claim—that Real-QT captures the heterogeneous compression profiles needed for generalization—unsupported by direct evidence.
- [§3.2] §3.2 (DocQT Construction): The description of how the MAIF corpus was sampled to build the quantization-table bank does not report exclusion criteria, scanner/quality-setting diversity statistics, or sensitivity analysis to potential biases in the corpus (e.g., over-representation of specific devices). Without these, it is difficult to assess whether the reported gains on DocTamper and FPR reduction on operational documents would transfer outside the study distribution.
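One way the distribution comparison requested in the first major comment could be set up is sketched below. It treats each quantization table as a hashable tuple and computes two simple statistics: histogram intersection between two collections, and the coverage of one collection by another. The choice of metrics and the function names are assumptions for illustration. The manuscript does not report either quantity.

```python
from collections import Counter


def table_histogram(tables):
    """Relative frequency of each distinct table in a collection."""
    counts = Counter(tuple(t) for t in tables)
    total = sum(counts.values())
    return {table: count / total for table, count in counts.items()}


def histogram_intersection(tables_a, tables_b):
    """Sum of per-table minimum frequencies.

    1.0 means identical distributions, 0.0 means no table in common.
    """
    hist_a = table_histogram(tables_a)
    hist_b = table_histogram(tables_b)
    return sum(min(freq, hist_b.get(table, 0.0)) for table, freq in hist_a.items())


def coverage(bank, corpus):
    """Fraction of files in `corpus` whose table also appears in `bank`."""
    if not corpus:
        return 0.0
    known = {tuple(t) for t in bank}
    return sum(tuple(t) in known for t in corpus) / len(corpus)
```

Exact matching is strict, since two tables that differ in a single coefficient count as different. A distance-based variant would be more forgiving, at the cost of choosing a threshold.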
minor comments (2)
- [Abstract] Abstract: The phrase 'three recompression conditions' is used without naming them; a brief parenthetical or forward reference to the relevant subsection would improve clarity for readers.
- [Figures/Tables] Figure captions and Table 1: Ensure all axis labels and column headers explicitly distinguish Standard-QT from Real-QT conditions and report error bars or confidence intervals where quantitative gains are claimed.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments on our manuscript. We address each of the major comments below and outline the revisions we plan to make to improve the clarity and strength of our claims regarding the DocQT dataset and its applicability.
- Referee: [§4] §4 (Experimental Setup and Results): The factorial comparison isolates the benefit of Real-QT within the MAIF-sampled distribution, but the manuscript provides no quantitative comparison of the MAIF QT distribution (e.g., histogram overlap, coverage metrics, or statistical tests) against other operational document corpora or broader insurance pipelines. This leaves the external-validity claim (that Real-QT captures the heterogeneous compression profiles needed for generalization) unsupported by direct evidence.
  Authors: We acknowledge that direct quantitative comparisons (such as histogram overlaps or statistical tests) to other operational corpora would provide stronger support for the external validity of our findings. Unfortunately, we do not have access to additional private operational document corpora from other insurance providers, as these are typically not publicly available due to confidentiality. In the revised manuscript, we will expand the discussion in §4 to include more detailed statistics on the MAIF QT distribution (e.g., the number of unique quantization tables, the range and distribution of quality factors, and examples of common tables). We will also explicitly discuss the limitations of generalizing beyond the MAIF distribution and suggest that the released DocQT dataset can facilitate such comparisons in future work by other researchers. This addresses the concern while being transparent about the scope of our claims.
  Revision: partial
- Referee: [§3.2] §3.2 (DocQT Construction): The description of how the MAIF corpus was sampled to build the quantization-table bank does not report exclusion criteria, scanner/quality-setting diversity statistics, or sensitivity analysis to potential biases in the corpus (e.g., over-representation of specific devices). Without these, it is difficult to assess whether the reported gains on DocTamper and FPR reduction on operational documents would transfer outside the study distribution.
  Authors: We agree that additional details on the corpus sampling process would improve transparency and allow better assessment of potential biases. In the revised version, we will expand §3.2 to include: (1) the exclusion criteria used when sampling from the MAIF corpus (e.g., image resolution thresholds, file format filters), (2) available statistics on scanner and quality-setting diversity where such metadata is present in the corpus, and (3) a brief sensitivity analysis or discussion of possible biases, such as device over-representation. We will also add a limitations paragraph noting that DocQT reflects the MAIF operational environment and may not capture all possible real-world variations.
  Revision: yes
- We cannot perform direct quantitative comparisons against other specific operational corpora because we lack access to those proprietary datasets.
Circularity Check
Empirical study with no derivation chain or self-referential reductions
full rationale
The paper is a controlled empirical comparison of forgery localization models trained under Standard-QT versus Real-QT regimes sampled from an external MAIF corpus. No mathematical derivations, equations, or first-principles predictions are presented that could reduce to fitted parameters or self-citations by construction. Performance gains are reported from direct experimentation on DocTamper and operational documents, with dataset and code released for independent reproduction. All cited prior work (FFDN, Mesorch, DocTamper) functions as external baselines rather than load-bearing self-references that close a circular loop. The central claims rest on observable experimental outcomes rather than any definitional or predictive equivalence to the inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Standard machine learning assumptions hold for data splits, training procedures, and evaluation metrics in the factorial study.
invented entities (1)
- DocQT quantization-table bank: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Training under Real-QT yields substantial localization gains on DocTamper and significantly reduces the pixel-level false positive rate on authentic operational documents, but only for architectures that explicitly ingest the quantization table as input.
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: the mismatch between the narrow distribution of JPEG quantization tables used during training... and the heterogeneous compression profiles encountered in real-world insurance document pipelines
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1]
- [2] Dang-Nguyen, Duc-Tien and Pasquini, Cecilia and Conotter, Valentina and Boato, Giulia. Proceedings of the 6th. 2015.
- [3] Nayef, Nibal and Luqman, Muhammad Muzzamil and Prum, Sophea and Eskenazi, Sebastien and Chazalon, Joseph and Ogier, Jean-Marc. 2015 13th International Conference on Document Analysis and Recognition. 2015.
- [4] Artaud, Chlo\'. Find it! Fraud Detection Contest Report. 2018.
- [5] Nguyen, Huy H. and Yamagishi, Junichi and Echizen, Isao. 2019.
- [6] Wu, Yue and AbdAlmageed, Wael and Natarajan, Premkumar. 2019.
- [7] Huang, Zheng and Chen, Kai and He, Jianhua and Bai, Xiang and Karatzas, Dimosthenis and Lu, Shijian and Jawahar, C. V. 2019 International Conference on Document Analysis and Recognition. 2019.
- [8] Jaume, Guillaume and Ekenel, Hazim Kemal and Thiran, Jean-Philippe. 2019.
- [9]
- [10] Liu, Xiaohong and Liu, Yaojie and Chen, Jun and Liu, Xiaoming. 2022.
- [11] Kwon, Myung-Joon and Nam, Seung-Hun and Yu, In-Jae and Lee, Heung-Kyu and Kim, Changick. International Journal of Computer Vision. 2022.
- [12] Wang, Yuxin and Zhang, Boqiang and Xie, Hongtao and Zhang, Yongdong. Journal of Cybersecurity.
- [13] Guillaro, Fabrizio and Cozzolino, Davide and Sud, Avneesh and Dufour, Nicholas and Verdoliva, Luisa. 2023.
- [14] Qu, Chenfan and Liu, Chongyu and Liu, Yuliang and Chen, Xinhong and Peng, Dezhi and Guo, Fengjun and Jin, Lianwen. 2023.
- [15] Guo, Xiao and Liu, Xiaohong and Ren, Zhiyuan and Grosz, Steven and Masi, Iacopo and Liu, Xiaoming. 2023.
- [16]
- [17] Wan, Kun. Journal of Information Hiding and Privacy Protection. 2023.
- [18]
- [19] Ma, Xiaochen and Du, Bo and Jiang, Zhuohang and Du, Xia and Al Hammadi, Ahmed Y. and Zhou, Jizhe. 2024.
- [20] Ma, Xiaochen and Zhu, Xuekang and Su, Lei and Du, Bo and Jiang, Zhuohang and Tong, Bingkui and Lei, Zeyu and Yang, Xinyu and Pun, Chi-Man and Lv, Jiancheng and Zhou, Jizhe. Advances in Neural Information Processing Systems.
- [21] O'Flaherty, Juli\'. Multimedia Tools and Applications. 2025.
- [22] Zhu, Xuekang and Ma, Xiaochen and Su, Lei and Jiang, Zhuohang and Du, Bo and Wang, Xiwen and Lei, Zeyu and Feng, Wentao and Pun, Chi-Man and Zhou, Jizhe. 2024.
- [23] Chen, Zhongxi and Chen, Shen and Yao, Taiping and Sun, Ke and Ding, Shouhong and Lin, Xianming and Cao, Liujuan and Ji, Rongrong. Computer Vision --. 2025.
- [24] Luo, Dongliang and Liu, Yuliang and Yang, Rui and Liu, Xianjin and Zeng, Jishen and Zhou, Yu and Bai, Xiang. Pattern Recognition. 2025.
- [25] Song, Yalin and Jiang, Wenbin and Chai, Xiuli and Gan, Zhihua and Zhou, Mengyuan and Chen, Lei. 2025.
- [26] Du, Bo and Zhu, Xuekang and Ma, Xiaochen and Qu, Chenfan and Feng, Kaiwen and Yang, Zhe and Pun, Chi-Man and Liu, Jian and Zhou, Jizhe. 2025.
- [27] Naseeb, Chan and Cheema, Adeel Ashraf and Sami, Hassan and Afzal, Tayyab and Omair, Muhammad and Habib, Usman. 2026.