Receipt Replay OOD: A Small Benchmark for Screen Replay Detection Under Domain Shift

Alexander Vinogradov

arxiv: 2605.26855 · v1 · pith:5WKR3BQFnew · submitted 2026-05-26 · 💻 cs.CV

Pith reviewed 2026-06-29 18:20 UTC · model grok-4.3

classification 💻 cs.CV

keywords screen replay detection · out-of-domain robustness · presentation attack detection · document verification · benchmark dataset · domain shift

The pith

Receipts form a privacy-safe out-of-domain test for screen replay detectors trained on identity documents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Receipt Replay OOD as a small benchmark to measure how screen replay detection models handle domain shift away from identity documents. Receipts match identity documents in planar shape, curved corners, wear artifacts, and printed patterns but carry no personal information. Cross-domain tests on this set show clear drops in model performance. This setup lets researchers probe generalization limits without the privacy barriers that block larger public ID datasets.

Core claim

Receipt Replay OOD demonstrates that document replay detection models suffer measurable performance loss when evaluated on receipts, which share planar geometry, curved corners, wear-and-tear artifacts, and text or logo patterns with identity documents yet avoid personally identifiable information constraints.

What carries the argument

The Receipt Replay OOD benchmark dataset, used as an out-of-domain test set to expose generalization failures under domain shift.

If this is right

Existing replay detectors trained on public ID datasets will show reduced accuracy on receipt images.
Domain shift between document types directly degrades presentation attack detection performance.
Receipts can substitute for identity documents in robustness testing while sidestepping data-privacy rules.
Benchmark results quantify the size of the generalization gap that future training methods must close.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Larger receipt collections with varied lighting and camera angles could strengthen the benchmark for future OOD studies.
Training recipes that explicitly regularize for shape and texture invariance might reduce the observed domain gap.
The same receipt proxy idea could apply to other document-security tasks that currently face PII restrictions.

Load-bearing premise

Receipts share enough geometric, textural, and artifact characteristics with identity documents to make performance on receipts a meaningful signal of out-of-domain robustness.

What would settle it

A controlled experiment in which the same models are tested on both Receipt Replay OOD and a held-out set of real identity documents under matched replay conditions; if accuracy patterns diverge sharply, the proxy claim fails.
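
The comparison described above can be sketched in a few lines. This is a minimal illustration, not the paper's evaluation code: the model names and scores are hypothetical toy values, and the agreement check (does the proxy set rank the models in the same order as real identity documents?) is one assumed way to read "accuracy patterns diverge".

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC: the fraction of (attack, genuine) pairs in which
    the attack sample gets the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def model_ranking(aucs):
    """Model names ordered from best to worst ROC AUC."""
    return sorted(aucs, key=aucs.get, reverse=True)


# Hypothetical toy data: 1 = screen replay, 0 = genuine capture.
labels = [1, 1, 1, 0, 0, 0]
receipt_scores = {  # scores on the receipt proxy set (made up)
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "model_b": [0.6, 0.3, 0.2, 0.7, 0.5, 0.1],
}
id_scores = {  # scores on held-out identity documents (made up)
    "model_a": [0.9, 0.7, 0.6, 0.3, 0.2, 0.1],
    "model_b": [0.8, 0.4, 0.3, 0.5, 0.2, 0.1],
}

receipt_auc = {m: roc_auc(labels, s) for m, s in receipt_scores.items()}
id_auc = {m: roc_auc(labels, s) for m, s in id_scores.items()}

# The proxy is informative in this toy case if both sets rank models alike.
rankings_agree = model_ranking(receipt_auc) == model_ranking(id_auc)
print(receipt_auc, id_auc, rankings_agree)
```

Ranking agreement is a coarse criterion; with more than a handful of models, a rank correlation across models would be a natural refinement.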

Original abstract

Public datasets such as DLC-2021, SynID, and KID34K have significantly contributed to research on presentation attack detection for identity documents, including screen replay attacks. However, evaluation of out-of-domain (OOD) robustness remains insufficiently explored, especially under realistic domain shifts. In this work, we introduce Receipt Replay OOD, a small out-of-domain benchmark for screen replay detection. Receipts share several characteristics with identity documents, including planar geometry, curved corners, wear-and-tear artifacts, and text or logo patterns, while avoiding personally identifiable information constraints commonly associated with identity documents. We evaluate document replay detection models under cross-domain conditions and demonstrate the impact of domain shift on generalization performance. The dataset is publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Small new receipt-based OOD benchmark for screen replay detection that fills a narrow gap but stays modest in scope and evidence.

The main thing here is a new small dataset called Receipt Replay OOD, meant to test how well screen replay detectors generalize when you move from identity documents to receipts. The author releases it publicly and runs cross-domain checks to show the expected drop in performance.

What works is the practical choice of receipts as a proxy. They share flat geometry, rounded corners, printed text, and wear patterns with ID documents but skip the privacy headaches, which makes the benchmark easier to share. The evaluation demonstrates domain shift effects on existing models, and the dataset builds directly on prior collections like DLC-2021 without claiming new methods or theory.

The soft spots are the limited scale and the thin support for the proxy claim. A small benchmark gives only weak statistical footing for robustness claims, and the abstract reports no quantitative metrics, names no model architectures, and mentions no ablation studies. The similarity argument is plausible on the surface but rests on high-level visual overlap rather than measured equivalence across attack types.
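
The point about statistical footing can be made concrete with a percentile bootstrap: on a small test set, the confidence interval around a ROC AUC estimate can be wide. The sketch below is an assumption-laden illustration, not something the paper reports; the function names, the toy labels and scores, and the choice of a percentile bootstrap are all introduced here.

```python
import random


def roc_auc(labels, scores):
    """Rank-based ROC AUC; a tie between a positive and a negative counts 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for ROC AUC.

    Resamples (label, score) pairs with replacement and skips resamples
    that contain only one class, since AUC is undefined for them.
    """
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    if not aucs:
        raise ValueError("no resample contained both classes")
    aucs.sort()
    lo = aucs[int((alpha / 2) * (len(aucs) - 1))]
    hi = aucs[int((1 - alpha / 2) * (len(aucs) - 1))]
    return lo, hi


# Hypothetical toy test set: 1 = screen replay, 0 = genuine capture.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores), bootstrap_auc_ci(labels, scores))
```

Reporting such an interval next to each benchmark number would let a reader judge whether a gap between two models exceeds sampling noise.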

This paper is for people already working on presentation attack detection who need extra OOD test material. A reader in that subfield can pull the data and run their own checks.

It deserves peer review. New public benchmarks, even small ones, can support later work if the collection details hold up under scrutiny.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces Receipt Replay OOD, a small out-of-domain benchmark for screen replay detection. Receipts are proposed as a suitable proxy for identity documents because they share planar geometry, curved corners, wear-and-tear artifacts, and text or logo patterns, while avoiding PII constraints. The work evaluates document replay detection models under cross-domain conditions and demonstrates the impact of domain shift on generalization performance. The dataset is made publicly available.

Significance. If the evaluation supports the claims, this benchmark fills a gap in OOD robustness testing for presentation attack detection by providing a PII-free alternative to identity document datasets such as DLC-2021, SynID, and KID34K. The public availability of the dataset supports reproducibility and further research in the field. This is a modest but useful contribution for testing generalization of replay detectors.

minor comments (2)

[Abstract] The abstract would be strengthened by including at least one quantitative result illustrating the domain shift impact.
[Dataset] Provide more details on the number of samples, acquisition conditions, and any preprocessing steps applied to the receipt images in the dataset description section.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of Receipt Replay OOD as a useful PII-free benchmark for OOD robustness in screen replay detection, and for recommending minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity

The paper introduces a benchmark dataset and reports cross-domain evaluation results. It contains no equations, derivations, parameter fitting, or predictive claims that could reduce to inputs by construction. The justification for receipts as a proxy for ID documents is an explicit design assumption rather than a derived result. No self-citations are load-bearing for any central claim.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central contribution rests on the domain assumption that receipts are suitable visual proxies for identity documents in replay detection tasks.

axioms (1)

domain assumption: Receipts share planar geometry, curved corners, wear-and-tear artifacts, and text or logo patterns with identity documents.
Invoked in abstract to justify use of receipts as OOD proxy while avoiding PII.

pith-pipeline@v0.9.1-grok · 5644 in / 1074 out tokens · 37582 ms · 2026-06-29T18:20:03.670929+00:00 · methodology

Reference graph

Works this paper leans on

9 extracted references · 6 canonical work pages

[1]

Introduction and Related Works: Multiple datasets have been introduced for presentation attack detection in the domain of identity documents. The DLC-2021 dataset (Polevoy et al., 2022), based on images from the MIDV family, contains 10 document types and multiple attack categories including color photocopies, grayscale copies, and screen replay attacks ca...
[2]

Dataset description: Receipts were selected as lightweight real-world planar objects sharing several visual characteristics with identity documents, including text regions, curved geometry, folds, and wear-and-tear artifacts. At the same time, receipts avoid legal and privacy limitations associated with identity documents, making them suitable for public O...
[3]

Experiments: To support this research, we trained three models for the screen replay detection task using the original DLC-2021 (RE) train-test split. The evaluated architectures included a custom ResNet-inspired model trained from scratch, EfficientNet-B0V2 fine-tuned from ImageNet pretraining, and ViT-Small with frozen DINOv2 backbone and fine-tuned clas...
[4]

Park, E.-J., Back, S.-Y., Kim, J., & Woo, S. S. (2023). KID34K: A dataset for online identity card fraud detection. In Proceedings of the 2023 ACM Workshop on Information Hiding and Multimedia Security (pp. 191–196). ACM. https://doi.org/10.1145/3583780.3615122

work page doi:10.1145/3583780.3615122 2023
[5]

Polevoy, D. V., Sigareva, I. V., Ershova, D. M., Arlazarov, V. V., Nikolaev, D. P., Ming, Z., Luqman, M. M., & Burie, J.-C. (2022). Document liveness challenge dataset (DLC-2021). Journal of Imaging, 8(7), 181. https://doi.org/10.3390/jimaging8070181

work page doi:10.3390/jimaging8070181 2022
[6]

Steinmann, D., Divo, F., Kraus, M., Wüst, A., Struppek, L., Friedrich, F., & Kersting, K. (2024). Navigating shortcuts, spurious correlations, and confounders: From origins via detection to mitigation. arXiv:2412.05152

work page arXiv 2024
[7]

Stehouwer, J., Jourabloo, A., Liu, Y., & Liu, X. (2020). Noise modeling, synthesis and classification for generic object anti-spoofing. arXiv:2003.13043

work page arXiv 2020
[8]

Tapia, J. E., Stockhardt, F., González-Soler, L. J., & Busch, C. (2025). SynID: Passport synthetic dataset for presentation attack detection. arXiv:2505.07540

work page arXiv 2025

Vinogradov, A. (2025). Can generative models actually forge realistic identity documents? arXiv:2601.00829

work page arXiv 2025

Results table extracted alongside entry [9]:

Model              DLC ROC AUC   RR OOD ROC AUC   Δ ROC AUC
Custom CNN         89.5          47.75            -41.75
EfficientNet-B0V2  88.45         65.32            -23.14
ViT-S/DINOv2       87.12         82.17            -4.96
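
The Δ ROC AUC figures extracted alongside entry [9] can be checked with simple arithmetic. The sketch below recomputes each delta as RR OOD minus DLC from the rounded values as they appear on this page. Two rows differ from the reported delta by 0.01; one plausible reading, which the page itself does not confirm, is that the deltas were computed before rounding.

```python
# Values copied from the extracted results: (DLC ROC AUC, RR OOD ROC AUC, reported delta).
reported = {
    "Custom CNN": (89.5, 47.75, -41.75),
    "EfficientNet-B0V2": (88.45, 65.32, -23.14),
    "ViT-S/DINOv2": (87.12, 82.17, -4.96),
}


def recomputed_delta(dlc_auc, rr_ood_auc):
    """Out-of-domain minus in-domain ROC AUC, rounded to two decimals."""
    return round(rr_ood_auc - dlc_auc, 2)


# Collect rows where the recomputed delta differs from the reported one.
mismatches = {
    name: (recomputed_delta(dlc, rr), delta)
    for name, (dlc, rr, delta) in reported.items()
    if abs(recomputed_delta(dlc, rr) - delta) > 1e-9
}

for name, (ours, theirs) in mismatches.items():
    print(f"{name}: recomputed {ours}, reported {theirs}")
```

Either way the ordering is unchanged: the model trained from scratch loses the most, and the frozen DINOv2 backbone loses the least.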
