pith. machine review for the scientific record.

arxiv: 2604.12443 · v1 · submitted 2026-04-14 · 💻 cs.CV

Recognition: unknown

DiffusionPrint: Learning Generative Fingerprints for Diffusion-Based Inpainting Localization

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:41 UTC · model grok-4.3

classification 💻 cs.CV
keywords diffusion inpainting · image forgery localization · generative fingerprints · contrastive learning · forensic feature map · patch-level learning · latent decoder artifacts

The pith

A patch-level contrastive learner extracts consistent generative fingerprints from diffusion-inpainted regions to aid forgery localization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Modern diffusion inpainters regenerate images through latent decoders that erase the camera noise traces traditional forensic tools depend on. The paper shows that inpainted patches from the same diffusion model nevertheless carry a shared generative fingerprint that contrastive training can isolate. By training a convolutional network with a MoCo-style objective and hard-negative mining, DiffusionPrint produces a secondary feature map that existing fusion detectors can use as an extra modality. When added to TruFor, MMFusion, and a simple baseline, the map raises localization accuracy, including on mask shapes and model architectures never seen in training. A sympathetic reader would care because this offers a practical way to keep up with generative forgers who now rely on full-image diffusion pipelines.

Core claim

DiffusionPrint trains a convolutional backbone with a MoCo-style contrastive objective and cross-category hard negative mining plus a generator-aware classification head; the resulting forensic feature map acts as a highly discriminative secondary modality that, when fused into existing IFL pipelines, improves localization of diffusion-based inpainting across multiple generators, with gains of up to 28 percent on mask types held out from fine-tuning and confirmed generalization to unseen generative architectures.

What carries the argument

Patch-level contrastive learning that treats inpainted regions from the same diffusion model as positives and uses cross-category hard negatives to produce a forensic feature map robust to latent-decoder spectral distortions.
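The objective can be sketched concretely. The paper's exact loss and hyperparameters are not reproduced in this review; the numpy sketch below (the function name, `tau`, and `k_hard` are illustrative choices, not values from the paper) shows a MoCo-style InfoNCE term where the positive is a patch from the same diffusion model and the negatives are the hardest cross-category entries in the queue:

```python
import numpy as np

def info_nce_with_hard_negatives(anchor, positive, queue, queue_labels,
                                 anchor_label, tau=0.07, k_hard=4):
    """InfoNCE over an anchor/positive pair plus the k hardest negatives
    drawn from queue entries whose generator label differs from the
    anchor's. All vectors are assumed L2-normalized."""
    # similarity of the anchor to its positive (same-generator patch)
    pos_sim = anchor @ positive
    # candidate negatives: queue entries from *other* generator categories
    mask = queue_labels != anchor_label
    neg_sims = queue[mask] @ anchor
    # cross-category hard-negative mining: keep only the most similar ones
    hard = np.sort(neg_sims)[-k_hard:]
    logits = np.concatenate(([pos_sim], hard)) / tau
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    # cross-entropy with the positive at index 0
    return -np.log(probs[0])
```

The hard-negative step is what makes the objective "cross-category": easy negatives from other generators contribute little gradient, so only the confusable ones enter the denominator.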

If this is right

  • The learned feature map can be fused into TruFor, MMFusion, or lightweight baselines to raise localization performance on diffusion inpainting.
  • Gains reach up to 28 percent on mask types that were never shown during fine-tuning.
  • The same feature map generalizes to inpainting pipelines from generative architectures not encountered in training.
  • The method supplies a secondary forensic modality that works even when camera-level noise patterns have been erased by latent decoding.
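How the secondary modality actually enters TruFor or MMFusion is architecture-specific and not detailed here; the simplest reading is channel-level fusion. A minimal sketch, with illustrative names rather than the paper's API:

```python
import numpy as np

def fuse_fingerprint(rgb, fingerprint_map):
    """Stack a single-channel forensic feature map onto an RGB image so a
    fusion-based localizer can consume it as a fourth input channel.
    rgb: (H, W, 3) float array; fingerprint_map: (H, W) float array."""
    fp = fingerprint_map[..., None].astype(float)
    # per-image min-max normalization so the new channel matches RGB scale
    fp = (fp - fp.min()) / (fp.max() - fp.min() + 1e-8)
    return np.concatenate([rgb, fp], axis=-1)  # (H, W, 4)
```

In practice a fusion detector would ingest the fingerprint map through its own encoder branch rather than raw concatenation, but the point stands: the map is an extra input modality, not a replacement for the RGB stream.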

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fingerprints could support model attribution tasks that identify which specific diffusion architecture created a given inpainted region.
  • The contrastive training recipe might transfer to other latent-decoder pipelines such as those used in text-to-image generation or video synthesis.
  • A lightweight version of the backbone could be deployed on-device for real-time screening of social-media uploads.
  • Combining the new generative-fingerprint channel with residual noise or frequency-domain cues could produce still stronger hybrid detectors.

Load-bearing premise

Inpainted regions produced by one diffusion model share a consistent generative fingerprint that survives latent decoding and can be recovered by patch-level contrastive learning.

What would settle it

A controlled test in which adding the learned DiffusionPrint feature map to TruFor or MMFusion yields no measurable gain in localization IoU or F1 on a held-out set of diffusion models and mask shapes would falsify the central claim.
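The falsification test hinges on two pixel-level metrics. For concreteness, here is how localization IoU and F1 are computed from binary masks; these are the standard definitions, not code from the paper:

```python
import numpy as np

def localization_scores(pred_mask, gt_mask):
    """Pixel-level IoU and F1 between a predicted and a ground-truth
    binary manipulation mask."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    iou = tp / (tp + fp + fn + 1e-8)
    f1 = 2 * tp / (2 * tp + fp + fn + 1e-8)
    return iou, f1
```

Running the fused and unfused detectors through this scoring on held-out generators and mask shapes, and finding no gap, is the decisive negative result.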

Figures

Figures reproduced from arXiv: 2604.12443 by Paschalis Giakoumoglou, Symeon Papadopoulos.

Figure 1: Positive pair construction for the two image categories.
Figure 2: Overview of DiffusionPrint. An anchor patch is encoded …
Figure 3: Radial power spectra in the mid-to-high frequency band.
Figure 4: Qualitative forgery localization results. Comparison between traditional noise-based extractors (NP++) and the proposed DiffusionPrint.
Figure 5: Failure cases. Top: an SD 2.1 inpainting correctly localized, but with an activation on an authentic object too. Bottom: a case exhibiting object bias, where the model activates on salient non-inpainted regions rather than the manipulated area. Input images show the inpainted region outlined by the ground-truth mask.
Original abstract

Modern diffusion-based inpainting models pose significant challenges for image forgery localization (IFL), as their full regeneration pipelines reconstruct the entire image via a latent decoder, disrupting the camera-level noise patterns that existing forensic methods rely on. We propose DiffusionPrint, a patch-level contrastive learning framework that learns a forensic signal robust to the spectral distortions introduced by latent decoding. It exploits the fact that inpainted regions generated by the same model share a consistent generative fingerprint, using this as a self-supervisory signal. DiffusionPrint trains a convolutional backbone via a MoCo-style objective with cross-category hard negative mining and a generator-aware classification head, producing a forensic feature map that serves as a highly discriminative secondary modality in fusion-based IFL frameworks. Integrated into TruFor, MMFusion, and a lightweight fusion baseline, DiffusionPrint consistently improves localization across multiple generative models, with gains of up to +28% on mask types unseen during fine-tuning and confirmed generalization to unseen generative architectures. Code is available at https://github.com/mever-team/diffusionprint

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces DiffusionPrint, a patch-level contrastive learning framework (MoCo-style objective with cross-category hard negative mining and a generator-aware classification head) that extracts a forensic feature map from diffusion-based inpainted regions by exploiting consistent generative fingerprints across patches from the same model. This map is integrated as a secondary modality into fusion-based image forgery localization pipelines (TruFor, MMFusion, and a lightweight baseline), yielding reported localization improvements across multiple generative models, including gains of up to +28% on mask types unseen during fine-tuning and generalization to unseen diffusion architectures. Code is released at https://github.com/mever-team/diffusionprint.

Significance. If the empirical results hold under detailed validation, DiffusionPrint supplies a practical, self-supervised forensic signal that addresses the disruption of camera noise patterns by latent decoding in modern diffusion inpainting. The approach of learning generator-specific fingerprints via contrastive loss on the generative process itself, combined with public code for reproducibility, represents a constructive contribution to IFL in the diffusion era. The claimed robustness to unseen masks and architectures, if substantiated, would be a notable strength.

major comments (2)
  1. Abstract: the reported quantitative gains (including +28% on unseen masks) and generalization claims lack accompanying details on dataset sizes, number of generative models, statistical tests, ablation studies on the contrastive components (temperature, queue size, augmentation), or exact baseline implementations; without these, it is difficult to assess whether the improvements are robust or sensitive to post-hoc choices.
  2. §4 (Experimental results, assumed from abstract claims): the central hypothesis that inpainted regions from the same diffusion model share a consistent fingerprint robust to latent-decoding distortions is load-bearing for the method; the manuscript should include a targeted analysis (e.g., feature clustering by generator or t-SNE visualizations) demonstrating that the learned embeddings separate by model rather than by image content or mask geometry.
minor comments (3)
  1. Method section: the integration of the generator-aware classification head with the MoCo backbone would be clearer with a diagram or explicit pseudocode showing how the head is used only at training time.
  2. Figure captions and tables: several result tables would benefit from explicit reporting of standard deviations across multiple runs or seeds to support the generalization claims.
  3. Related work: the discussion of prior contrastive forensic methods could include a brief comparison table highlighting differences in negative mining strategy.
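The clustering analysis requested in major comment 2 can be approximated cheaply. One hedged sketch, an illustrative proxy rather than an analysis the paper performs, is leave-one-out 1-NN accuracy on generator labels over the learned embeddings:

```python
import numpy as np

def knn_generator_separability(embeddings, labels):
    """Leave-one-out 1-NN accuracy on generator labels: a cheap proxy for
    clustering-by-generator. High accuracy means embeddings group by
    generating model rather than by image content or mask geometry.
    embeddings: (N, D) float array; labels: (N,) int array."""
    sims = embeddings @ embeddings.T
    np.fill_diagonal(sims, -np.inf)  # exclude self-matches
    nearest = sims.argmax(axis=1)
    return (labels[nearest] == labels).mean()
```

A score near chance on content-matched patches from different generators would undercut the load-bearing fingerprint hypothesis even if end-to-end localization numbers looked healthy.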

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and positive recommendation for minor revision. We address each major comment below and outline the changes we will make to the manuscript.

Point-by-point responses
  1. Referee: Abstract: the reported quantitative gains (including +28% on unseen masks) and generalization claims lack accompanying details on dataset sizes, number of generative models, statistical tests, ablation studies on the contrastive components (temperature, queue size, augmentation), or exact baseline implementations; without these, it is difficult to assess whether the improvements are robust or sensitive to post-hoc choices.

    Authors: We agree that the abstract would benefit from more context on the experimental scale. In the revised version, we will modify the abstract to mention the number of diffusion models and the overall dataset size used for training and evaluation. Additionally, we will expand Section 4 to include ablation studies on the contrastive learning hyperparameters (temperature, queue size, augmentations), statistical analyses of the improvements, and clearer descriptions of the baseline implementations. This will help demonstrate the robustness of the results. revision: yes

  2. Referee: §4 (Experimental results, assumed from abstract claims): the central hypothesis that inpainted regions from the same diffusion model share a consistent fingerprint robust to latent-decoding distortions is load-bearing for the method; the manuscript should include a targeted analysis (e.g., feature clustering by generator or t-SNE visualizations) demonstrating that the learned embeddings separate by model rather than by image content or mask geometry.

    Authors: We concur that a direct demonstration of the hypothesis would strengthen the paper. Although the generalization performance to unseen models and mask types provides supporting evidence, we will incorporate t-SNE visualizations of the feature embeddings in the revised Section 4. These plots will be colored according to the generating model to illustrate clustering by model rather than by content or mask geometry. We will also discuss how this supports the robustness to latent-decoding distortions. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central contribution is a patch-level contrastive learning setup (MoCo-style with hard negatives and generator-aware head) that treats same-model inpainted patches as positive pairs to learn a forensic feature map. This is a standard self-supervised objective applied to the hypothesis that diffusion inpainting leaves model-specific traces robust to latent decoding. No equations reduce the output forensic map to an input parameter by construction, no fitted quantity is relabeled as a prediction, and no load-bearing step depends on a self-citation chain or imported uniqueness theorem. Reported gains are empirical improvements when the learned map is fused into external baselines (TruFor, MMFusion) on held-out masks and architectures. The derivation remains self-contained against external benchmarks and code release.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on the domain assumption that diffusion models imprint a detectable, model-consistent fingerprint on inpainted patches that survives latent decoding; no free parameters or invented entities are explicitly named in the abstract, but typical contrastive training involves many tunable hyperparameters.

free parameters (1)
  • MoCo training hyperparameters (temperature, queue size, augmentation strength)
    Contrastive frameworks require several hyperparameters that are tuned on data and affect the learned fingerprint quality.
axioms (1)
  • domain assumption: Inpainted regions from the same generative model share a consistent forensic fingerprint robust to latent-decoding distortions
    This is the self-supervisory signal explicitly exploited in the abstract.

pith-pipeline@v0.9.0 · 5485 in / 1376 out tokens · 40398 ms · 2026-05-10T15:41:31.471814+00:00 · methodology


Reference graph

Works this paper leans on

69 extracted references · 4 canonical work pages · 2 internal anchors

  1. [1]

    Bringing generative AI into creative cloud with Adobe Firefly.https : / / blog

    Adobe. Bringing generative AI into creative cloud with Adobe Firefly.https : / / blog . adobe . com / en / publish/2023/03/21/bringing- gen- ai- to- creative-cloud-adobe-firefly, 2023. Accessed: 2025-09-26. 1

  2. [2]

    Blended latent diffusion.ACM Trans

    Omri Avrahami, Ohad Fried, and Dani Lischinski. Blended latent diffusion.ACM Trans. Graph., 42(4), 2023. 1, 2

  3. [3]

    Dragon: A large-scale dataset of realistic images generated by diffusion models, 2025

    Giulia Bertazzini, Daniele Baracchi, Dasara Shullani, Isao Echizen, and Alessandro Piva. Dragon: A large-scale dataset of realistic images generated by diffusion models, 2025. 5

  4. [4]

    Im- proved DCT coefficient analysis for forgery localization in JPEG images

    Tiziano Bianchi, Alessia De Rosa, and Alessandro Piva. Im- proved DCT coefficient analysis for forgery localization in JPEG images. InIEEE International Conference on Acous- tics, Speech and Signal Processing (ICASSP), pages 2444–

  5. [5]

    Flux.https://github.com/ black- forest- labs/flux, 2024

    Black Forest Labs. Flux.https://github.com/ black- forest- labs/flux, 2024. Accessed: 2025- 09-19. 1, 2

  6. [6]

    Advances in ai-generated images and videos.International Journal of Interactive Multimedia and Artificial Intelligence, 9(1):173– 208, 2024

    Hessen Bougueffa, Mamadou Keita, Wassim Hamidouche, Abdelmalik Taleb-Ahmed, Helena Liz-L ´opez, Alejandro Mart´ın, David Camacho, and Abdenour Hadid. Advances in ai-generated images and videos.International Journal of Interactive Multimedia and Artificial Intelligence, 9(1):173– 208, 2024. 1

  7. [7]

    Emerg- ing properties in self-supervised vision transformers

    Mathilde Caron, Hugo Touvron, Ishan Misra, Herv ´e J´egou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerg- ing properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9650–9660, 2021. 3, 5

  8. [8]

    Cloud Yu, and Chih-Chuan Chang

    I-Cheng Chang, J. Cloud Yu, and Chih-Chuan Chang. A forgery detection algorithm for exemplar-based inpainting images using multi-region relation.Image and Vision Com- puting, 31(1):57–71, 2013. 2

  9. [9]

    A simple framework for contrastive learn- ing of visual representations

    Ting Chen, Simon Kornblith, Mohammad Norouzi, and Ge- offrey Hinton. A simple framework for contrastive learn- ing of visual representations. InICML, pages 1597–1607. PMLR, 2020. 2, 3, 5

  10. [10]

    PRNU-based detection of small-size image forgeries

    Giovanni Chierchia, Sara Parrilli, Giovanni Poggi, Luisa Verdoliva, and Carlo Sansone. PRNU-based detection of small-size image forgeries. InInternational Conference on Digital Signal Processing (DSP), pages 1–6. IEEE, 2011. 2

  11. [11]

    Intriguing properties of syn- thetic images: From generative adversarial networks to dif- fusion models

    Riccardo Corvi, Davide Cozzolino, Giovanni Poggi, Koki Nagano, and Luisa Verdoliva. Intriguing properties of syn- thetic images: From generative adversarial networks to dif- fusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Work- shops, pages 973–982, 2023. 3

  12. [12]

    Noiseprint: A cnn- based camera model fingerprint.IEEE Transactions on In- formation Forensics and Security, 15:144–159, 2020

    Davide Cozzolino and Luisa Verdoliva. Noiseprint: A cnn- based camera model fingerprint.IEEE Transactions on In- formation Forensics and Security, 15:144–159, 2020. 2, 5

  13. [13]

    Splicebuster: A new blind image splicing detector

    Davide Cozzolino, Giovanni Poggi, and Luisa Verdoliva. Splicebuster: A new blind image splicing detector. InIEEE International Workshop on Information Forensics and Secu- rity (WIFS), pages 1–6. IEEE, 2015. 2

  14. [14]

    RAISE: A raw images dataset for dig- ital image forensics

    Duc-Tien Dang-Nguyen, Cecilia Pasquini, Valentina Conot- ter, and Giulia Boato. RAISE: A raw images dataset for dig- ital image forensics. InProceedings of the 6th ACM multi- media systems conference, pages 219–224, 2015. 5

  15. [15]

    MVSS-Net: Multi-view multi-scale super- vised networks for image manipulation detection.IEEE Trans

    Chengbo Dong, Xinru Chen, Ruohan Hu, Juan Cao, and Xirong Li. MVSS-Net: Multi-view multi-scale super- vised networks for image manipulation detection.IEEE Trans. Pattern Analysis and Machine Intel., 45(3):3539– 3553, 2022. 2

  16. [16]

    Exposing digital forgeries from JPEG ghosts

    Hany Farid. Exposing digital forgeries from JPEG ghosts. IEEE Transactions on Information Forensics and Security, 4 (1):154–160, 2009. 2

  17. [17]

    Image forgery localization via fine-grained analysis of cfa artifacts.IEEE Transactions on Information Forensics and Security, 7(5):1566–1577, 2012

    Pasquale Ferrara, Tiziano Bianchi, Alessia De Rosa, and Alessandro Piva. Image forgery localization via fine-grained analysis of cfa artifacts.IEEE Transactions on Information Forensics and Security, 7(5):1566–1577, 2012. 2

  18. [18]

    Relational Rep- resentation Distillation, 2024

    Nikolaos Giakoumoglou and Tania Stathaki. Relational Rep- resentation Distillation, 2024. 3

  19. [19]

    SynCo: Syn- thetic Hard Negatives for Contrastive Visual Representation Learning, 2025

    Nikolaos Giakoumoglou and Tania Stathaki. SynCo: Syn- thetic Hard Negatives for Contrastive Visual Representation Learning, 2025. 3

  20. [20]

    A Review on Discriminative Self-supervised Learn- ing Methods in Computer Vision, 2025

    Nikolaos Giakoumoglou, Tania Stathaki, and Athanasios Gkelias. A Review on Discriminative Self-supervised Learn- ing Methods in Computer Vision, 2025. 3

  21. [21]

    SAGI: Se- mantically aligned and uncertainty guided ai image inpaint- ing

    Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos, and Panagiotis C Petrantonakis. SAGI: Se- mantically aligned and uncertainty guided ai image inpaint- ing. InProc. IEEE/CVF Int. Conf Computer Vision (ICCV),

  22. [22]

    Bootstrap your own latent - a new approach to self-supervised learning

    Jean-Bastien Grill, Florian Strub, Florent Altch ´e, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Ghesh- laghi Azar, Bilal Piot, koray kavukcuoglu, Remi Munos, and Michal Valko. Bootstrap your own latent - a new approach to self-supervised learning. InAdvances in Neural Information Process...

  23. [23]

    TruFor: Leveraging all-round clues for trustworthy image forgery detection and localiza- tion

    Fabrizio Guillaro, Davide Cozzolino, Avneesh Sud, Nicholas Dufour, and Luisa Verdoliva. TruFor: Leveraging all-round clues for trustworthy image forgery detection and localiza- tion. InProc. IEEE/CVF Conf. Computer Vision Pattern Recogn. (CVPR), pages 20606–20615, 2023. 2, 5, 6, 7, 1

  24. [24]

    Momentum contrast for unsupervised visual repre- sentation learning

    Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual repre- sentation learning. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR),

  25. [25]

    Content-aware de- tection of JPEG grid inconsistencies for intuitive image forensics.Journal of Visual Communication and Image Rep- resentation, 54:155–170, 2018

    Chryssanthi Iakovidou, Markos Zampoglou, Symeon Pa- padopoulos, and Yiannis Kompatsiaris. Content-aware de- tection of JPEG grid inconsistencies for intuitive image forensics.Journal of Visual Communication and Image Rep- resentation, 54:155–170, 2018. 2

  26. [26]

    Autosplice: A text-prompt manipu- lated image dataset for media forensics

    Shan Jia, Mingzhen Huang, Zhou Zhou, Yan Ju, Jialing Cai, and Siwei Lyu. Autosplice: A text-prompt manipu- lated image dataset for media forensics. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 893–903, 2023. 2

  27. [27]

    BrushNet: A plug-and-play image inpainting model with decomposed dual-branch diffusion

    Xuan Ju, Xian Liu, Xintao Wang, Yuxuan Bian, Ying Shan, and Qiang Xu. BrushNet: A plug-and-play image inpainting model with decomposed dual-branch diffusion. InEuropean Conference on Computer Vision, pages 150–168. Springer,

  28. [28]

    Hard Negative Mix- ing for Contrastive Learning

    Yannis Kalantidis, Mert Bulent Sariyildiz, Noe Pion, Philippe Weinzaepfel, and Diane Larlus. Hard Negative Mix- ing for Contrastive Learning. InNeurIPS, 2020. 3

  29. [29]

    Fusion transformer with object mask guidance for image forgery analysis

    Dimitrios Karageorgiou, Giorgos Kordopatis-Zilos, and Symeon Papadopoulos. Fusion transformer with object mask guidance for image forgery analysis. InProc. IEEE/CVF Conf. Computer Vission Pattern Recogn., pages 4345–4355,

  30. [30]

    Localization of diffusion-based inpainting in digital images.IEEE Trans Inf

    Haodong Li, Weiqi Luo, and Jiwu Huang. Localization of diffusion-based inpainting in digital images.IEEE Trans Inf. Forensics Security, 12(12):3050–3064, 2017. 2

  31. [31]

    An efficient forgery detection algorithm for object removal by exemplar-based image inpainting.Journal of Visual Com- munication and Image Representation, 30:75–85, 2015

    Zaoshan Liang, Gaobo Yang, Xiangling Ding, and Leida Li. An efficient forgery detection algorithm for object removal by exemplar-based image inpainting.Journal of Visual Com- munication and Image Representation, 30:75–85, 2015. 2

  32. [32]

    PSCC-Net: Progressive spatio-channel correlation network for image manipulation detection and localization.IEEE Trans

    Xiaohong Liu, Yaojie Liu, Jun Chen, and Xiaoming Liu. PSCC-Net: Progressive spatio-channel correlation network for image manipulation detection and localization.IEEE Trans. Circuits Systems Video Technol., 32(11):7505–7517,

  33. [33]

    Tbformer: Two-branch transformer for image forgery localization.IEEE Signal Processing Letters, 2023

    Yaqi Liu, Binbin Lv, Xin Jin, Xiaoyu Chen, and Xiaokun Zhang. Tbformer: Two-branch transformer for image forgery localization.IEEE Signal Processing Letters, 2023. 2

  34. [34]

    MUN: Image forgery localization based on M 3 encoder and UN decoder.Proceedings of the AAAI Conference on Artificial Intelligence, 39(6):5685– 5693, 2025

    Yaqi Liu, Shuhuan Chen, Haichao Shi, Xiao-Yu Zhang, Song Xiao, and Qiang Cai. MUN: Image forgery localization based on M 3 encoder and UN decoder.Proceedings of the AAAI Conference on Artificial Intelligence, 39(6):5685– 5693, 2025. 6

  35. [35]

    Digi- tal camera identification from sensor pattern noise.IEEE Transactions on Information Forensics and Security, 1(2): 205–214, 2006

    Jan Lukas, Jessica Fridrich, and Miroslav Goljan. Digi- tal camera identification from sensor pattern noise.IEEE Transactions on Information Forensics and Security, 1(2): 205–214, 2006. 2

  36. [36]

    Using noise inconsisten- cies for blind image forensics.Image and Vision Computing, 27(10):1497–1503, 2009

    Babak Mahdian and Stanislav Saic. Using noise inconsisten- cies for blind image forensics.Image and Vision Computing, 27(10):1497–1503, 2009. 2

  37. [37]

    Hd-painter: High-resolution prompt-faithful text-guided image inpainting, 2024

    Ara Manukyan. Hd-painter: High-resolution prompt-faithful text-guided image inpainting, 2024. 2

  38. [38]

    TGIF: Text-guided inpainting forgery dataset

    Hannes Mareen, Dimitrios Karageorgiou, Glenn Van Wal- lendael, Peter Lambert, and Symeon Papadopoulos. TGIF: Text-guided inpainting forgery dataset. In2024 IEEE In- ternational Workshop on Information Forensics and Security (WIFS), 2024. 1, 2

  39. [39]

    Tgif2: Extended text-guided inpaint- ing forgery dataset and benchmark, 2026

    Hannes Mareen, Dimitrios Karageorgiou, Paschalis Gi- akoumoglou, Peter Lambert, Symeon Papadopoulos, and Glenn Van Wallendael. Tgif2: Extended text-guided inpaint- ing forgery dataset and benchmark, 2026. 2, 7

  40. [40]

    Ai- generated image detectors overrely on global artifacts: Ev- idence from inpainting exchange, 2026

    Elif Nebioglu, Emirhan Bilgic ¸, and Adrian Popescu. Ai- generated image detectors overrely on global artifacts: Ev- idence from inpainting exchange, 2026. 2

  41. [41]

    GLIDE: Towards photorealistic image genera- tion and editing with text-guided diffusion models

    Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. GLIDE: Towards photorealistic image genera- tion and editing with text-guided diffusion models. InPro- ceedings of the 39th International Conference on Machine Learning, pages 16784–16804. PMLR, 2022. 2

  42. [42]

    ZERO: A local JPEG grid origin detector based on the number of DCT zeros and its applications in image forensics.Image Processing On Line, 11:396–433, 2021

    Tina Nikoukhah, J ´er´emy Anger, Miguel Colom, Jean-Michel Morel, and Rafael Grompone von Gioi. ZERO: A local JPEG grid origin detector based on the number of DCT zeros and its applications in image forensics.Image Processing On Line, 11:396–433, 2021. 2

  43. [43]

    SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

    Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas M ¨uller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion mod- els for high-resolution image synthesis.arXiv preprint arXiv:2307.01952, 2023. 2

  44. [44]

    Image forgery identification using convolu- tion neural network.International Journal of Recent Tech- nology and Engineering, 8(1):311–320, 2019

    N Hema Rajini. Image forgery identification using convolu- tion neural network.International Journal of Recent Tech- nology and Engineering, 8(1):311–320, 2019. 2

  45. [45]

    Contrastive learning with hard negative samples,

    Joshua Robinson, Ching-Yao Chuang, Suvrit Sra, and Ste- fanie Jegelka. Contrastive learning with hard negative sam- ples.arXiv preprint arXiv:2010.04592, 2020. 3

  46. [46]

    High-resolution image syn- thesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bjorn Ommer. High-resolution image syn- thesis with latent diffusion models. InProc. IEEE/CVF Conf. Computer Vission Pattern Recogn., 2022. 1, 2

  47. [47]

    Rethinking image editing detection in the era of generative AI revolution

    Zhihao Sun, Haipeng Fang, Juan Cao, Xinying Zhao, and Danding Wang. Rethinking image editing detection in the era of generative AI revolution. InProceedings of the 32nd ACM International Conference on Multimedia, pages 3538– 3547, 2024. 2

  48. [48]

    Exploring multi-modal fusion for image manipulation detection and lo- calization

    Konstantinos Triaridis and Vasileios Mezaris. Exploring multi-modal fusion for image manipulation detection and lo- calization. InInt. Conf. Multimedia Model., pages 198–211. Springer, 2024. 2, 6, 7, 1

  49. [49]

    Media forensics and deepfakes: an overview.IEEE J Selected Topics Signal Process., 14(5): 910–932, 2020

    Luisa Verdoliva. Media forensics and deepfakes: an overview.IEEE J Selected Topics Signal Process., 14(5): 910–932, 2020. 1

  50. [50]

    Fleet, Radu Soricut, Jason Baldridge, Mo- hammad Norouzi, Peter Anderson, and William Chan

    Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont- Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mo- hammad Norouzi, Peter Anderson, and William Chan. Im- agen editor and EditBench: Advancing and evaluating text- guided image inpainting. InProc. IEEE/CVF Conf. Com- puter Vission Patt...

  51. [51]

    OpenSDI: Spotting diffusion-generated images in the open world

    Yabin Wang, Zhiwu Huang, and Xiaopeng Hong. OpenSDI: Spotting diffusion-generated images in the open world. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 4291–4301, 2025. 2

  52. [52]

    DIRE for diffusion-generated image detection

    Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, and Houqiang Li. DIRE for diffusion-generated image detection. InProc. IEEE/CVF Int. Conf Computer Vision (ICCV), pages 22445–22455, 2023. 2

  53. [53]

    Iid-net: Image inpainting de- tection network via neural architecture search and attention

    Haiwei Wu and Jiantao Zhou. Iid-net: Image inpainting de- tection network via neural architecture search and attention. IEEE Transactions on Circuits and Systems for Video Tech- nology, 32(3):1172–1185, 2022. 2

  54. [54]

    Detection of digital doctoring in exemplar-based in- painted images

    Qiong Wu, Shao-Jie Sun, Wei Zhu, Guo-Hui Li, and Dan Tu. Detection of digital doctoring in exemplar-based in- painted images. In2008 International Conference on Ma- chine Learning and Cybernetics, pages 1222–1226, 2008. 2

  55. [55]

    ManTra-Net: Manipulation tracing network for detection and localization of image forgeries with anomalous features

    Yue Wu, Wael AbdAlmageed, and Premkumar Natarajan. ManTra-Net: Manipulation tracing network for detection and localization of image forgeries with anomalous features. InProc. IEEE/CVF Conf. Computer Vision Pattern Recogn., pages 9543–9552, 2019. 2

  56. [56]

    Alvarez, and Ping Luo

    Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, and Ping Luo. Segformer: Simple and effi- cient design for semantic segmentation with transformers. In Advances in Neural Information Processing Systems, pages 12077–12090. Curran Associates, Inc., 2021. 6, 1

[57] Tianze Yang, Tyson Jordan, Ninghao Liu, and Jin Sun. Common inpainted objects in-n-out of context. arXiv preprint arXiv:2506.00721, 2025.

[58] Tao Yu, Runseng Feng, Ruoyu Feng, Jinming Liu, Xin Jin, Wenjun Zeng, and Zhibo Chen. Inpaint anything: Segment anything meets image inpainting. arXiv preprint arXiv:2304.06790, 2023.

[59] Dengyong Zhang, Zaoshan Liang, Gaobo Yang, Qingguo Li, Leida Li, and Xingming Sun. A robust forgery detection algorithm for object removal by exemplar-based image inpainting. Multimedia Tools and Applications, 77(10):11823–11842, 2018.

[60] Jiaming Zhang, Huayao Liu, Kailun Yang, Xinxin Hu, Ruiping Liu, and Rainer Stiefelhagen. CMX: Cross-modal fusion for RGB-X semantic segmentation with transformers. IEEE Transactions on Intelligent Transportation Systems, 2023.

[61] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.

[62] Ying Zhang, Jonathan Goh, Lei Lei Win, and Vrizlynn LL Thing. Image region forgery detection: A deep learning approach. In SG-CRC, pages 1–11, 2016.

[63] Yushu Zhang, Qing Tan, Shuren Qi, and Mingfu Xue. PRNU-based image forgery localization with deep multi-scale fusion. ACM Transactions on Multimedia Computing, Communications and Applications, 19(2):1–20, 2023.

[64] Mingkai Zheng, Shan You, Fei Wang, Chen Qian, Changshui Zhang, Xiaogang Wang, and Chang Xu. ReSSL: Relational self-supervised learning with weak augmentation. Advances in Neural Information Processing Systems, 34:2543–2555, 2021.

[65] Peng Zhou, Xintong Han, Vlad I. Morariu, and Larry S. Davis. Two-stream neural networks for tampered face detection, 2018.

[66] Junhao Zhuang, Yanhong Zeng, Wenran Liu, Chun Yuan, and Kai Chen. A task is worth one word: Learning with task prompts for high-quality versatile image inpainting. In European Conference on Computer Vision, pages 195–211. Springer, 2024.

DiffusionPrint: Learning Generative Fingerprints for Diffusion-Based Inpainting Localization
Supplementary Material

Augmentation Strategies

Contrastive learning relies heavily on data augmentation to generate diverse positive views of the same underlying instance. In image forensics, however, augmentations must be chosen carefully to avoid destroying the delicate traces left by the generative process. In this section, we evaluate the impact of diffe...
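To make the constraint concrete, the sketch below generates positive views using only augmentations that leave pixel values untouched (horizontal flip and integer-offset cropping), since resampling, blurring, or re-compression would wipe out the high-frequency generative residue. The function name, crop size, and two-view setup are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def trace_preserving_views(patch, crop):
    """Produce two positive views of a patch using only value-preserving
    augmentations (horizontal flip + integer-offset crop), so any generative
    fingerprint in the high-frequency residual survives. Interpolating or
    re-compressing augmentations would destroy such traces."""
    views = []
    for _ in range(2):
        v = patch[:, ::-1] if rng.random() < 0.5 else patch  # flip, no resampling
        h, w = v.shape[:2]
        y = int(rng.integers(0, h - crop + 1))               # integer offsets only
        x = int(rng.integers(0, w - crop + 1))
        views.append(v[y:y + crop, x:x + crop].copy())
    return views

patch = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
v1, v2 = trace_preserving_views(patch, crop=48)
```

Because every output pixel is an untouched copy of an input pixel, the two views differ only in geometry, which is exactly what a forensic contrastive objective needs.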

Lite Baseline Architecture. In addition to the state-of-the-art frameworks, we evaluate a custom lightweight two-stream baseline (Lite Baseline) to isolate the effectiveness of the forensic modalities with a simpler fusion mechanism. The RGB stream utilizes an ImageNet-pretrained Mix Transformer encoder (MiT-B2) from the SegFormer architecture [56]. For t...
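A minimal sketch of what the two-stream fusion could look like, assuming the simplest possible mechanism: channel concatenation of the RGB and forensic feature maps followed by a 1x1 projection. The shapes, channel counts, and function name are illustrative stand-ins, not the Lite Baseline's actual layers.

```python
import numpy as np

def fuse_streams(rgb_feat, forensic_feat, w):
    """Channel-concatenate an RGB feature map with a forensic feature map,
    then apply a 1x1 projection (a matmul over the channel axis) to get
    per-pixel logits -- the simplest fusion a two-stream baseline can use."""
    fused = np.concatenate([rgb_feat, forensic_feat], axis=-1)  # (H, W, C1 + C2)
    return fused @ w                                            # (H, W, C_out)

rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 16, 64))       # stand-in for MiT-B2 features
forensic = rng.standard_normal((16, 16, 32))  # stand-in for the DiffusionPrint map
proj = rng.standard_normal((96, 2))           # 2 classes: authentic vs. inpainted
logits = fuse_streams(rgb, forensic, proj)    # per-pixel localization logits
```

In a real network the 1x1 projection would be a learned convolution, but the shape bookkeeping is the same.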

Training Details

For the integration of DiffusionPrint into the TruFor [23] and MMFusion [48] frameworks, we retain their original architectural designs and training protocols, referring readers to the respective papers for exhaustive network details. The lite baseline was adapted from the TruFor implementation. Across all frameworks, input images are r...
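Since the backbone is trained with a MoCo-style contrastive objective against a queue of negatives, a single-query InfoNCE computation can be sketched as below. The embedding dimension, queue size, and temperature are illustrative; the paper's actual loss, queue management, and hard-negative mining are not reproduced here.

```python
import numpy as np

def info_nce(q, k_pos, queue, tau=0.07):
    """MoCo-style InfoNCE loss for a single query embedding: the positive
    key is a second view of the same patch, the negatives come from a
    queue of embeddings of patches inpainted by other generators."""
    q = q / np.linalg.norm(q)
    k_pos = k_pos / np.linalg.norm(k_pos)
    queue = queue / np.linalg.norm(queue, axis=1, keepdims=True)
    logits = np.concatenate(([q @ k_pos], queue @ q)) / tau  # positive first
    logits -= logits.max()                                   # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(0)
d = 128
q = rng.standard_normal(d)
k_match = q + 0.05 * rng.standard_normal(d)  # near-duplicate positive view
k_rand = rng.standard_normal(d)              # unrelated key, for comparison
negs = rng.standard_normal((256, d))         # negative queue
loss_match = info_nce(q, k_match, negs)
loss_rand = info_nce(q, k_rand, negs)
```

A matching positive should yield a much lower loss than an unrelated key, which is what drives same-generator patches toward a shared fingerprint embedding.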