COCO-Inpaint: A Benchmark for Detecting and Localizing Inpainting-Based Image Manipulations

Haozhen Yan; Huijia Zhu; Jiahui Zhan; Jianfu Zhang; Jun Lan; Suning Lang; Yan Hong; Yikun Ji

arxiv: 2504.18361 · v2 · pith:VIZSQHCXnew · submitted 2025-04-25 · 💻 cs.CV · cs.AI


Haozhen Yan, Yan Hong, Jiahui Zhan, Suning Lang, Yikun Ji, Huijia Zhu, Jun Lan, Jianfu Zhang

Pith reviewed 2026-05-22 17:46 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords: image manipulation detection · inpainting forgery · localization benchmark · IMDL evaluation · COCO-Inpaint · image forensics · forgery localization


The pith

COCO-Inpaint benchmark supplies 238,302 images from six inpainting models to test forgery detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces COCO-Inpaint to fill the gap in benchmarks for detecting inpainting-based image manipulations. It generates high-quality samples using six state-of-the-art models and four mask strategies, including optional text guidance, to create diverse and semantically rich cases. The design emphasizes intrinsic inconsistencies between inpainted and authentic regions instead of obvious semantic mismatches. A standard evaluation protocol with three metrics then measures how existing detection methods perform and where they encounter difficulties.
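Figure 2 labels Mask1 as "Random Polygon", one of the four mask strategies, but this review does not say how those polygons are sampled. The sketch below is a hypothetical stand-in, not COCO-Inpaint's generator: it draws a star-shaped polygon around a random centre and rasterizes it with the even-odd rule in NumPy. The function names, vertex count, and radius range are illustrative assumptions.

```python
import numpy as np


def polygon_to_mask(height, width, xs, ys):
    """Rasterize a simple polygon (vertex coordinates in pixels) into a 0/1 mask.

    Uses the even-odd (crossing-number) rule evaluated at every pixel centre.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    yy, xx = np.mgrid[0:height, 0:width] + 0.5  # pixel-centre coordinates
    inside = np.zeros((height, width), dtype=bool)
    j = len(xs) - 1
    with np.errstate(divide="ignore", invalid="ignore"):
        for i in range(len(xs)):
            # Does the edge between vertices j and i straddle each pixel's row?
            crosses = (ys[i] > yy) != (ys[j] > yy)
            # x-coordinate where that edge meets the pixel's horizontal line
            x_hit = (xs[j] - xs[i]) * (yy - ys[i]) / (ys[j] - ys[i]) + xs[i]
            inside ^= crosses & (xx < x_hit)
            j = i
    return inside.astype(np.uint8)


def random_polygon_mask(height, width, num_vertices=8, max_radius_frac=0.3, rng=None):
    """Hypothetical 'Random Polygon' sampler (assumed parameters, not the paper's)."""
    if num_vertices < 4:
        raise ValueError("num_vertices must be at least 4")
    rng = np.random.default_rng() if rng is None else rng
    cx = rng.uniform(0.3, 0.7) * width
    cy = rng.uniform(0.3, 0.7) * height
    max_r = max_radius_frac * min(height, width)
    # One angle per equal sector keeps consecutive vertices less than pi apart,
    # so the polygon is star-shaped around the centre and never self-intersects.
    sector = 2.0 * np.pi / num_vertices
    angles = (np.arange(num_vertices) + rng.uniform(0.0, 1.0, num_vertices)) * sector
    radii = rng.uniform(0.3 * max_r, max_r, num_vertices)
    xs = cx + radii * np.cos(angles)
    ys = cy + radii * np.sin(angles)
    return polygon_to_mask(height, width, xs, ys)
```

Passing a seeded `np.random.default_rng(seed)` makes the masks reproducible, which matters if a benchmark is to be regenerated or audited.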

Core claim

We present COCO-Inpaint, a comprehensive benchmark with high-quality inpainting samples generated by six state-of-the-art inpainting models, diverse generation scenarios enabled by four mask generation strategies with optional text guidance, and large-scale coverage of 238,302 inpainted images with rich semantic diversity. The benchmark is constructed to highlight intrinsic inconsistencies between inpainted and authentic regions rather than superficial semantic artifacts such as object shapes. We further establish a rigorous evaluation protocol with three standard metrics to benchmark existing IMDL methods and reveal current trends and challenges.

What carries the argument

The COCO-Inpaint dataset of controlled inpainted images that isolates intrinsic inconsistencies for evaluation of image manipulation detection and localization methods.

If this is right

Existing IMDL methods can be directly compared on inpainting manipulations using a shared large-scale test set.
Detection approaches must address intrinsic region inconsistencies rather than relying on semantic cues alone.
The four mask strategies allow testing of robustness across different editing patterns.
Large semantic diversity ensures evaluations cover varied real-world image content.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The benchmark could guide creation of inpainting models that deliberately reduce detectable inconsistencies.
Similar controlled generation approaches might apply to building test sets for other manipulation types.
Text-guided inpainting cases open questions about how language conditioning affects forensic traces.

Load-bearing premise

The inpainted regions generated by the six models contain intrinsic inconsistencies representative of real-world manipulations that standard metrics can expose in current detection methods.

What would settle it

Running the existing IMDL methods on the full COCO-Inpaint set and finding that they reach near-perfect scores on all three metrics would show the benchmark fails to reveal meaningful limitations.

Figures

Figures reproduced from arXiv: 2504.18361 by Haozhen Yan, Huijia Zhu, Jiahui Zhan, Jianfu Zhang, Jun Lan, Suning Lang, Yan Hong, Yikun Ji.

**Figure 2.** Visualization of the COCO-Inpaint dataset. Mask1, Mask2, Mask3, and Mask4 represent Random Polygon, …

**Figure 3.** Visualization results of the IMDL models on the …

Original abstract

Recent advances in image manipulation have enabled highly photorealistic content generation, but also lowered the barrier to arbitrary editing, raising concerns about multimedia authenticity and security. Existing Image Manipulation Detection and Localization (IMDL) methods mainly target splicing or copy-move forgeries, while benchmarks for inpainting-based manipulations remain limited. To bridge this gap, we present COCO-Inpaint, a comprehensive benchmark specifically designed for inpainting detection and localization, with three key contributions: 1) High-quality inpainting samples generated by six state-of-the-art inpainting models, 2) Diverse generation scenarios enabled by four mask generation strategies with optional text guidance, and 3) Large-scale coverage of 238,302 inpainted images with rich semantic diversity. Our benchmark is constructed to highlight intrinsic inconsistencies between inpainted and authentic regions, rather than superficial semantic artifacts such as object shapes. We further establish a rigorous evaluation protocol with three standard metrics to benchmark existing IMDL methods and reveal current trends and challenges.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

COCO-Inpaint fills a gap with the first large benchmark for inpainting manipulations but its validation that samples isolate intrinsic inconsistencies remains thin.


COCO-Inpaint stands out as the first substantial public benchmark for inpainting manipulations in the IMDL space. It generates 238,302 images from COCO using six state-of-the-art inpainting models and four mask strategies, some with text guidance, which breaks new ground relative to the splicing and copy-move focus of earlier benchmarks. The paper does well in scaling the dataset and in designing it to emphasize intrinsic inconsistencies between inpainted and authentic regions, such as texture or lighting mismatches, instead of semantic artifacts such as odd object shapes. Establishing a clear evaluation protocol with three standard metrics is practical for benchmarking existing methods and revealing challenges.

The soft spots center on validation of the generation process. The claim that these samples produce detectable intrinsic inconsistencies representative of real-world inpainting is central, but the abstract does not detail perceptual studies, comparisons to real manipulations, or ablations on mask-induced artifacts. If the full paper provides this evidence, the claim would be more convincing; as described, this assumption is the least secure part.

This work is for researchers in computer vision and digital media forensics who need datasets to develop or test detection and localization methods. Readers looking for a ready-to-use benchmark with baselines will find value in it. It deserves a serious referee because it provides a new, large-scale resource that addresses a real gap in the literature. I recommend sending it for peer review.

Referee Report

2 major / 2 minor

Summary. The paper introduces COCO-Inpaint, a benchmark dataset of 238,302 inpainted images generated from COCO using six state-of-the-art inpainting models, four mask generation strategies (with optional text guidance), and a focus on creating samples that expose intrinsic inconsistencies (texture, lighting, blending) rather than semantic artifacts such as unnatural object shapes. It establishes an evaluation protocol using three standard IMDL metrics to benchmark existing detection and localization methods and highlight current limitations.

Significance. A well-validated inpainting-specific benchmark would address a clear gap in the IMDL literature, where most existing datasets target splicing or copy-move forgeries. If the generated samples demonstrably avoid superficial semantic artifacts and produce inconsistencies representative of real-world manipulations, the resource could enable more targeted progress on inpainting detection and provide reproducible baselines for future work.

major comments (2)

[Abstract and §3 (Dataset Construction)] The central claim that the benchmark highlights intrinsic inconsistencies rather than superficial semantic artifacts is load-bearing for the contribution but lacks supporting validation. No perceptual study, comparison against real inpainted images, or ablation quantifying boundary/shape artifacts introduced by the four mask strategies is reported; without such evidence, it remains unclear whether standard IMDL metrics will isolate the intended intrinsic features or simply detect generation artifacts.
[§4 (Evaluation Protocol)] The evaluation protocol (§4) applies three standard metrics to existing IMDL methods but does not include controls for selection bias or confirmation that the 238k images are balanced across semantic categories and mask types. This weakens the claim that the benchmark reveals representative trends and challenges in current methods.

minor comments (2)

[§3] Clarify the exact overlap or differences between the four mask generation strategies and whether text guidance is applied uniformly or selectively across models.
[§3] Provide more detail on the train/validation/test splits and any steps taken to prevent data leakage from the original COCO annotations.
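On the leakage question, one common safeguard, sketched here as a hypothetical illustration rather than the paper's procedure, is to assign splits by source COCO image ID instead of by inpainted sample, so every variant of one photo (different models, masks, or prompts) lands in the same split. The function names, the (sample_id, source_image_id) input format, and the 80/10/10 proportions are assumptions.

```python
import hashlib


def split_for_source(source_image_id, val_frac=0.1, test_frac=0.1):
    """Deterministically map a source image ID to 'train', 'val', or 'test'."""
    if not 0.0 <= val_frac + test_frac <= 1.0:
        raise ValueError("val_frac + test_frac must lie in [0, 1]")
    digest = hashlib.sha256(str(source_image_id).encode("utf-8")).digest()
    # First 8 bytes of the hash as a pseudo-uniform number in [0, 1)
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def assign_splits(samples, **kwargs):
    """samples: iterable of (sample_id, source_image_id) pairs (hypothetical format)."""
    return {sample_id: split_for_source(src, **kwargs) for sample_id, src in samples}
```

Hashing the ID rather than shuffling keeps the assignment stable when new inpainting models are added later, since existing source images never change split.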

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate where revisions will be made to strengthen the presentation of our benchmark.


Referee: [Abstract and §3 (Dataset Construction)] The central claim that the benchmark highlights intrinsic inconsistencies rather than superficial semantic artifacts is load-bearing for the contribution but lacks supporting validation. No perceptual study, comparison against real inpainted images, or ablation quantifying boundary/shape artifacts introduced by the four mask strategies is reported; without such evidence, it remains unclear whether standard IMDL metrics will isolate the intended intrinsic features or simply detect generation artifacts.

Authors: We agree that explicit validation of the claim would strengthen the paper. Our mask generation strategies were intentionally designed to produce irregular, non-semantic boundaries (e.g., random scribbles, boundary perturbations, and text-guided regions) rather than complete object removal or unnatural shapes. The six SOTA inpainting models were selected precisely because they minimize visible blending artifacts. We will revise §3 to expand the description of each mask strategy with additional qualitative examples and a brief discussion of why these choices reduce semantic artifacts. We will also add a limitations paragraph acknowledging the absence of a formal perceptual study or real-world inpainting comparison, as no large-scale public dataset of verified real inpainted forgeries currently exists for direct benchmarking. revision: partial
Referee: [§4 (Evaluation Protocol)] The evaluation protocol (§4) applies three standard metrics to existing IMDL methods but does not include controls for selection bias or confirmation that the 238k images are balanced across semantic categories and mask types. This weakens the claim that the benchmark reveals representative trends and challenges in current methods.

Authors: The 238,302 images were generated by applying the four mask strategies uniformly to images from the COCO validation and test sets, which span all 80 COCO object categories, although COCO itself is long-tailed rather than balanced across them. To make this explicit, we will add summary statistics (e.g., histograms or tables) in §4 or the supplementary material showing the distribution of object categories, mask area ratios, and mask types across the full benchmark. This will confirm coverage and allow readers to assess potential selection effects. revision: yes
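The summary statistics promised in this response could be computed along the following lines. This is a hypothetical sketch: the (mask_type, mask) record format, the Mask1/Mask2 labels (borrowed from Figure 2's naming), and the histogram bin edges are assumptions. Per mask type it reports the sample count, the mean mask area ratio, and a histogram of area ratios, which is one way to expose imbalance across mask strategies.

```python
from collections import defaultdict

import numpy as np


def mask_area_ratio(mask):
    """Fraction of pixels marked as inpainted in a binary mask."""
    return float(np.asarray(mask).astype(bool).mean())


def summarize_by_mask_type(records, bins=(0.0, 0.1, 0.2, 0.4, 1.0)):
    """Per-mask-type count, mean area ratio, and area-ratio histogram.

    records: iterable of (mask_type, mask) pairs (hypothetical input format).
    bins: illustrative edges; np.histogram puts a ratio of exactly 1.0 in the last bin.
    """
    ratios = defaultdict(list)
    for mask_type, mask in records:
        ratios[mask_type].append(mask_area_ratio(mask))
    summary = {}
    for mask_type in sorted(ratios):
        values = ratios[mask_type]
        counts, _ = np.histogram(values, bins=bins)
        summary[mask_type] = {
            "n": len(values),
            "mean_ratio": float(np.mean(values)),
            "hist": counts.tolist(),
        }
    return summary
```

The same loop could tally COCO category labels with a `collections.Counter` to report the object-category distribution alongside the mask statistics.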

Circularity Check

0 steps flagged

No circularity: benchmark construction is self-contained dataset generation

full rationale

The paper constructs COCO-Inpaint by applying six external SOTA inpainting models and four mask-generation strategies to COCO images, then defines an evaluation protocol using standard IMDL metrics. No equations, fitted parameters, or predictions are derived; the central claim is the existence and scale of the resulting 238k-image collection with its stated properties. The design choice to emphasize intrinsic inconsistencies is an input assumption rather than a result obtained from the paper's own outputs or self-citations. No load-bearing step reduces to a prior result by the same authors or by redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

No free parameters or invented physical entities. The work rests on standard computer-vision assumptions about image semantics and forgery detection metrics.

axioms (2)

domain assumption: Existing IMDL methods primarily target splicing or copy-move forgeries rather than inpainting
Stated in the abstract as motivation for the benchmark.
standard math: Standard metrics (precision, recall, F1, or equivalent) are appropriate for evaluating inpainting localization
Abstract refers to three standard metrics without further justification.
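Because the abstract does not name its three metrics, the sketch below assumes pixel-level F1, IoU, and AUC, a combination often reported in IMDL work; the function names are hypothetical. F1 and IoU compare binary masks, while AUC ranks raw per-pixel scores (Mann-Whitney formulation with averaged ranks for ties).

```python
import numpy as np


def _confusion(pred_mask, gt_mask):
    """Pixel-level true positives, false positives, false negatives."""
    pred = np.asarray(pred_mask).astype(bool)
    gt = np.asarray(gt_mask).astype(bool)
    return int(np.sum(pred & gt)), int(np.sum(pred & ~gt)), int(np.sum(~pred & gt))


def pixel_f1(pred_mask, gt_mask):
    tp, fp, fn = _confusion(pred_mask, gt_mask)
    denom = 2 * tp + fp + fn
    # Convention (an assumption): 0.0 when both masks are empty.
    return 2 * tp / denom if denom else 0.0


def pixel_iou(pred_mask, gt_mask):
    tp, fp, fn = _confusion(pred_mask, gt_mask)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def pixel_auc(scores, gt_mask):
    """Probability that a manipulated pixel outscores an authentic one."""
    s = np.asarray(scores, dtype=float).ravel()
    y = np.asarray(gt_mask).astype(bool).ravel()
    n_pos, n_neg = int(y.sum()), int((~y).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both manipulated and authentic pixels")
    order = np.argsort(s, kind="mergesort")
    _, first, counts = np.unique(s[order], return_index=True, return_counts=True)
    ranks = np.empty_like(s)
    # 1-based average rank for each group of tied scores
    ranks[order] = np.repeat(first + (counts + 1) / 2.0, counts)
    return (ranks[y].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

Producing the binary mask for F1 and IoU requires a threshold (0.5 is a typical but assumed choice); some protocols instead report the best F1 over thresholds, which yields higher numbers and is not directly comparable.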

pith-pipeline@v0.9.0 · 5726 in / 1314 out tokens · 71410 ms · 2026-05-22T17:46:48.800595+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Multi-axis Analysis of Image Manipulation Localization
cs.CV 2026-05 unverdicted novelty 6.0

Introduces the AUDITS benchmark for multi-axis evaluation of image manipulation localization under domain shifts and other factors.

Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · cited by 1 Pith paper · 6 internal anchors

[1]

Omri Avrahami, Ohad Fried, and Dani Lischinski. 2023. Blended latent diffusion. ACM transactions on graphics (TOG) 42, 4 (2023), 1–11

work page 2023
[2]

Omri Avrahami, Dani Lischinski, and Ohad Fried. 2022. Blended diffusion for text-driven editing of natural images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . 18208–18218

work page 2022
[3]

Tim Brooks, Aleksander Holynski, and Alexei A. Efros. 2023. InstructPix2Pix: Learning To Follow Image Editing Instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . 18392–18402

work page 2023
[4]

Hanqun Cao, Cheng Tan, Zhangyang Gao, Yilun Xu, Guangyong Chen, Pheng- Ann Heng, and Stan Z Li. 2024. A survey on generative diffusion models. IEEE Transactions on Knowledge and Data Engineering (2024)

work page 2024
[5]

Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, and Hengshuang Zhao. 2024. Zero-shot image editing with reference imitation. Advances in Neural Information Processing Systems 37 (2024), 84010– 84032

work page 2024
[6]

Yirui Chen, Xudong Huang, Quan Zhang, Wei Li, Mingjian Zhu, Qiangyu Yan, Simiao Li, Hanting Chen, Hailin Hu, Jie Yang, et al. 2024. GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization.arXiv preprint arXiv:2406.16531 (2024)

work page arXiv 2024
[7]

Ciprian Corneanu, Raghudeep Gadde, and Aleix M Martinez. 2024. LatentPaint: Image Inpainting in Latent Space With Diffusion Models. In IEEE/CVF Winter Conference on Applications of Computer Vision (W ACV). 4334–4343

work page 2024
[8]

Riccardo Corvi, Davide Cozzolino, Giada Zingarini, Giovanni Poggi, Koki Nagano, and Luisa Verdoliva. 2023. On the detection of synthetic images generated by diffusion models. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) . IEEE, 1–5

work page 2023
[9]

Davide Cozzolino, Giovanni Poggi, and Luisa Verdoliva. 2015. Efficient Dense- Field Copy–Move Forgery Detection. IEEE Transactions on Information Forensics and Security 10, 11 (Nov 2015), 2284–2297. doi:10.1109/TIFS.2015.2455334

work page doi:10.1109/tifs.2015.2455334 2015
[10]

Davide Cozzolino, Giovanni Poggi, and Luisa Verdoliva. 2015. Splicebuster: A new blind image splicing detector. In 2015 IEEE International Workshop on Information Forensics and Security (WIFS) . IEEE, Roma, Italy, 1–6. doi:10.1109/ WIFS.2015.7368565

work page arXiv 2015
[11]

Davide Cozzolino, Giovanni Poggi, and Luisa Verdoliva. 2015. Splicebuster: A new blind image splicing detector. In 2015 IEEE International Workshop on Information Forensics and Security (WIFS) . IEEE, 1–6

work page 2015
[12]

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Im- agenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition . Ieee, 248–255

work page 2009
[13]

Chengbo Dong, Xinru Chen, Ruohan Hu, Juan Cao, and Xirong Li. 2022. Mvss-net: Multi-view multi-scale supervised networks for image manipulation detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 3 (2022), 3539– 3553

work page 2022
[14]

Jing Dong, Wei Wang, and Tieniu Tan. 2013. Casia image tampering detection evaluation database. In 2013 IEEE China summit and international conference on signal and information processing . IEEE, 422–426

work page 2013
[15]

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xi- aohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2020
[16]

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2020. Generative adversarial networks. Commun. ACM 63, 11 (2020), 139–144

work page 2020
[17]

Fabrizio Guillaro, Davide Cozzolino, Avneesh Sud, Nicholas Dufour, and Luisa Verdoliva. 2023. TruFor: Leveraging All-Round Clues for Trustworthy Image Forgery Detection and Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . 20606–20615

work page 2023
[18]

Fabrizio Guillaro, Davide Cozzolino, Avneesh Sud, Nicholas Dufour, and Luisa Verdoliva. 2023. Trufor: Leveraging all-round clues for trustworthy image forgery detection and localization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition . 20606–20615

work page 2023
[19]

Xiao Guo, Xiaohong Liu, Zhiyuan Ren, Steven Grosz, Iacopo Masi, and Xiaoming Liu. 2023. Hierarchical fine-grained image forgery detection and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion. 3155–3165

work page 2023
[20]

Jing Hao, Zhixin Zhang, Shicai Yang, Di Xie, and Shiliang Pu. 2021. Transforen- sics: image forgery localization with dense self-attention. In Proceedings of the IEEE/CVF International Conference on Computer Vision . 15055–15064

work page 2021
[21]

Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in neural information processing systems 33 (2020), 6840–6851

work page 2020
[22]

Xuefeng Hu, Zhihan Zhang, Zhenye Jiang, Syomantak Chaudhuri, Zhenheng Yang, and Ram Nevatia. 2020. SPAN: Spatial pyramid attention network for image manipulation localization. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXI 16 . Springer, 312–328

work page 2020
[23]

Shan Jia, Mingzhen Huang, Zhou Zhou, Yan Ju, Jialing Cai, and Siwei Lyu. 2023. AutoSplice: A Text-prompt Manipulated Image Dataset for Media Forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion. 893–903

work page 2023
[24]

Xuan Ju, Xian Liu, Xintao Wang, Yuxuan Bian, Ying Shan, and Qiang Xu. 2024. Brushnet: A plug-and-play image inpainting model with decomposed dual- branch diffusion. In European Conference on Computer Vision . Springer, 150–168

work page 2024
[25]

Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. 2023. Imagic: Text-based real image editing with diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition . 6007–6017

work page 2023
[26]

Nupur Kumari, Bingliang Zhang, Sheng-Yu Wang, Eli Shechtman, Richard Zhang, and Jun-Yan Zhu. 2023. Ablating concepts in text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision . 22691– 22702

work page 2023
[27]

Myung-Joon Kwon, Seung-Hun Nam, In-Jae Yu, Heung-Kyu Lee, and Changick Kim. 2022. Learning jpeg compression artifacts for image manipulation detection and localization. International Journal of Computer Vision 130, 8 (2022), 1875– 1895

work page 2022
[28]

Black Forest Labs. 2024. FLUX. https://github.com/black-forest-labs/flux

work page 2024
[29]

Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, and Fuli Feng

work page
[30]

Improving synthetic image detection towards generalization: An image trans- formation perspective

Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective. arXiv preprint arXiv:2408.06741 (2024)

work page arXiv 2024
[31]

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In Computer vision–ECCV 2014: 13th European conference, zurich, Switzerland, September 6-12, 2014, proceedings, part v 13 . Springer, 740– 755

work page 2014
[32]

Anji Liu, Mathias Niepert, and Guy Van den Broeck. 2023. Image Inpainting via Tractable Steering of Diffusion Models. arXiv preprint arXiv:2401.03349 (2023)

work page arXiv 2023
[33]

Fuxiao Liu, Yinghan Wang, Tianlu Wang, and Vicente Ordonez. 2020. Visual news: Benchmark and challenges in news image captioning. arXiv preprint arXiv:2010.03743 (2020)

work page arXiv 2020
[34]

Xiaohong Liu, Yaojie Liu, Jun Chen, and Xiaoming Liu. 2022. PSCC-Net: Pro- gressive spatio-channel correlation network for image manipulation detection and localization. IEEE Transactions on Circuits and Systems for Video Technology 32, 11 (2022), 7505–7517

work page 2022
[35]

Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. 2022. RePaint: Inpainting using denoising diffusion proba- bilistic models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . 11461–11471

work page 2022
[36]

Xiaochen Ma, Bo Du, Zhuohang Jiang, Ahmed Y Al Hammadi, and Jizhe Zhou

work page
[37]

arXiv preprint arXiv:2307.14863 (2023)

IML-ViT: Benchmarking Image Manipulation Localization by Vision Trans- former. arXiv preprint arXiv:2307.14863 (2023)

work page arXiv 2023
[38]

Xiaochen Ma, Xuekang Zhu, Lei Su, Bo Du, Zhuohang Jiang, Bingkui Tong, Zeyu Lei, Xinyu Yang, Chi-Man Pun, Jiancheng Lv, et al . 2025. Imdl-benco: A comprehensive benchmark and codebase for image manipulation detection & localization. Advances in Neural Information Processing Systems 37 (2025), 134591–134613

work page 2025
[39]

Gaël Mahfoudi, Badr Tajini, Florent Retraint, Frederic Morain-Nicolier, Jean Luc Dugelay, and Marc Pic. 2019. Defacto: Image and face manipulation dataset. In 2019 27Th european signal processing conference (EUSIPCO) . IEEE, 1–5

work page 2019
[40]

Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. 2023. Null-text inversion for editing real images using guided diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition . 6038–6047

work page 2023
[41]

Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. 2021. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[42]

Adam Novozamsky, Babak Mahdian, and Stanislav Saic. 2020. IMD2020: A large- scale annotated dataset tailored for detecting manipulated images. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops

work page 2020
[43]

Yulin Pan, Chaojie Mao, Zeyinzi Jiang, Zhen Han, Jingfeng Zhang, and Xiangteng He. 2024. Locate, Assign, Refine: Taming Customized Promptable Image Inpaint- ing. arXiv preprint arXiv:2403.19534 (2024)

work page arXiv 2024
[44]

Gaurav Parmar, Richard Zhang, and Jun-Yan Zhu. 2022. On aliased resizing and surprising subtleties in gan evaluation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition . 11410–11420

work page 2022
[45]

Patrick Pérez, Michel Gangnet, and Andrew Blake. 2023. Poisson image editing. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2 . 577–582

work page 2023
[46]

Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. 2023. Sdxl: Improving latent diffu- sion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[47]

Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen

work page
[48]

Hierarchical Text-Conditional Image Generation with CLIP Latents

Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 (2022). Haozhen Yan, Jiahui Zhan, Yikun Ji, Yan Hong, Jun Lan, Huijia Zhu, Weiqiang Wang, and Jianfu Zhang

work page internal anchor Pith review Pith/arXiv arXiv 2022
[49]

Yuan Rao and Jiangqun Ni. 2016. A deep learning approach to detection of splicing and copy-move forgeries in images. In 2016 IEEE International Workshop on Information Forensics and Security (WIFS) . IEEE, Abu Dhabi, United Arab Emirates, 1–6. doi:10.1109/WIFS.2016.7823911

work page doi:10.1109/wifs.2016.7823911 2016
[50]

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-Resolution Image Synthesis With Latent Diffusion Mod- els. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10684–10695

work page 2022
[51]

Chitwan Saharia, William Chan, Huiwen Chang, Chris Lee, Jonathan Ho, Tim Salimans, David Fleet, and Mohammad Norouzi. 2022. Palette: Image-to-image diffusion models. In ACM SIGGRAPH 2022 conference proceedings . 1–10

work page 2022
[52]

Chaehun Shin, Jooyoung Choi, Heeseung Kim, and Sungroh Yoon. 2024. Large- Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator. arXiv preprint arXiv:2411.15466 (2024)

work page arXiv 2024
[53]

Jiaming Song, Chenlin Meng, and Stefano Ermon. 2020. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2020
[54]

Roman Suvorov, Elizaveta Logacheva, Anton Mashikhin, Anastasia Remizova, Arsenii Ashukha, Aleksei Silvestrov, Naejin Kong, Harshith Goka, Kiwoong Park, and Victor Lempitsky. 2022. Resolution-robust large mask inpainting with fourier convolutions. In Proceedings of the IEEE/CVF winter conference on applications of computer vision. 2149–2159

work page 2022
[55]

Luisa Verdoliva. 2020. Media forensics and deepfakes: an overview. IEEE journal of selected topics in signal processing 14, 5 (2020), 910–932

work page 2020
[56]

Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong Han, Abhinav Shrivastava, Ser-Nam Lim, and Yu-Gang Jiang. 2022. ObjectFormer for Image Manipula- tion Detection and Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . 2364–2373

work page 2022
[57]

Yinhuai Wang, Jiwen Yu, and Jian Zhang. 2022. Zero-shot image restoration using denoising diffusion null-space model. arXiv preprint arXiv:2212.00490 (2022)

work page arXiv 2022
[58]

Haiwei Wu and Jiantao Zhou. 2021. IID-Net: Image inpainting detection network via neural architecture search and attention. IEEE Transactions on Circuits and Systems for Video Technology 32, 3 (2021), 1172–1185

work page 2021
[59]

Yue Wu et al. 2019. ManTra-Net: Manipulation Tracing Network for Detection and Localization of Image Forgeries With Anomalous Features. In2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Long Beach, CA, USA, 9535–9544. doi:10.1109/CVPR.2019.00977

work page doi:10.1109/cvpr.2019.00977 2019
[60]

Yue Wu, Wael AbdAlmageed, and Premkumar Natarajan. 2019. Mantra-net: Manipulation tracing network for detection and localization of image forgeries with anomalous features. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition . 9543–9552

work page 2019
[61]

Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. 2021. SegFormer: Simple and Efficient Design for Semantic Segmenta- tion with Transformers. In Neural Information Processing Systems (NeurIPS)

work page 2021
[62]

Shaoan Xie, Zhifei Zhang, Zhe Lin, Tobias Hinz, and Kun Zhang. 2023. Smart- brush: Text and shape guided object inpainting with diffusion model. In Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 22428–22437

work page 2023
[63]

Shaoan Xie, Yang Zhao, Zhisheng Xiao, Kelvin CK Chan, Yandong Li, Yanwu Xu, Kun Zhang, and Tingbo Hou. 2023. Dreaminpainter: Text-guided subject-driven image inpainting with diffusion models. arXiv preprint arXiv:2312.03771 (2023)

work page arXiv 2023
[64]

Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Wentao Zhang, Bin Cui, and Ming-Hsuan Yang. 2024. Diffusion Models: A Comprehensive Survey of Methods and Applications. arXiv:2209.00796 [cs.LG] https://arxiv.org/abs/2209.00796

work page arXiv 2024
[65]

Shiyuan Yang, Xiaodong Chen, and Jing Liao. 2023. Uni-paint: A unified frame- work for multimodal image inpainting with pretrained diffusion model. In ACM International Conference on Multimedia (MM) . 3190–3199

work page 2023
[66]

Siyuan Yang, Lu Zhang, Liqian Ma, Yu Liu, JingJing Fu, and You He. 2023. Magi- cremover: Tuning-free text-guided image inpainting with diffusion models.arXiv preprint arXiv:2310.02848 (2023)

work page arXiv 2023
[67]

Yicheng Yang, Pengxiang Li, Lu Zhang, Liqian Ma, Ping Hu, Siyu Du, Yun- zhi Zhuge, Xu Jia, and Huchuan Lu. 2024. DreamMix: Decoupling Object At- tributes for Enhanced Editability in Customized Image Inpainting. arXiv preprint arXiv:2411.17223 (2024)

work page arXiv 2024
[68]

Ahmet Burak Yildirim, Vedat Baday, Erkut Erdem, Aykut Erdem, and Aysegul Dundar. 2023. Inst-inpaint: Instructing to remove objects with diffusion models. arXiv preprint arXiv:2304.03246 (2023)

work page arXiv 2023
[69]

Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S Huang

work page
[70]

In Proceedings of the IEEE/CVF international conference on computer vision

Free-form image inpainting with gated convolution. In Proceedings of the IEEE/CVF international conference on computer vision . 4471–4480

work page
[71]

Tao Yu, Runseng Feng, Ruoyu Feng, Jinming Liu, Xin Jin, Wenjun Zeng, and Zhibo Chen. 2023. Inpaint anything: Segment anything meets image inpainting. arXiv preprint arXiv:2304.06790 (2023)

work page arXiv 2023
[72]

Markos Zampoglou, Symeon Papadopoulos, and Yiannis Kompatsiaris. 2015. Detecting image splicing in the wild (web). In 2015 IEEE international conference on multimedia & expo workshops (ICMEW) . IEEE, 1–6

[73]

Guanhua Zhang, Jiabao Ji, Yang Zhang, Mo Yu, Tommi Jaakkola, and Shiyu Chang. 2023. Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models. arXiv:2304.03322 [cs.CV]

[74]

Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. 2017. Scene Parsing through ADE20K Dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

[75]

Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, and Ishan Misra. 2022. Detecting twenty-thousand classes using image-level supervision. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IX. Springer, 350–368

[76]

Junhao Zhuang, Yanhong Zeng, Wenran Liu, Chun Yuan, and Kai Chen. 2024. A task is worth one word: Learning with task prompts for high-quality versatile image inpainting. In European Conference on Computer Vision. Springer, 195–211

[77]

Barret Zoph and Quoc V Le. 2016. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578 (2016)

