COCO-Inpaint: A Benchmark for Detecting and Localizing Inpainting-Based Image Manipulations
Pith reviewed 2026-05-22 17:46 UTC · model grok-4.3
The pith
The COCO-Inpaint benchmark supplies 238,302 inpainted images, generated by six inpainting models, for testing forgery detection and localization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present COCO-Inpaint, a comprehensive benchmark with high-quality inpainting samples generated by six state-of-the-art inpainting models, diverse generation scenarios enabled by four mask generation strategies with optional text guidance, and large-scale coverage of 238,302 inpainted images with rich semantic diversity. The benchmark is constructed to highlight intrinsic inconsistencies between inpainted and authentic regions rather than superficial semantic artifacts such as object shapes. We further establish a rigorous evaluation protocol with three standard metrics to benchmark existing IMDL methods and reveal current trends and challenges.
What carries the argument
The COCO-Inpaint dataset of controlled inpainted images that isolates intrinsic inconsistencies for evaluation of image manipulation detection and localization methods.
If this is right
- Existing IMDL methods can be directly compared on inpainting manipulations using a shared large-scale test set.
- Detection approaches must address intrinsic region inconsistencies rather than relying on semantic cues alone.
- The four mask strategies allow testing of robustness across different editing patterns.
- Large semantic diversity ensures evaluations cover varied real-world image content.
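The robustness point above amounts to grouping per-image scores by mask strategy and comparing the group means. A minimal Python sketch of that grouping follows. The strategy labels and scores are hypothetical placeholders, not names or results from the paper.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_strategy(records):
    """Average a per-image score within each mask strategy.

    `records` is an iterable of (strategy_label, score) pairs.
    """
    buckets = defaultdict(list)
    for strategy, score in records:
        buckets[strategy].append(score)
    return {strategy: mean(scores) for strategy, scores in buckets.items()}


# Hypothetical labels and illustrative scores, not results from the paper.
records = [
    ("strategy_a", 0.5),
    ("strategy_a", 1.0),
    ("strategy_b", 0.25),
    ("strategy_b", 0.25),
]
per_strategy = mean_score_by_strategy(records)
```

A large gap between strategies would suggest that a detector depends on one editing pattern rather than on the inpainting traces themselves.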
Where Pith is reading between the lines
- The benchmark could guide creation of inpainting models that deliberately reduce detectable inconsistencies.
- Similar controlled generation approaches might apply to building test sets for other manipulation types.
- Text-guided inpainting cases open questions about how language conditioning affects forensic traces.
Load-bearing premise
The inpainted regions generated by the six models contain intrinsic inconsistencies representative of real-world manipulations that standard metrics can expose in current detection methods.
What would settle it
Running the existing IMDL methods on the full COCO-Inpaint set and finding that they reach near-perfect scores on all three metrics would show the benchmark fails to reveal meaningful limitations.
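The test described above can be phrased as a saturation check over a table of scores. The sketch below is one possible reading: the method names, metric names, scores, and the 0.99 cutoff for "near-perfect" are all assumptions made for illustration, since the text fixes none of them.

```python
def benchmark_is_saturated(scores_by_method, threshold=0.99):
    """Return True if every method reaches `threshold` on every metric.

    `scores_by_method` maps a method name to a dict of metric name -> score.
    The default threshold is an assumed stand-in for "near-perfect".
    """
    if not scores_by_method:
        return False  # no evidence either way
    return all(
        value >= threshold
        for metrics in scores_by_method.values()
        for value in metrics.values()
    )


# Hypothetical methods, metrics, and scores, for illustration only.
mixed = {
    "method_1": {"metric_1": 0.995, "metric_2": 0.80, "metric_3": 0.99},
    "method_2": {"metric_1": 0.70, "metric_2": 0.65, "metric_3": 0.90},
}
perfect = {
    "method_1": {"metric_1": 0.995, "metric_2": 0.99, "metric_3": 1.0},
}
```

Under these assumptions, `benchmark_is_saturated(mixed)` is false and `benchmark_is_saturated(perfect)` is true. Only the second outcome would count against the benchmark.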
Original abstract
Recent advances in image manipulation have enabled highly photorealistic content generation, but also lowered the barrier to arbitrary editing, raising concerns about multimedia authenticity and security. Existing Image Manipulation Detection and Localization (IMDL) methods mainly target splicing or copy-move forgeries, while benchmarks for inpainting-based manipulations remain limited. To bridge this gap, we present COCO-Inpaint, a comprehensive benchmark specifically designed for inpainting detection and localization, with three key contributions: 1) High-quality inpainting samples generated by six state-of-the-art inpainting models, 2) Diverse generation scenarios enabled by four mask generation strategies with optional text guidance, and 3) Large-scale coverage of 238,302 inpainted images with rich semantic diversity. Our benchmark is constructed to highlight intrinsic inconsistencies between inpainted and authentic regions, rather than superficial semantic artifacts such as object shapes. We further establish a rigorous evaluation protocol with three standard metrics to benchmark existing IMDL methods and reveal current trends and challenges.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces COCO-Inpaint, a benchmark dataset of 238,302 inpainted images generated from COCO using six state-of-the-art inpainting models, four mask generation strategies (with optional text guidance), and a focus on creating samples that expose intrinsic inconsistencies (texture, lighting, blending) rather than semantic artifacts such as unnatural object shapes. It establishes an evaluation protocol using three standard IMDL metrics to benchmark existing detection and localization methods and highlight current limitations.
Significance. A well-validated inpainting-specific benchmark would address a clear gap in the IMDL literature, where most existing datasets target splicing or copy-move forgeries. If the generated samples demonstrably avoid superficial semantic artifacts and produce inconsistencies representative of real-world manipulations, the resource could enable more targeted progress on inpainting detection and provide reproducible baselines for future work.
major comments (2)
- [Abstract and §3 (Dataset Construction)] The central claim that the benchmark highlights intrinsic inconsistencies rather than superficial semantic artifacts (Abstract and §3) is load-bearing for the contribution but lacks supporting validation. No perceptual study, comparison against real inpainted images, or ablation quantifying boundary/shape artifacts introduced by the four mask strategies is reported; without this, it remains unclear whether standard IMDL metrics will isolate the intended intrinsic features or simply detect generation artifacts.
- [§4 (Evaluation Protocol)] The evaluation protocol (§4) applies three standard metrics to existing IMDL methods but does not include controls for selection bias or confirmation that the 238k images are balanced across semantic categories and mask types. This weakens the claim that the benchmark reveals representative trends and challenges in current methods.
minor comments (2)
- [§3] Clarify the exact overlap or differences between the four mask generation strategies and whether text guidance is applied uniformly or selectively across models.
- [§3] Provide more detail on the train/validation/test splits and any steps taken to prevent data leakage from the original COCO annotations.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate where revisions will be made to strengthen the presentation of our benchmark.
- Referee: [Abstract and §3 (Dataset Construction)] The central claim that the benchmark highlights intrinsic inconsistencies rather than superficial semantic artifacts (Abstract and §3) is load-bearing for the contribution but lacks supporting validation. No perceptual study, comparison against real inpainted images, or ablation quantifying boundary/shape artifacts introduced by the four mask strategies is reported; without this, it remains unclear whether standard IMDL metrics will isolate the intended intrinsic features or simply detect generation artifacts.
  Authors: We agree that explicit validation of the claim would strengthen the paper. Our mask generation strategies were intentionally designed to produce irregular, non-semantic boundaries (e.g., random scribbles, boundary perturbations, and text-guided regions) rather than complete object removal or unnatural shapes. The six SOTA inpainting models were selected precisely because they minimize visible blending artifacts. We will revise §3 to expand the description of each mask strategy with additional qualitative examples and a brief discussion of why these choices reduce semantic artifacts. We will also add a limitations paragraph acknowledging the absence of a formal perceptual study or real-world inpainting comparison, as no large-scale public dataset of verified real inpainted forgeries currently exists for direct benchmarking.
  Revision: partial
- Referee: [§4 (Evaluation Protocol)] The evaluation protocol (§4) applies three standard metrics to existing IMDL methods but does not include controls for selection bias or confirmation that the 238k images are balanced across semantic categories and mask types. This weakens the claim that the benchmark reveals representative trends and challenges in current methods.
  Authors: The 238,302 images were generated by applying the four mask strategies uniformly to images from the COCO validation and test sets, which are already balanced across 80 semantic categories. To make this explicit, we will add summary statistics (e.g., histograms or tables) in §4 or the supplementary material showing the distribution of object categories, mask area ratios, and mask types across the full benchmark. This will confirm coverage and allow readers to assess potential selection effects.
  Revision: yes
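The summary statistics promised in the second response could be tallied along the following lines. This is a sketch under stated assumptions, not the authors' procedure: masks are represented as nested lists of 0/1 values, the mask-type labels are hypothetical, and the area-ratio bin edges (0.1, 0.3, 0.5) are chosen arbitrarily for illustration.

```python
from bisect import bisect_right
from collections import Counter


def mask_area_ratio(mask):
    """Fraction of pixels marked as inpainted in a binary mask."""
    total = sum(len(row) for row in mask)
    painted = sum(1 for row in mask for value in row if value)
    return painted / total


def summarize(samples, edges=(0.1, 0.3, 0.5)):
    """Count samples per mask type and per mask-area-ratio bin.

    `samples` is an iterable of (mask_type, mask) pairs. The bins are
    [0, 0.1), [0.1, 0.3), [0.3, 0.5), [0.5, 1] for the default edges.
    """
    type_counts = Counter()
    area_bins = [0] * (len(edges) + 1)
    for mask_type, mask in samples:
        type_counts[mask_type] += 1
        area_bins[bisect_right(edges, mask_area_ratio(mask))] += 1
    return type_counts, area_bins


# Tiny 4x4 masks with hypothetical type labels.
small = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
quarter = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
full = [[1] * 4 for _ in range(4)]
type_counts, area_bins = summarize(
    [("type_a", small), ("type_a", quarter), ("type_b", full)]
)
```

Reporting both tallies side by side would let a reader see whether one mask type or one size range dominates the benchmark.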
Circularity Check
No circularity: benchmark construction is self-contained dataset generation
The paper constructs COCO-Inpaint by applying six external SOTA inpainting models and four mask-generation strategies to COCO images, then defines an evaluation protocol using standard IMDL metrics. No equations, fitted parameters, or predictions are derived; the central claim is the existence and scale of the resulting 238k-image collection with its stated properties. The design choice to emphasize intrinsic inconsistencies is an input assumption rather than a result obtained from the paper's own outputs or self-citations. No load-bearing step reduces to a prior result by the same authors or by redefinition.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Existing IMDL methods primarily target splicing or copy-move forgeries rather than inpainting
- standard math: Standard metrics (precision, recall, F1 or equivalent) are appropriate for evaluating inpainting localization
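For the second axiom, pixel-level precision, recall, and F1 for a predicted localization mask can be sketched as below. The paper's three metrics are not named in this review, so this shows only the "precision, recall, F1" reading of the ledger entry. The nested-list mask format and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
def pixel_metrics(pred, truth):
    """Pixel-level precision, recall, and F1 for two binary masks.

    `pred` and `truth` are same-shaped nested lists of 0/1 values,
    where 1 marks a pixel as inpainted.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # correctly flagged pixel
            elif p:
                fp += 1  # flagged but authentic
            elif t:
                fn += 1  # inpainted but missed
    # Assumed convention: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


truth = [[1, 1], [0, 0]]
pred = [[1, 0], [1, 0]]
scores = pixel_metrics(pred, truth)
```

In this toy case one pixel is correct, one is a false alarm, and one is missed, so all three values come out to 0.5.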
Forward citations
Cited by 1 Pith paper
- Multi-axis Analysis of Image Manipulation Localization: Introduces the AUDITS benchmark for multi-axis evaluation of image manipulation localization under domain shifts and other factors.