Common Inpainted Objects In-N-Out of Context

Jin Sun; Ninghao Liu; Ruitong Sun; Tianze Yang; Tyson Jordan

arxiv: 2506.00721 · v2 · submitted 2025-05-31 · 💻 cs.CV · cs.LG

Pith reviewed 2026-05-19 11:34 UTC · model grok-4.3

keywords context reasoning · out-of-context detection · inpainting · vision dataset · scene understanding · image forensics · diffusion models · object replacement

The pith

A dataset of nearly 100,000 edited photos gives vision models clear examples of objects that fit or clash with their surroundings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper creates a large collection of edited photos where objects have been swapped in or out of their usual settings. The goal is to give computer vision systems better examples for understanding what belongs in a scene and what does not. By using AI image editing to change objects in standard scene photos, the authors build both matching and mismatched versions. They check each change with vision-language models to label them correctly. This setup supports new ways to train models on context, such as classifying fit, predicting suitable objects, and spotting edited images more reliably.

Core claim

By systematically replacing objects in COCO images through diffusion-based inpainting, the authors create 97,722 unique images featuring both contextually coherent and inconsistent scenes. Each inpainted object is verified and categorized as in- or out-of-context through large vision language model assessments. This controlled testbed enables three tasks: fine-grained context reasoning that classifies objects based on three criteria, a novel Objects-from-Context prediction task at instance and clique levels, and context-enhanced fake detection on existing methods without fine-tuning.

What carries the argument

Diffusion-based inpainting to replace objects in original scenes, combined with large vision language model verification to label each result as contextually fitting or inconsistent.
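The loop described here and in the Figure 2 caption (inpaint, verify, regenerate on failure, then label) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `Sample`, `build_sample`, the three callables, and the `max_attempts` cap are hypothetical names introduced for the example, and the source does not say how many regeneration attempts are made.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Sample:
    image_id: str
    replacement: str   # category name of the inpainted object
    in_context: bool   # label returned by the context judge


def build_sample(
    image_id: str,
    replacement: str,
    inpaint: Callable[[str, str], object],         # hypothetical: runs the inpainting model
    verify: Callable[[object, str], bool],         # hypothetical: detector + MLLM success check
    judge_context: Callable[[object, str], bool],  # hypothetical: MLLM in/out-of-context call
    max_attempts: int = 3,                         # assumed cap; not stated in the source
) -> Optional[Sample]:
    """Inpaint, verify, and label one image; return None if every attempt fails."""
    for _ in range(max_attempts):
        edited = inpaint(image_id, replacement)
        if verify(edited, replacement):
            label = judge_context(edited, replacement)
            return Sample(image_id, replacement, label)
    return None
```

Passing the models in as callables keeps the control flow testable with stubs, independent of whichever inpainting model or vision language model is actually plugged in.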

If this is right

Fine-grained classification becomes possible by scoring objects against three explicit context criteria.
Models can predict which new objects naturally belong in a given scene, both for single instances and groups sharing semantic relations.
Existing fake-image detectors gain accuracy when supplied with context signals from the dataset, without any additional training on the detectors themselves.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same editing approach could extend to video clips to study how context violations unfold over time.
Training on these examples might improve model performance on real-world tasks like spotting anomalies in surveillance footage.
The dataset offers a way to test whether context understanding helps models handle unusual but realistic scene variations beyond the training distribution.

Load-bearing premise

Large vision language models can reliably and accurately judge whether each inpainted object belongs in the surrounding scene or not.

What would settle it

A direct check would compare the large vision language model labels against human judgments on a random sample of the images, or measure whether models trained with these examples show clear gains on separate context-reasoning benchmarks.
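One way to run the label comparison suggested above is to compute raw agreement and Cohen's kappa between the model labels and human labels on the sampled images. The sketch below is illustrative only: the function name `agreement_and_kappa` and the label strings are assumptions, and the paper is not known to report these numbers.

```python
from collections import Counter
from typing import Sequence, Tuple


def agreement_and_kappa(model: Sequence[str], human: Sequence[str]) -> Tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two label sequences."""
    if len(model) != len(human) or len(model) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(model)
    # Fraction of items on which the two raters agree.
    observed = sum(m == h for m, h in zip(model, human)) / n
    # Agreement expected by chance from each rater's label frequencies.
    model_freq = Counter(model)
    human_freq = Counter(human)
    expected = sum(model_freq[c] * human_freq[c] for c in model_freq) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return observed, float("nan")
    return observed, (observed - expected) / (1.0 - expected)
```

Kappa is worth reporting next to raw agreement because in-context and out-of-context labels may be imbalanced, in which case raw agreement alone can look high by chance.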

Figures

Figures reproduced from arXiv: 2506.00721 by Jin Sun, Ninghao Liu, Ruitong Sun, Tianze Yang, Tyson Jordan.

**Figure 1.** Which object is fake? Only one object per image is inpainted. Out-of-context inpainted objects are easier to identify. Answers are revealed at the bottom of this page. Using Stable Diffusion’s inpainting model [9], we replace exactly one object per COCO image. This selective approach allows us to maintain the broader scene context while introducing precise, controlled variations in object-scene relation…

**Figure 2.** Our COinCO pipeline. (a) For a given COCO image, an object is randomly replaced with Stable Diffusion inpainting. (b) Inpainting success is verified using object detection and an MLLM. Successes are added to the dataset, while failure cases are regenerated and retested. (c) Inpainted images are classified as in-context or out-of-context using the MLLM. (d) Instance-level (object category) and clique-level …

**Figure 3.** (a) Inpainting success rate for original-inpainted object pairs. Rows are classes of original …

**Figure 4.** Inpainting, fake detection, and objects-from-context results. Context reasoning responses …

**Figure 5.** Object-from-Context prediction. A red box is a query. The top row shows three examples …

**Figure 6.** Context enhancement results

**Figure 7.** Fake localization performance. We evaluate this context-enhancement approach under two settings. In the oracle setting, we use ground truth annotations to enhance predictions within the fake object’s mask region when it is out-of-context. This serves as an upper bound for performance gains. In the practical setting, the fake objects are unknown. We propose the use of Molmo to identify suspicious objects …
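The oracle setting in the Figure 7 caption can be illustrated with a small sketch. The caption says predictions are enhanced inside the fake object's mask when the object is out-of-context, but it does not give the enhancement rule, so the additive `boost` used here is purely an assumption, as are the function name `enhance_in_mask` and the nested-list score map.

```python
from typing import List


def enhance_in_mask(
    scores: List[List[float]],   # per-pixel fake scores in [0, 1] from a detector
    mask: List[List[int]],       # 1 inside the object mask, 0 elsewhere
    out_of_context: bool,
    boost: float = 0.3,          # assumed additive boost; not from the source
) -> List[List[float]]:
    """Raise scores inside the mask when the object is out-of-context."""
    if not out_of_context:
        # In-context objects leave the detector output unchanged (copied).
        return [row[:] for row in scores]
    return [
        [min(1.0, s + boost) if m else s for s, m in zip(score_row, mask_row)]
        for score_row, mask_row in zip(scores, mask)
    ]
```

Because the detector's own output is only post-processed, a rule of this kind needs no fine-tuning of the detector, which matches the "without fine-tuning" framing in the abstract.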

Original abstract

We present Common Inpainted Objects In-N-Out of Context (COinCO), a novel dataset addressing the scarcity of out-of-context examples in existing vision datasets. By systematically replacing objects in COCO images through diffusion-based inpainting, we create 97,722 unique images featuring both contextually coherent and inconsistent scenes, enabling effective context learning. Each inpainted object is meticulously verified and categorized as in- or out-of-context through Large Vision Language Model assessments. We demonstrate three key tasks enabled by COinCO: (1) a fine-grained context reasoning approach that classifies objects as in- or out-of-context based on three criteria; (2) a novel Objects-from-Context prediction task that determines which new objects naturally belong in given scenes at both instance and clique level semantics, and (3) context-enhanced fake detection on state-of-the-art methods without fine-tuning. COinCO provides a controlled testbed with contextual variations, establishing a foundation for advancing context-aware visual understanding in computer vision, including image forensics. Code and dataset are available at https://co-in-co.github.io/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper builds a 98k-image dataset of diffusion-inpainted COCO scenes labeled in/out-of-context by LVLMs and demos three tasks, but the labels lack any reported human validation or error checks.

The main thing to know is that this paper puts together a dataset of 97,722 images by systematically inpainting objects into COCO scenes with diffusion models, then uses LVLMs to mark each one as contextually coherent or inconsistent. They show it on three tasks: fine-grained in/out classification, predicting which objects fit a scene at instance and group levels, and boosting fake detection without extra training.

Referee Report

2 major / 3 minor

Summary. The paper introduces the COinCO dataset, comprising 97,722 images generated by systematically replacing objects in COCO scenes through diffusion-based inpainting, which yields both contextually coherent and inconsistent variants. Each inpainted object is categorized as in- or out-of-context via LVLM assessments, which the authors describe as meticulous verification. The dataset is then used to demonstrate three tasks: (1) fine-grained context reasoning that classifies objects according to three criteria, (2) Objects-from-Context prediction at instance and clique levels, and (3) context-enhanced fake detection on existing methods without fine-tuning.

Significance. A reliably labeled dataset of this scale could serve as a useful controlled testbed for context-aware vision models and image forensics. The construction method is straightforward and the three downstream tasks are well-motivated. However, the absence of any quantitative validation for the LVLM-generated labels means the central contribution rests on an unverified assumption; if that assumption does not hold, the utility of the entire resource and all reported experiments is undermined.

major comments (2)

[§3.2] (Verification procedure): The manuscript asserts that 'each inpainted object is meticulously verified and categorized as in- or out-of-context through Large Vision Language Model assessments,' yet reports no human agreement metrics, inter-rater reliability scores, error analysis, or ablation on prompt sensitivity. Because every downstream task (context reasoning classifier, Objects-from-Context prediction, and fake-detection experiments) depends directly on these labels, the lack of validation constitutes a load-bearing gap.
[§4.1 and §4.2] (Context reasoning and Objects-from-Context tasks): Both tasks treat the LVLM-derived in/out-of-context labels as ground truth for training and evaluation. Without reported label accuracy or noise analysis, the quantitative improvements claimed over baselines cannot be interpreted as evidence of improved context understanding; they may simply reflect propagation of LVLM biases.

minor comments (3)

[Abstract and §3.1] The abstract and §3.1 should explicitly state the total number of unique source COCO images used and the average number of inpainted objects per image to allow readers to assess diversity.
[Figure 2] Figure 2 (example images) would benefit from clearer annotation of which object was inpainted and its assigned context label; current captions are ambiguous.
[§3.2] A brief discussion of known LVLM failure modes on fine-grained scene semantics (e.g., object affordance or spatial relations) should be added to §3.2 to contextualize the verification choice.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our paper. We address the major concerns regarding the validation of the LVLM labels below.

Referee: [§3.2] (Verification procedure): The manuscript asserts that 'each inpainted object is meticulously verified and categorized as in- or out-of-context through Large Vision Language Model assessments,' yet reports no human agreement metrics, inter-rater reliability scores, error analysis, or ablation on prompt sensitivity. Because every downstream task (context reasoning classifier, Objects-from-Context prediction, and fake-detection experiments) depends directly on these labels, the lack of validation constitutes a load-bearing gap.

Authors: We agree that providing quantitative validation for the LVLM labels would strengthen the paper. The current version relies on the LVLM's capability for this task without reporting human agreement. In the revised manuscript, we will add a section with human evaluation on a random sample of 500 images, reporting agreement rates between LVLM and human annotators, along with an error analysis. We will also include an ablation study on different prompts to show sensitivity. Revision: yes.

Referee: [§4.1 and §4.2] (Context reasoning and Objects-from-Context tasks): Both tasks treat the LVLM-derived in/out-of-context labels as ground truth for training and evaluation. Without reported label accuracy or noise analysis, the quantitative improvements claimed over baselines cannot be interpreted as evidence of improved context understanding; they may simply reflect propagation of LVLM biases.

Authors: We acknowledge this valid point. The improvements are demonstrated using the same label set for all methods, meaning that if there is bias, it affects baselines and our method similarly. However, to better address potential label noise, we will add a noise robustness analysis in the revision, such as injecting controlled noise into labels and observing performance. This will help interpret the results more carefully. Revision: yes.
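The controlled label-noise injection mentioned in this response could look like the sketch below, which flips a fixed fraction of binary in-context labels chosen uniformly at random. It is a hypothetical illustration: `flip_labels`, the symmetric-flip noise model, and the seeding scheme are assumptions, not something the rebuttal specifies.

```python
import random
from typing import List


def flip_labels(labels: List[bool], rate: float, seed: int = 0) -> List[bool]:
    """Return a copy of `labels` with round(rate * len(labels)) entries flipped."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)  # local generator keeps the corruption reproducible
    num_flips = round(rate * len(labels))
    flipped = set(rng.sample(range(len(labels)), num_flips))
    return [(not y) if i in flipped else y for i, y in enumerate(labels)]
```

A robustness study would then retrain or re-evaluate at several rates and track how the reported metrics degrade; a sharp drop at low noise would suggest the results are sensitive to LVLM labeling errors.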

Circularity Check

0 steps flagged

No circularity detected in dataset construction methodology

The paper presents a dataset creation pipeline using diffusion inpainting on COCO images followed by LVLM-based categorization of in/out-of-context objects, then demonstrates three downstream tasks. No equations, fitted parameters, or derivations are present that reduce any claimed result to its own inputs by construction. The LVLM verification step is an empirical labeling process rather than a self-referential fit or self-citation chain. Self-citations, if any, are not load-bearing for the core contribution of the new dataset and tasks. The work is self-contained as a standard dataset paper with independent methodological content.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that diffusion inpainting produces sufficiently realistic images and that LVLM judgments provide reliable context labels without additional human validation.

axioms (2)

domain assumption: Diffusion-based inpainting can generate images that preserve scene coherence when objects are contextually appropriate.
Invoked in the description of creating coherent and inconsistent scenes.
domain assumption: Large Vision Language Models can accurately classify objects as in- or out-of-context.
Stated as the verification method for all 97,722 images.

pith-pipeline@v0.9.0 · 5726 in / 1271 out tokens · 41896 ms · 2026-05-19T11:34:43.105371+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: By systematically replacing objects in COCO images through diffusion-based inpainting, we create 97,722 unique images... Each inpainted object is meticulously verified and categorized as in- or out-of-context through Large Vision Language Model assessments.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Our context reasoning is based on three fundamental principles regarding context: location, size, and co-occurrence.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DiffusionPrint: Learning Generative Fingerprints for Diffusion-Based Inpainting Localization
cs.CV 2026-04 unverdicted novelty 7.0

DiffusionPrint learns robust forensic feature maps via MoCo-style contrastive training on diffusion inpainting fingerprints, boosting localization accuracy by up to 28% when fused into existing IFL systems and general...

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · cited by 1 Pith paper · 7 internal anchors

[1] Moshe Bar. Visual objects in context. Nature Reviews Neuroscience, 5(8):617–629, 2004
[2] Myung Jin Choi, Antonio Torralba, and Alan S Willsky. Context models and out-of-context objects. Pattern Recognition Letters, 33(7):853–862, 2012
[3] Kai Xiao, Logan Engstrom, Andrew Ilyas, and Aleksander Madry. Noise or signal: The role of image backgrounds in object recognition. arXiv preprint arXiv:2006.09994, 2020
[4] Irving Biederman, Robert J Mezzanotte, and Jan C Rabinowitz. Scene perception: Detecting and judging objects undergoing relational violations. Cognitive psychology, 14(2):143–177, 1982
[5] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014
[6] Holger Caesar, Jasper Uijlings, and Vittorio Ferrari. Coco-stuff: Thing and stuff classes in context. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1209–1218, 2018
[7] Agrim Gupta, Piotr Dollar, and Ross Girshick. Lvis: A dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5356–5364, 2019
[8] Sahar Kazemzadeh, Vicente Ordonez, Mark Matten, and Tamara Berg. Referitgame: Referring to objects in photographs of natural scenes. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 787–798, 2014
[9] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022
[10] Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, and Ping Luo. Whole-body human pose estimation in the wild. In European Conference on Computer Vision, pages 196–214. Springer, 2020
[11] Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015
[12] Andreas Veit, Tomas Matera, Lukas Neumann, Jiri Matas, and Serge Belongie. Coco-text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140, 2016
[13] Ayman Beghdadi, Azeddine Beghdadi, Malik Mallem, Lotfi Beji, and Faouzi Alaya Cheikh. Cd-coco: A versatile complex distorted coco database for scene-context-aware computer vision. In 2023 11th European Workshop on Visual Information Processing (EUVIP), pages 1–6. IEEE, 2023
[14] Xueqing Deng, Qihang Yu, Peng Wang, Xiaohui Shen, and Liang-Chieh Chen. Coconut: Modernizing coco segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21863–21873, 2024
[15] Manoj Acharya, Anirban Roy, Kaushik Koneripalli, Susmit Jha, Christopher Kanan, and Ajay Divakaran. Detecting out-of-context objects using graph context reasoning network. In IJCAI, 2022
[16] Xuan Wang and Zhigang Zhu. Context understanding in computer vision: A survey. Computer Vision and Image Understanding, 229:103646, 2023
[17] Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, et al. Molmo and pixmo: Open weights and open data for state-of-the-art multimodal models. arXiv preprint arXiv:2409.17146, 2024
[18] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning, 2023
[19] Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, and Yong Jae Lee. Llava-next: Improved reasoning, ocr, and world knowledge, January 2024. URL https://llava-vl.github.io/blog/2024-01-30-llava-next/
[20] Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023
[21] Yang Jin, Kun Xu, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Yadong Mu, et al. Unified language-vision pretraining in llm with dynamic discrete visual tokenization. In International Conference on Learning Representations, 2024
[22] Hany Farid. Image forgery detection. IEEE Signal processing magazine, 26(2):16–25, 2009
[23] Aurélie Bugeau, Marcelo Bertalmío, Vicent Caselles, and Guillermo Sapiro. A comprehensive framework for image inpainting. IEEE transactions on image processing, 19(10):2634–2645, 2010
[24] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Commun. ACM, 63(11):139–144, October 2020. ISSN 0001-0782. doi: 10.1145/3422622. URL https://doi.org/10.1145/3422622
[26]

Image-to-image translation with conditional adversarial networks

Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1125–1134, 2017

work page 2017
[27]

Large Scale GAN Training for High Fidelity Natural Image Synthesis

Andrew Brock. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[28]

A style-based generator architecture for generative adversarial networks

Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4396–4405, 2019. doi: 10.1109/CVPR.2019.00453

work page doi:10.1109/cvpr.2019.00453 2019
[29]

Analyzing and improving the image quality of stylegan

Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8110–8119, 2020

work page 2020
[30]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020

work page 2020
[31]

De-fake: Detection and attribution of fake images generated by text-to-image generation models

Zeyang Sha, Zheng Li, Ning Yu, and Yang Zhang. De-fake: Detection and attribution of fake images generated by text-to-image generation models. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pages 3418–3432, 2023

work page 2023
[32]

Zero-shot text-to-image generation

Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea V oss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In International conference on machine learning , pages 8821–8831. Pmlr, 2021

work page 2021
[33]

Adding conditional control to text-to-image diffusion models

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3836–3847, October 2023

work page 2023
[34]

Anydoor: Zero-shot object-level im- age customization

Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, and Hengshuang Zhao. Anydoor: Zero-shot object-level image customization, 2024. URL https://arxiv.org/abs/2307.09481

work page arXiv 2024
[35]

Controlcom: Controllable image composition using diffusion model, 2023

Bo Zhang, Yuxuan Duan, Jun Lan, Yan Hong, Huijia Zhu, Weiqiang Wang, and Li Niu. Controlcom: Controllable image composition using diffusion model, 2023. URL https://arxiv.org/abs/2308. 10040

work page 2023
[36]

Zero-1-to-3: Zero-shot one image to 3d object, 2023

Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl V ondrick. Zero-1-to-3: Zero-shot one image to 3d object, 2023

work page 2023
[37]

Diffrelight: Diffusion-based facial performance relighting

Mingming He, Pascal Clausen, Ahmet Levent Ta¸ sel, Li Ma, Oliver Pilarski, Wenqi Xian, Laszlo Rikker, Xueming Yu, Ryan Burgert, Ning Yu, et al. Diffrelight: Diffusion-based facial performance relighting. arXiv preprint arXiv:2410.08188, 2024

work page arXiv 2024
[38]

Neural gaffer: Relighting any object via diffusion, 2024

Haian Jin, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, Jin Sun, and Noah Snavely. Neural gaffer: Relighting any object via diffusion, 2024. URL https://arxiv.org/abs/2406.07520

work page arXiv 2024
[39]

Deep learning based computer generated face identification using convolutional neural network.Applied Sciences, 8(12):2610, 2018

L Minh Dang, Syed Ibrahim Hassan, Suhyeon Im, Jaecheol Lee, Sujin Lee, and Hyeonjoon Moon. Deep learning based computer generated face identification using convolutional neural network.Applied Sciences, 8(12):2610, 2018. 11

work page 2018
[40]

Fake faces identification via convolutional neural network

Huaxiao Mo, Bolin Chen, and Weiqi Luo. Fake faces identification via convolutional neural network. In Proceedings of the 6th ACM workshop on information hiding and multimedia security, pages 43–47, 2018

work page 2018
[41]

Pscc-net: Progressive spatio-channel correlation network for image manipulation detection and localization

Xiaohong Liu, Yaojie Liu, Jun Chen, and Xiaoming Liu. Pscc-net: Progressive spatio-channel correlation network for image manipulation detection and localization. IEEE Transactions on Circuits and Systems for Video Technology, 32(11):7505–7517, 2022

work page 2022
[42]

Learning jpeg compression artifacts for image manipulation detection and localization.International Journal of Computer Vision, 130(8):1875–1895, August 2022

Myung-Joon Kwon, Seung-Hun Nam, In-Jae Yu, Heung-Kyu Lee, and Changick Kim. Learning jpeg compression artifacts for image manipulation detection and localization.International Journal of Computer Vision, 130(8):1875–1895, August 2022. doi: 10.1007/s11263-022-01617-5

work page doi:10.1007/s11263-022-01617-5 2022
[43]

Mantra-net: Manipulation tracing network for detection and localization of image forgeries with anomalous features

Yue Wu, Wael AbdAlmageed, and Premkumar Natarajan. Mantra-net: Manipulation tracing network for detection and localization of image forgeries with anomalous features. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9543–9552, 2019

work page 2019
[44]

Trufor: Leveraging all-round clues for trustworthy image forgery detection and localization

Fabrizio Guillaro, Davide Cozzolino, Avneesh Sud, Nicholas Dufour, and Luisa Verdoliva. Trufor: Leveraging all-round clues for trustworthy image forgery detection and localization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 20606–20615, 2023

work page 2023
[45]

Genimage: A million-scale benchmark for detecting ai-generated image, 2023

Mingjian Zhu, Hanting Chen, Qiangyu Yan, Xudong Huang, Guanyu Lin, Wei Li, Zhijun Tu, Hailin Hu, Jie Hu, and Yunhe Wang. Genimage: A million-scale benchmark for detecting ai-generated image, 2023

work page 2023
[46]

Bird and Ahmad Lotfi

Jordan J. Bird and Ahmad Lotfi. Cifake: Image classification and explainable identification of ai-generated synthetic images, 2023. URL https://arxiv.org/abs/2303.14126

work page arXiv 2023
[47]

Tgif: Text-guided inpainting forgery dataset

Hannes Mareen, Dimitrios Karageorgiou, Glenn Van Wallendael, Peter Lambert, and Symeon Papadopou- los. Tgif: Text-guided inpainting forgery dataset. In 2024 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–6. IEEE, 2024

work page 2024
[48]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, June 2022

work page 2022
[49]

Ultralytics yolov8, 2023

Glenn Jocher, Ayush Chaurasia, and Jing Qiu. Ultralytics yolov8, 2023. URL https://github.com/ ultralytics/ultralytics

work page 2023
[50]

MMDetection: Open MMLab Detection Toolbox and Benchmark

Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. MMDetection: Open mmlab detection toolbox and b...

work page internal anchor Pith review Pith/arXiv arXiv 1906
[51]

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, et al. Grounding dino: Marrying dino with grounded pre-training for open-set object detection. arXiv preprint arXiv:2303.05499, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[52]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

work page 2019
[53]

Contextual priming for object detection

Antonio Torralba. Contextual priming for object detection. International journal of computer vision, 53:169–191, 2003

work page 2003
[54]

Object co-occurrence serves as a contextual cue to guide and facilitate visual search in a natural viewing environment

Stephen C Mack and Miguel P Eckstein. Object co-occurrence serves as a contextual cue to guide and facilitate visual search in a natural viewing environment. Journal of vision, 11(9):9–9, 2011

work page 2011
[55]

Deep learning for anomaly detection: A review

Guansong Pang, Chunhua Shen, Longbing Cao, and Anton Van Den Hengel. Deep learning for anomaly detection: A review. ACM computing surveys (CSUR), 54(2):1–38, 2021

work page 2021
