arxiv: 2507.02314 · v6 · submitted 2025-07-03 · 💻 cs.CV · cs.AI

MAGIC: Few-Shot Mask-Guided Anomaly Inpainting with Prompt Perturbation, Spatially Adaptive Guidance, and Context Awareness

JaeHyuck Choi , Minjun Kim , Je Hyeong Hong This is my paper

Pith reviewed 2026-05-19 05:51 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords few-shot anomaly generationdiffusion inpaintingmask-guided synthesisprompt perturbationadaptive guidancecontext-aware alignmentindustrial anomaly detectionsynthetic defect generation

0 comments p. Extension

The pith

MAGIC generates high-fidelity anomalies that strictly follow given masks while maximizing in-distribution diversity through prompt perturbation, adaptive guidance, and mask alignment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MAGIC to address the difficulty of creating useful anomaly examples when only a few real defects are available for industrial quality control. Existing global prompt methods tend to alter normal image regions, and prior inpainting approaches often produce anomalies that lack the variety needed to train strong detectors. MAGIC fine-tunes a diffusion inpainting model and adds three targeted components to keep changes inside the mask while sampling from a smooth space of realistic anomalies. A sympathetic reader would care because higher-quality synthetic defects could improve training of downstream detection systems without requiring large collections of labeled failures. If correct, the result would be measurable gains in detection accuracy across multiple industrial datasets under matched evaluation conditions.

Core claim

MAGIC is a fine-tuned inpainting framework that generates high-fidelity anomalies adhering strictly to the mask while maximizing in-distribution diversity. It achieves this by learning and sampling from a smooth manifold of realistic anomalies via Gaussian prompt perturbation, applying distinct guidance strengths to anomaly versus background regions through spatially adaptive guidance, and relocating masks for plausible placement via context-aware mask alignment. Under consistent identical evaluation protocol, this combination outperforms state-of-the-art methods on diverse anomaly datasets in downstream tasks.

What carries the argument

The MAGIC fine-tuned inpainting diffusion framework that combines Gaussian prompt perturbation, spatially adaptive guidance, and context-aware mask alignment.

If this is right

Downstream anomaly detection models achieve higher performance when trained using MAGIC-generated data instead of data from earlier methods.
Generated anomalies remain confined to the supplied mask and leave surrounding normal regions unchanged.
The framework supports effective anomaly creation even when only a small number of real anomaly examples are available for fine-tuning.
Consistent gains appear across multiple distinct industrial anomaly datasets when evaluation conditions are held identical.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same combination of perturbation and region-specific guidance could be tested for controlled object insertion or defect simulation in non-industrial image domains such as medical or satellite imagery.
Learning a smooth manifold of anomalies may reduce overfitting risks that appear when diffusion models are fine-tuned on very small sets of real defects.
The mask-alignment step suggests a general route for making synthetic data respect object geometry, which could be measured by checking whether generated anomalies respect expected lighting and texture continuity at mask boundaries.

Load-bearing premise

The three components together produce anomalies that are both strictly mask-adherent and maximally in-distribution diverse without corrupting normal regions.

What would settle it

A controlled replication under the same evaluation protocol in which models trained on MAGIC-generated anomalies show no accuracy improvement over those trained on data from prior state-of-the-art methods.

Figures

Figures reproduced from arXiv: 2507.02314 by JaeHyuck Choi, Je Hyeong Hong, Minjun Kim.

**Figure 2.** Figure 2: Framework overview. A fixed anomaly prompt P (e.g., “sks”) undergoes Gaussian prompt perturbation (GPP) to create cp during both training and inference with the aim of diversifying global-scale anomaly texture appearance without compromising realism. In training, an inpainting diffusion backbone receives an anomaly image IA, ground-truth mask MGT, and masked input IM A , all conditioned on (perturbed) cp, … view at source ↗

**Figure 3.** Figure 3: Effect of Gaussian Prompt Perturbation (GPP) shown on two different noise seeds. Applying GPP at both training and test times broadens the global textural appearance of anomalies while avoiding unrealistic textures. a ↑ a = 0.0 a = 0.2 a = 0.4 a = 0.6 a = 0.8 a = 1.0 [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 5.** Figure 5: An overview of context-aware mask alignment (CAMA). After sampling an anomaly [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: Qualitative comparison of generated anomalies from our model and existing anomaly [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

read the original abstract

Few-shot anomaly generation is a key challenge in industrial quality control. Although diffusion models are promising, existing methods struggle: global prompt-guided approaches corrupt normal regions, and existing inpainting-based methods often lack the in-distribution diversity essential for robust downstream models. We propose MAGIC, a fine-tuned inpainting framework that generates high-fidelity anomalies that strictly adhere to the mask while maximizing this diversity. MAGIC introduces three complementary components: (i) Gaussian prompt perturbation, which prevents model overfitting in the few-shot setting by learning and sampling from a smooth manifold of realistic anomalies, (ii) spatially adaptive guidance that applies distinct guidance strengths to the anomaly and background regions, and (iii) context-aware mask alignment to relocate masks for plausible placement within the host object. Under consistent identical evaluation protocol, MAGIC outperforms state-of-the-art methods on diverse anomaly datasets in downstream tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MAGIC combines three targeted tweaks to few-shot inpainting for anomaly generation but the abstract supplies no numbers, leaving the outperformance claim unverified and the evaluation protocol open to question.

read the letter

The main contribution is a fine-tuned inpainting diffusion model that adds Gaussian prompt perturbation to sample a smoother manifold of anomalies, spatially adaptive guidance that treats the defect region differently from the background, and context-aware mask relocation to put defects in more natural spots on the object. These pieces are meant to generate diverse yet mask-adherent anomalies without corrupting normal areas, which directly targets the data scarcity problem in industrial quality control. The joint design is not described as prior art in the abstract, so the specific integration counts as new. If the experiments hold, the approach could help downstream detectors by producing more useful synthetic defects than global prompting or basic inpainting. The motivation is clear and the components are straightforward engineering choices that address real failure modes. The soft spots are the complete absence of quantitative results, ablations, dataset details, or error bars in the abstract, so the claim of outperforming state-of-the-art methods under a consistent protocol rests on unshown evidence. The mask relocation also needs scrutiny: if baselines keep masks in their original positions while MAGIC moves them to plausible locations, the protocol is no longer identical and any gains could partly reflect easier placement rather than better generation fidelity. The full paper would have to show that all methods were evaluated the same way. This is aimed at computer vision researchers working on anomaly detection and synthetic data for manufacturing. A reader who needs practical few-shot techniques for diffusion models would get value from the design details once the results are checked. It deserves peer review because the problem is real and the components are clearly motivated, even if the experiments require strengthening.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes MAGIC, a fine-tuned diffusion-based inpainting framework for few-shot anomaly generation in industrial images. It introduces three components—Gaussian prompt perturbation to sample from a smooth manifold of anomalies and avoid overfitting, spatially adaptive guidance with distinct strengths for anomaly versus background regions, and context-aware mask alignment that relocates masks to plausible positions within the host object—to produce high-fidelity anomalies that strictly adhere to the mask while maximizing in-distribution diversity. The central claim is that, under a consistent identical evaluation protocol, MAGIC outperforms state-of-the-art methods on diverse anomaly datasets when used for downstream tasks.

Significance. If the outperformance claims hold under a truly identical protocol, the work would offer a practical advance for few-shot anomaly synthesis in quality control, addressing limitations of global prompt guidance (normal-region corruption) and prior inpainting methods (insufficient diversity). The design choices are motivated and the emphasis on mask adherence plus diversity is a clear strength; reproducible code or parameter-free derivations are not mentioned but would further strengthen the contribution if present.

major comments (2)

[Abstract and §3] Abstract and §3 (method description): The central claim of outperformance 'under consistent identical evaluation protocol' is load-bearing. Component (iii) explicitly relocates masks 'for plausible placement within the host object.' If baselines were evaluated on the original (unrelocated) masks while MAGIC uses relocated ones, the generated anomalies occupy different spatial contexts; downstream gains could then stem from easier placement rather than superior inpainting fidelity or diversity. This must be clarified with explicit confirmation that all methods were re-evaluated under the same relocated-mask protocol, or the comparison re-run accordingly.
[§4] §4 (experiments): The abstract asserts quantitative outperformance on downstream tasks, yet the soundness assessment notes the absence of specific metrics, ablation studies, error bars, or dataset details in the provided text. To support the claim that the three components together produce strictly mask-adherent and maximally diverse anomalies, the manuscript must include ablations isolating each component's contribution and statistical significance tests under the identical protocol.

minor comments (2)

[§3] Notation for the Gaussian prompt perturbation scale and spatially adaptive guidance strengths should be defined explicitly with symbols and ranges in §3 to aid reproducibility.
[Figures] Figure captions for qualitative results should explicitly state whether masks shown are original or relocated to avoid reader confusion about the protocol.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below and have revised the paper accordingly to strengthen the presentation of our evaluation protocol and experimental results.

read point-by-point responses

Referee: [Abstract and §3] Abstract and §3 (method description): The central claim of outperformance 'under consistent identical evaluation protocol' is load-bearing. Component (iii) explicitly relocates masks 'for plausible placement within the host object.' If baselines were evaluated on the original (unrelocated) masks while MAGIC uses relocated ones, the generated anomalies occupy different spatial contexts; downstream gains could then stem from easier placement rather than superior inpainting fidelity or diversity. This must be clarified with explicit confirmation that all methods were re-evaluated under the same relocated-mask protocol, or the comparison re-run accordingly.

Authors: We thank the referee for raising this critical point on protocol consistency. All baseline methods were indeed re-evaluated under the identical protocol that incorporates the context-aware mask alignment step, ensuring that masks are relocated to plausible positions within the host object for every method. This was done to isolate the contributions of inpainting fidelity and diversity rather than mask placement. We have added explicit confirmation of this uniform protocol in the revised abstract, Section 3, and a new paragraph in Section 4.1, along with pseudocode in the supplement describing the shared preprocessing. revision: yes
Referee: [§4] §4 (experiments): The abstract asserts quantitative outperformance on downstream tasks, yet the soundness assessment notes the absence of specific metrics, ablation studies, error bars, or dataset details in the provided text. To support the claim that the three components together produce strictly mask-adherent and maximally diverse anomalies, the manuscript must include ablations isolating each component's contribution and statistical significance tests under the identical protocol.

Authors: We agree that the experimental section requires expansion to fully support the claims. The revised manuscript now includes: specific quantitative metrics (e.g., AUROC, AUPRC) for the downstream anomaly detection tasks; comprehensive ablations in Section 4.3 that isolate the contribution of Gaussian prompt perturbation, spatially adaptive guidance, and context-aware mask alignment individually and in combination; error bars computed over five independent runs with different random seeds; and statistical significance testing (paired t-tests with p-values reported) under the identical evaluation protocol. Dataset details, including image counts and anomaly types, have also been expanded in Section 4.1. revision: yes

Circularity Check

0 steps flagged

No circularity: method components are independent design choices

full rationale

The paper presents MAGIC as a fine-tuned inpainting framework whose three components—Gaussian prompt perturbation, spatially adaptive guidance, and context-aware mask alignment—are introduced as complementary, independent modifications to achieve mask-adherent yet diverse anomalies. No equations, fitted parameters, or derivation steps are shown that reduce the claimed diversity, fidelity, or downstream outperformance to quantities defined by the inputs themselves. The outperformance claim rests on empirical evaluation under a stated consistent protocol rather than any self-referential construction or self-citation chain, rendering the central argument self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework assumes standard diffusion inpainting fine-tuning works for anomaly synthesis and introduces ad-hoc mechanisms to address few-shot overfitting and mask plausibility.

free parameters (2)

Gaussian prompt perturbation scale
Controls sampling from the manifold of realistic anomalies; value chosen to balance diversity and fidelity in the few-shot regime.
Spatially adaptive guidance strengths
Separate strengths applied to anomaly versus background regions; values selected to prevent corruption of normal areas while enforcing mask adherence.

axioms (2)

domain assumption Fine-tuned diffusion inpainting models can generate anomalies that strictly respect a binary mask.
Invoked as the base capability of the proposed framework.
ad hoc to paper Prompt perturbation prevents overfitting when only a few anomaly examples are available.
Introduced specifically to address the few-shot setting.

pith-pipeline@v0.9.0 · 5689 in / 1317 out tokens · 44156 ms · 2026-05-19T05:51:59.677738+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

MAGIC introduces three complementary components: (i) Gaussian prompt perturbation... (ii) spatially adaptive guidance... (iii) context-aware mask alignment to relocate masks...
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose MAGIC... fine-tunes a Stable Diffusion inpainting backbone...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

One-to-More: High-Fidelity Training-Free Anomaly Generation with Attention Control
cs.CV 2026-03 unverdicted novelty 6.0

O2MAG generates high-fidelity text-guided anomalies from a single image without training by manipulating self-attention in diffusion models with anomaly masks and dual enhancements.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · cited by 1 Pith paper · 6 internal anchors

[1]

Anomalycontrol: Few-shot anomaly generation by controlnet inpainting

Musawar Ali, Nicola Fioraio, Samuele Salti, and Luigi Di Stefano. Anomalycontrol: Few-shot anomaly generation by controlnet inpainting. IEEE Access, 12:192903–192914, 2024

work page 2024
[2]

Mvtec ad–a comprehensive real- world dataset for unsupervised anomaly detection

Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger. Mvtec ad–a comprehensive real- world dataset for unsupervised anomaly detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9592–9600, 2019

work page 2019
[3]

Demystifying MMD GANs

Mikolaj Binkowski, Danica J. Sutherland, Michal Arbel, and Arthur Gretton. Demystifying mmd gans. ArXiv, abs/1801.01401, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[4]

Describing textures in the wild

Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014

work page 2014
[5]

Hong, Li Niu, and Liqing Zhang

Yuxuan Duan, Y . Hong, Li Niu, and Liqing Zhang. Few-shot defect image generation via defect-aware feature manipulation. In AAAI Conference on Artificial Intelligence, 2023

work page 2023
[6]

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, and Daniel Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inversion. ArXiv, abs/2208.01618, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[7]

Few-shot anomaly-driven generation for anomaly classification and segmentation

Guan Gui, Bin-Bin Gao, Jun Liu, Chengjie Wang, and Yunsheng Wu. Few-shot anomaly-driven generation for anomaly classification and segmentation. In European Conference on Computer Vision (ECCV), 2024

work page 2024
[8]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016. doi: 10.1109/CVPR.2016.90

work page doi:10.1109/cvpr.2016.90 2016
[9]

GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Günter Klambauer, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a nash equilibrium. ArXiv, abs/1706.08500, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[10]

Anomalydiffusion: Few-shot anomaly image generation with diffusion model

Teng Hu, Jiangning Zhang, Ran Yi, Yuzhen Du, Xu Chen, Liang Liu, Yabiao Wang, and Chengjie Wang. Anomalydiffusion: Few-shot anomaly image generation with diffusion model. In AAAI Conference on Artificial Intelligence, 2023

work page 2023
[11]

Dual-interrelated diffusion model for few-shot anomaly image generation

Ying Jin, Jinlong Peng, Qingdong He, Teng Hu, Jiafu Wu, Hao Chen, Haoxuan Wang, Wenbing Zhu, Mingmin Chi, Jun Liu, and Yabiao Wang. Dual-interrelated diffusion model for few-shot anomaly image generation. arXiv preprint arXiv:2408.13509, 2024

work page arXiv 2024
[12]

Training generative adversarial networks with limited data

Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems, 33:12104–12114, 2020

work page 2020
[13]

Analyzing and improving the image quality of stylegan

Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of stylegan. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8107–8116, 2020

work page 2020
[14]

On Convergence and Stability of GANs

Naveen Kodali, Jacob D. Abernethy, James Hays, and Zsolt Kira. How to train your dragan. ArXiv, abs/1705.07215, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[15]

Multi-concept customization of text-to-image diffusion

Nupur Kumari, Bin Zhang, Richard Zhang, Eli Shechtman, and Jun-Yan Zhu. Multi-concept customization of text-to-image diffusion. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1931–1941, 2022. 10

work page 1931
[16]

Cutpaste: Self-supervised learning for anomaly detection and localization

Chun-Liang Li, Kihyuk Sohn, Jinsung Yoon, and Tomas Pfister. Cutpaste: Self-supervised learning for anomaly detection and localization. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9659–9669, 2021

work page 2021
[17]

Few-shot defect segmentation leveraging abundant normal training samples through normal background regularization and crop-and-paste operation

Dongyun Lin, Yanpeng Cao, Wenbin Zhu, and Yiqun Li. Few-shot defect segmentation leveraging abundant normal training samples through normal background regularization and crop-and-paste operation. ArXiv, abs/2007.09438, 2020

work page arXiv 2007
[18]

A comprehensive augmentation framework for anomaly detection

Jiang Lin and Yaping Yan. A comprehensive augmentation framework for anomaly detection. In AAAI Conference on Artificial Intelligence, volume 38, pages 8742–8749, 2024

work page 2024
[19]

Pd-gan: Probabilistic diverse gan for image inpainting

Hongyu Liu, Ziyu Wan, Wei Huang, Yibing Song, Xintong Han, and Jing Liao. Pd-gan: Probabilistic diverse gan for image inpainting. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9367–9376, 2021

work page 2021
[20]

Mescheder, Andreas Geiger, and Sebastian Nowozin

Lars M. Mescheder, Andreas Geiger, and Sebastian Nowozin. Which training methods for gans do actually converge? In International Conference on Machine Learning, 2018

work page 2018
[21]

Glide: Towards photorealistic image generation and editing with text-guided diffusion models

Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. In International Conference on Machine Learning, 2021

work page 2021
[22]

Defect image sample generation with gan for improving defect recognition

Shuanlong Niu, Bin Li, Xinggang Wang, and Hui Lin. Defect image sample generation with gan for improving defect recognition. IEEE Transactions on Automation Science and Engineering, PP:1–12, 02 2020

work page 2020
[23]

Efros, Yong Jae Lee, Eli Shechtman, and Richard Zhang

Utkarsh Ojha, Yijun Li, Jingwan Lu, Alexei A. Efros, Yong Jae Lee, Eli Shechtman, and Richard Zhang. Few-shot image generation via cross-domain correspondence. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10738–10747, 2021

work page 2021
[24]

Maxime Oquab, Timothée Darcet, Theo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Russell Howes, Po-Yao Huang, Hu Xu, Vasu Sharma, Shang-Wen Li, Wojciech Galuba, Mike Rabbat, Mido Assran, Nicolas Ballas, Gabriel Synnaeve, Ishan Misra, Herve Jegou, Julien Mairal, Patrick Laba...

work page 2023
[25]

U2-net: Going deeper with nested u-structure for salient object detection

Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R Zaiane, and Martin Jägersand. U2-net: Going deeper with nested u-structure for salient object detection. Pattern Recognition, 106:107404, 2020

work page 2020
[26]

Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer

Robin Rombach, A. Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10674–10685, 2021

work page 2021
[27]

U-net: Convolutional networks for biomedical image segmentation

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pages 234–241, Cham, 2015. Springer International Publishing. ISBN 978-3-319-24574-4

work page 2015
[28]

Towards total recall in industrial anomaly detection

Karsten Roth, Latha Pemula, Joaquin Zepeda, Bernhard Scholkopf, Thomas Brox, and Peter Gehler. Towards total recall in industrial anomaly detection. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14298–14308, 2022

work page 2022
[29]

Dream- booth: Fine tuning text-to-image diffusion models for subject-driven generation

Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. Dream- booth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10694, 2023

work page 2023
[30]

Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, and Romann M. Weber. Cads: Unleashing the diversity of diffusion models through condition-annealed sampling. ArXiv, abs/2310.17347, 2023

work page arXiv 2023
[31]

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L. Denton, Seyed Kam- yar Seyed Ghasemipour, Burcu Karagol Ayan, Seyedeh Sara Mahdavi, Raphael Gontijo Lopes, Tim Salimans, Jonathan Ho, David J. Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding. ArXiv, abs/2205.11487, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[32]

Defectfill: Realistic defect generation with inpainting diffusion model for visual inspection

Jaewoo Song, Daemin Park, Kanghyun Baek, Sangyub Lee, Jooyoung Choi, Eunji Kim, and Sungroh Yoon. Defectfill: Realistic defect generation with inpainting diffusion model for visual inspection. ArXiv, abs/2503.13985, 2025. 11

work page arXiv 2025
[33]

Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv:2010.02502, October 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010
[34]

Diverse image inpainting with normalizing flow

Cairong Wang, Yiming Zhu, and Chun Yuan. Diverse image inpainting with normalizing flow. InEuropean Conference on Computer Vision, 2022

work page 2022
[35]

Mask-guided generation method for industrial de- fect images with non-uniform structures

Jing Wei, Zhengtao Zhang, Fei Shen, and Chengkan Lv. Mask-guided generation method for industrial de- fect images with non-uniform structures. Machines, 10(12):1239, 2022. doi: 10.3390/machines10121239

work page doi:10.3390/machines10121239 2022
[36]

Draem – a discriminatively trained reconstruction embedding for surface anomaly detection

Vitjan Zavrtanik, Matej Kristan, and Danijel Skocaj. Draem – a discriminatively trained reconstruction embedding for surface anomaly detection. IEEE/CVF International Conference on Computer Vision (ICCV), pages 8310–8319, 2021

work page 2021
[37]

Defect-gan: High-fidelity defect synthesis for automated defect inspection

Gongjie Zhang, Kaiwen Cui, Tzu-Yi Hung, and Shijian Lu. Defect-gan: High-fidelity defect synthesis for automated defect inspection. IEEE Winter Conference on Applications of Computer Vision (WACV), pages 2523–2533, 2021

work page 2021
[38]

Wang, Zhineng Chen, and Yu-Gang Jiang

Hui Min Zhang, Zuxuan Wu, Z. Wang, Zhineng Chen, and Yu-Gang Jiang. Prototypical residual networks for anomaly detection and localization.IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16281–16291, 2022

work page 2022
[39]

Telling left from right: Identifying geometry-aware semantic correspondence

Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, and Ming-Hsuan Yang. Telling left from right: Identifying geometry-aware semantic correspondence. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

work page 2024
[40]

What makes a good data augmentation for few-shot unsupervised image anomaly detection? In IEEE/CVF conference on computer vision and pattern recognition, pages 4345–4354, 2023

Lingrui Zhang, Shuheng Zhang, Guoyang Xie, Jiaqi Liu, Hua Yan, Jinbao Wang, Feng Zheng, and Yaochu Jin. What makes a good data augmentation for few-shot unsupervised image anomaly detection? In IEEE/CVF conference on computer vision and pattern recognition, pages 4345–4354, 2023

work page 2023
[41]

Adding conditional control to text-to-image diffusion models

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. IEEE/CVF International Conference on Computer Vision (ICCV), pages 3813–3824, 2023

work page 2023
[42]

Dreamdistribution: Learning prompt distribution for diverse in-distribution generation

Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet, and Yunhao Ge. Dreamdistribution: Learning prompt distribution for diverse in-distribution generation. In Proceedings of the International Conference on Learning Representations (ICLR), 2025

work page 2025
[43]

Few-shot image generation with diffusion models

Fanbo Zhu, Zekun Liu, Yue Zhang, and Ziwei Liu. Few-shot image generation with diffusion models. arXiv preprint arXiv:2210.04584, 2022. 12

work page arXiv 2022