HFI: A unified framework for training-free detection and implicit watermarking of latent diffusion model generated images

Hankook Lee; Jaehoon Lee; Moontae Lee; Seunghyun Kim; Stanley Jungkyu Choi; Sungik Choi

arxiv: 2412.20704 · v2 · submitted 2024-12-30 · 💻 cs.CV · cs.LG

Pith reviewed 2026-05-23 07:15 UTC · model grok-4.3

keywords latent diffusion models · training-free detection · aliasing distortion · implicit watermarking · autoencoder reconstruction · AI-generated image detection · high-frequency information

The pith

Measuring aliasing in autoencoder reconstructions detects LDM-generated images without training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Latent diffusion models produce images that are difficult to distinguish from real photographs, enabling potential misuse. Existing training-free detectors rely on reconstruction distance through an autoencoder, but this measure overfits to background content and underperforms on images with simple backgrounds. HFI instead treats the autoencoder as a downsampling-upsampling kernel and quantifies the resulting aliasing distortion of high-frequency information in the reconstruction. This produces a more robust detection signal that works across image types and generative models. The same signal also permits implicit watermarking by flagging outputs from one chosen LDM.
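
To make the background-dependence concern concrete, here is a minimal Python sketch of a reconstruction-distance score. It is an illustration only: `down_up` is a hypothetical stand-in (average pooling followed by pixel replication) for a real LDM autoencoder, and the mean squared error is an assumed distance, not the metric used by the paper or its baselines.

```python
def down_up(img, f=2):
    """Toy stand-in for an autoencoder: average-pool by f, then replicate pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, f):
        for bx in range(0, w, f):
            ys = range(by, min(by + f, h))
            xs = range(bx, min(bx + f, w))
            block = [img[y][x] for y in ys for x in xs]
            mean = sum(block) / len(block)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def reconstruction_distance(img, f=2):
    """Mean squared error between an image and its toy reconstruction."""
    rec = down_up(img, f)
    h, w = len(img), len(img[0])
    return sum((img[y][x] - rec[y][x]) ** 2 for y in range(h) for x in range(w)) / (h * w)


# Two hypothetical 4x4 grayscale images: a flat patch and a checkerboard.
flat = [[0.5] * 4 for _ in range(4)]
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]

print(reconstruction_distance(flat))     # 0.0
print(reconstruction_distance(checker))  # 0.25
```

In this toy setting the flat image reconstructs perfectly while the checkerboard does not, so the score reflects how much fine detail an image contains regardless of where the image came from.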

Core claim

HFI measures the extent of aliasing distortion appearing in the image reconstructed by an LDM autoencoder, treated as a downsampling-upsampling kernel. This training-free technique consistently outperforms other training-free methods on challenging generated images from various models and enables detection of images from a specified LDM for implicit watermarking.

What carries the argument

HFI, the aliasing extent measured in the autoencoder-reconstructed image, where the autoencoder acts as a downsampling-upsampling kernel causing high-frequency distortion.
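
This page does not reproduce the paper's formula, so the following Python sketch shows only one hypothetical way to quantify high-frequency distortion after a downsampling-upsampling step. The box-filter kernel `down_up`, the Laplacian high-pass filter, and the name `high_freq_distortion` are all assumptions made for illustration; none of them is taken from the paper.

```python
def down_up(img, f=2):
    """Assumed toy kernel: average-pool by f, then replicate pixels back to full size."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, f):
        for bx in range(0, w, f):
            ys = range(by, min(by + f, h))
            xs = range(bx, min(bx + f, w))
            block = [img[y][x] for y in ys for x in xs]
            mean = sum(block) / len(block)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def laplacian(img):
    """4-neighbour Laplacian high-pass filter with replicated borders."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [
        [4 * px(y, x) - px(y - 1, x) - px(y + 1, x) - px(y, x - 1) - px(y, x + 1)
         for x in range(w)]
        for y in range(h)
    ]


def high_freq_distortion(img, f=2):
    """Mean absolute difference between the high-pass image and high-pass reconstruction."""
    hp_img = laplacian(img)
    hp_rec = laplacian(down_up(img, f))
    h, w = len(img), len(img[0])
    return sum(abs(hp_img[y][x] - hp_rec[y][x]) for y in range(h) for x in range(w)) / (h * w)


# Hypothetical inputs: a flat patch and a checkerboard at the Nyquist frequency.
flat = [[0.5] * 4 for _ in range(4)]
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]

print(high_freq_distortion(flat))     # 0.0
print(high_freq_distortion(checker))  # 3.0
```

The checkerboard carries detail that the toy kernel cannot represent, so its high-pass content is wiped out by the round trip and the score is large; a real LDM autoencoder would behave far less crudely than this box filter.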

If this is right

HFI detects images generated by various generative models without requiring training data.
It supports implicit watermarking by identifying outputs from a designated LDM.
It remains efficient because it needs only a single forward pass through the autoencoder.
It outperforms prior training-free reconstruction-distance methods on images with simple backgrounds.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The aliasing approach might apply to other autoencoder-based generators beyond LDMs.
HFI could be combined with existing frequency-domain detectors to increase robustness.
Real-time screening pipelines could adopt the method due to its low computational cost.
Further tests on images from future diffusion variants would clarify the signal's longevity.

Load-bearing premise

The aliasing distortion measured in the autoencoder reconstruction supplies a signal that generalizes across image types and models rather than overfitting to background information.

What would settle it

Finding that HFI fails to separate real images from LDM-generated images on simple-background cases or on models not used in development would show the aliasing signal does not generalize.
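
One way to run such a check is to score a set of real images and a set of generated images and compute the area under the ROC curve. The sketch below is a plain-Python illustration; the function name `auroc` and the convention that generated images receive lower scores are assumptions, not details from the paper.

```python
def auroc(real_scores, fake_scores):
    """Fraction of (real, fake) pairs ranked correctly; ties count as half.

    Assumes a lower score means "more likely generated".
    """
    if not real_scores or not fake_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for r in real_scores:
        for f in fake_scores:
            if f < r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(real_scores) * len(fake_scores))


# Hypothetical scores, for illustration only.
print(auroc([0.9, 0.8], [0.1, 0.2]))  # 1.0
print(auroc([1.0, 2.0], [1.0, 2.0]))  # 0.5
```

A value near 0.5 on simple-background images, or on generators not used during development, would indicate that the signal does not separate the two classes.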

Figures

Figures reproduced from arXiv: 2412.20704 by Hankook Lee, Jaehoon Lee, Moontae Lee, Seunghyun Kim, Stanley Jungkyu Choi, Sungik Choi.

**Figure 1.** Problem setup of HFI. (Left) Setup of training-based AI-generated image detection methods. Such methods train and test on the same real data distribution. Furthermore, the framework can be costly when detecting images produced by large-scale text-to-image generative models. (Right) Pipeline of our proposed HFI. HFI operates only at test time and can be computed efficiently via the autoencoder of the LD…

**Figure 2.** Motivation of HFI. (a) Sampled data from the ImageNet [11] dataset. (b) Reconstruction through the autoencoder of the Stable Diffusion [35] v1.1 model. We can observe obvious distortions in the high-frequency details. (c) Histogram of AEROBLADE [34] scores on the toy dataset. (d) Histogram of HFI scores on the toy dataset.

**Figure 3.** Samples of the toy dataset. (a) Real ImageNet [11] data. (b), (c), (d) AI-generated data, produced with SDv1.5 (b), SDv2-base (c) [35], and Kandinsky (d) [33], respectively.

2.2. Latent diffusion models. LDMs [35] efficiently generate high-dimensional images by modeling the diffusion process on the latent space $\mathcal{Z} \subset \mathbb{R}^{C' \times H' \times W'}$ instead of the data space $\mathcal{X} \subset \mathbb{R}^{C \times H \times W}$ with $C = 3$. Hence, LDMs first pre-tr…

**Figure 4.** Visualization of the edge cases. (a) ImageNet data for which AEROBLADE outputs the smallest uncertainty. (b) SDv1.4-generated data for which AEROBLADE outputs the highest uncertainty. We mark the sample where HFI also fails.

**Figure 5.** Performance of HFI, AEROBLADE, and B-HFI under corruption. JPEG compression and cropping are applied as corruptions. We test HFI and AEROBLADE on the ImageNet vs SDv1.4 task with the autoencoder of SDv1.4 as the AE. We report the result in…

Original abstract

Dramatic advances in the quality of the latent diffusion models (LDMs) also led to the malicious use of AI-generated images. While current AI-generated image detection methods assume the availability of real/AI-generated images for training, this is practically limited given the vast expressibility of LDMs. This motivates the training-free detection setup where no related data are available in advance. The existing LDM-generated image detection method assumes that images generated by LDM are easier to reconstruct using an autoencoder than real images. However, we observe that this reconstruction distance is overfitted to background information, leading the current method to underperform in detecting images with simple backgrounds. To address this, we propose a novel method called HFI. Specifically, by viewing the autoencoder of LDM as a downsampling-upsampling kernel, HFI measures the extent of aliasing, a distortion of high-frequency information that appears in the reconstructed image. HFI is training-free, efficient, and consistently outperforms other training-free methods in detecting challenging images generated by various generative models. We also show that HFI can successfully detect the images generated from the specified LDM as a means of implicit watermarking. HFI outperforms the best baseline method while achieving magnitudes of

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

HFI swaps reconstruction distance for an aliasing score on the VAE kernel to dodge background overfitting, but the abstract gives no evidence the new signal is actually content-independent.

The one thing to know is that this paper replaces the reconstruction-distance baseline with a measure of aliasing distortion, after treating the LDM autoencoder as a fixed downsampling-upsampling kernel. The authors note that the old distance overfits to background, and they claim the aliasing signal works better on simple-background cases while also supporting implicit watermarking for a chosen LDM. That technical shift is the actual novelty here, and the training-free setup matches the practical constraint they describe around unlimited LDM expressivity. The dual-use claim for watermarking is a practical angle worth checking if the signal proves model-specific. The paper does identify a real limitation in prior training-free detectors and offers a frequency-based alternative that could be efficient to compute.

The soft spot is content dependence: nothing shown so far demonstrates that aliasing amplitude stays independent of scene complexity, foreground frequencies, or VAE weights. If the metric still tracks image content the way the distance metric did, then both the detection gains and the watermarking specificity rest on an unverified assumption. The abstract states outperformance without numbers, datasets, or error analysis, so the central claim cannot be evaluated from what is given.

This is for people working on practical, training-free detectors for generated images in forensics or authentication settings. A reader who needs ideas for content-independent signals would get value from the kernel view even if the results need tightening. It deserves peer review because the problem is current and the proposed fix is distinct enough to test, though the experiments will have to address the content-correlation risk directly.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes HFI, a training-free method for detecting LDM-generated images by quantifying aliasing distortion in autoencoder reconstructions (treating the autoencoder as a fixed downsampling-upsampling kernel) and demonstrates its use for implicit watermarking of images from a specified LDM. It claims to address overfitting to background information in prior reconstruction-distance baselines and to consistently outperform other training-free detectors on challenging cases.

Significance. If the aliasing metric supplies a content-independent signal that generalizes across models and image types, the approach would meaningfully advance training-free detection and provide a practical implicit-watermarking capability without requiring training data or parameter fitting.

major comments (1)

[Abstract] The claim that aliasing 'consistently outperforms' prior training-free methods and enables reliable implicit watermarking depends on the unverified assumption that the aliasing signal is independent of image content (scene complexity, foreground frequency content); the abstract itself notes that reconstruction distance overfits to background, yet no controls or analysis are described to show the aliasing measure escapes the same dependence.

minor comments (2)

The abstract is truncated mid-sentence ('achieving magnitudes of').
No equations, pseudocode, or experimental details (datasets, metrics, baselines) appear in the abstract, preventing verification of the aliasing quantification procedure.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for identifying this important point about content dependence. We address the concern directly below and note that the full manuscript contains supporting experiments, though we agree additional explicit controls would strengthen the presentation.

Referee: [Abstract] The claim that aliasing 'consistently outperforms' prior training-free methods and enables reliable implicit watermarking depends on the unverified assumption that the aliasing signal is independent of image content (scene complexity, foreground frequency content); the abstract itself notes that reconstruction distance overfits to background, yet no controls or analysis are described to show the aliasing measure escapes the same dependence.

Authors: The abstract correctly identifies the background overfitting problem with reconstruction distance. Section 3.2 formalizes the autoencoder as a fixed downsampling-upsampling kernel and defines the aliasing metric specifically on high-frequency residuals that arise from the LDM's latent-space generation process rather than from scene content. Experiments in Section 4 evaluate HFI on images spanning simple backgrounds, complex scenes, and varying foreground frequency content, showing consistent gains over baselines; these results provide empirical evidence that the aliasing signal is less content-dependent. We nevertheless agree that dedicated controls (e.g., frequency-content-matched real-image pairs) are not explicitly reported and will add them in the revision to make the independence claim fully rigorous.

revision: partial

Circularity Check

0 steps flagged

No circularity: derivation relies on independent observation of aliasing vs. prior reconstruction distance

The provided abstract and description contain no equations, fitted parameters, or self-citations that reduce the HFI aliasing metric to its inputs by construction. The method starts from an empirical observation that reconstruction distance overfits to background, then defines aliasing as a distinct high-frequency distortion measure when treating the LDM autoencoder as a fixed downsampling-upsampling kernel. This is presented as a new signal without any fitting step or load-bearing self-citation chain. The central claim (superior training-free detection and implicit watermarking) therefore rests on external validation against baselines rather than tautological re-use of the same quantity. No load-bearing step matches any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are specified.

Reference graph

Works this paper leans on

[1] Shashank Agnihotri, Julia Grabinski, and Margret Keuper. Improving feature stability during upsampling – spectral artifacts and the importance of spatial context. In ECCV, 2024.

[2] Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, and Richard G. Baraniuk. Self-consuming generative models go mad. In ICLR, 2024.

[3] Quentin Bammey. Synthbuster: Towards detection of diffusion model generated images. IEEE Open Journal of Signal Processing, 5:1–9, 2024. doi: 10.1109/OJSP.2023.3337714.

[4] Blake Brittain. Google sued by us artists over ai image generator, 2024.

[5] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.

[6] Lucy Chai, David Bau, Ser-Nam Lim, and Phillip Isola. What makes fake images detectable? understanding properties that generalize. In ECCV, 2020.

[7] Baoying Chen, Jishen Zeng, Jianquan Yang, and Rui Yang. Drct: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images. In ICML, 2024.

[8] Linwei Chen, Lin Gu, and Ying Fu. When semantic segmentation meets frequency aliasing. In ICLR, 2024.

[9] Zhongxi Chen, Ke Sun, Ziyin Zhou, Xianming Lin, Xiaoshuai Sun, Liujuan Cao, and Rongrong Ji. Diffusionface: Towards a comprehensive dataset for diffusion-based face forgery analysis. arXiv preprint arXiv:2403.18471, 2024.

[10] Duc-Tien Dang-Nguyen, Cecilia Pasquini, Valentina Conotter, and Giulia Boato. Raise: a raw images dataset for digital image forensics. In Proceedings of the 6th ACM Multimedia Systems Conference, MMSys '15, pp. 219–224, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450333511. doi: 10.1145/2713168.2713194. URL https://doi.org/1...

[11] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009.

[12] Prafulla Dhariwal and Alex Nichol. Diffusion models beat gans on image synthesis. In NeurIPS, 2021.

[13] Keyan Ding, Kede Ma, Shiqi Wang, and Eero P. Simoncelli. Image quality assessment: Unifying structure and texture similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2020. ISSN 1939-3539. doi: 10.1109/tpami.2020.3045810. URL http://dx.doi.org/10.1109/TPAMI.2020.3045810.

[14] Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, and Teddy Furon. The stable signature: Rooting watermarks in latent diffusion models. In ICCV, 2023.

[15] Joel Frank, Thorsten Eisenhofer, Lea Schönherr, Asja Fischer, Dorothea Kolossa, and Thorsten Holz. Leveraging frequency analysis for deep fake image recognition. In ICML, 2020.

[16] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing (3rd Edition). Prentice-Hall, Inc., USA, 2006. ISBN 013168728X.

[17] Jiaxi Gu, Xiaojun Meng, Guansong Lu, Lu Hou, Minzhe Niu, Xiaodan Liang, Lewei Yao, Runhui Huang, Wei Zhang, Xin Jiang, Chunjing Xu, and Hang Xu. Wukong: A 100 million large-scale chinese cross-modal pre-training benchmark. In Advances in Neural Information Processing Systems, Datasets and Benchmarks Track, 2022.

[18] Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, and Baining Guo. Vector quantized diffusion model for text-to-image synthesis. In CVPR, 2022.

[19] Zhiyuan He, Pin-Yu Chen, and Tsung-Yi Ho. Rigid: A training-free and model-agnostic framework for robust ai-generated image detection. arXiv preprint arXiv:2405.20112, 2024.

[20] Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Alias-free generative adversarial networks. In NeurIPS, 2021.

[21] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. In ICLR, 2014.

[22] Lambda Labs. miniSD-diffusers: A Text-to-Image Model based on Stable Diffusion, 2022. URL https://huggingface.co/lambdalabs/miniSD-diffusers.

[23] Jiaming Li, Hongtao Xie, Jiahong Li, Zhongyuan Wang, and Yongdong Zhang. Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection. In CVPR, 2021.

[24] RuiPeng Ma, Jinhao Duan, Fei Kong, Xiaoshuang Shi, and Kaidi Xu. Exposing the fake: Effective diffusion-generated images detection. In The Second Workshop on New Frontiers in Adversarial Machine Learning, 2023. URL https://openreview.net/forum?id=7R62e4Wgim.

[25] MidJourney. MidJourney AI Art Generator, 2022. URL https://www.midjourney.com/home/.

[26] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. In ICML, 2022.

[27] Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. Towards universal fake image detectors that generalize across generative models. In CVPR, 2023.

[28] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patri...

[29] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-perf...

[30] Alin C. Popescu and Hany Farid. Exposing digital forgeries by detecting traces of resampling. IEEE Transactions on Signal Processing, 53(2):758–767, 2005.

[31] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In ICML, 2021.

[32] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 2022. URL https://arxiv.org/abs/2204.06125.

[33] Anton Razzhigaev, Arseniy Shakhmatov, Anastasia Maltseva, Vladimir Arkhipkin, Igor Pavlov, Ilya Ryabov, Angelina Kuts, Alexander Panchenko, Andrey Kuznetsov, and Denis Dimitrov. Kandinsky: An improved text-to-image synthesis with image prior and latent diffusion. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing...

[34] Jonas Ricker, Denis Lukovnikov, and Asja Fischer. Aeroblade: Training-free detection of latent diffusion images using autoencoder reconstruction error. In CVPR.

[35] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR.

[36] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. Laion-5b: An open large-scale dataset for training next generation image-text model...

[37] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.

[38] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In NeurIPS, 2019.

[39] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In ICLR, 2021.

[40] Chuangchuang Tan, Huan Liu, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Rethinking the up-sampling operations in cnn-based generative network for generalizable deepfake detection. In CVPR.

[41] Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Frequency-aware deepfake detection: Improving generalizability through frequency space learning. In AAAI, 2024.

[42] The Guardian. 'we definitely messed up': Why did google's ai tool make offensive historical images?

[43] Cristina Vasconcelos, Hugo Larochelle, Vincent Dumoulin, Rob Romijnders, Nicolas Le Roux, and Ross Goroshin. Impact of aliasing on generalization in deep convolutional networks. In ICCV, 2021.

[44] Gregory K. Wallace. The jpeg still picture compression standard. IEEE Transactions on Consumer Electronics, 38(1):xviii–xxxiv, 1992. doi: 10.1109/30.125072.

[45] Yuan Wang, Kun Yu, Chen Chen, Xiyuan Hu, and Silong Peng. Dynamic graph learning with content-guided spatial-frequency relation reasoning for deepfake detection. In CVPR, 2023.

[46] Zhenting Wang, Chen Chen, Yi Zeng, Lingjuan Lyu, and Shiqing Ma. Where did i come from? origin attribution of ai-generated images. In NeurIPS, 2023.

[47] Zhenting Wang, Vikash Sehwag, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas, and Shiqing Ma. How to trace latent generative model generated images without artificial watermark? In ICML, 2024.

[48] Yuxin Wen, John Kirchenbauer, Jonas Geiping, and Tom Goldstein. Tree-ring watermarks: Fingerprints for diffusion images that are invisible and robust. In NeurIPS, 2023.

[49] Weihao Xia, Yujiu Yang, Jing-Hao Xue, and Baoyuan Wu. Tedigan: Text-guided diverse face image generation and manipulation. In CVPR, 2021.

[50] Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, and Weidi Xie. A sanity check for ai-generated image detection. arXiv preprint arXiv:2406.19435, 2024.

[51] Hyunsu Yim. South korea to criminalise watching or possessing sexually explicit deepfakes, 2024.

[52] Richard Zhang. Making convolutional networks shift-invariant again. In ICML, 2019.

[53] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.

[54] Mingjian Zhu, Hanting Chen, Qiangyu Yan, Xudong Huang, Guanyu Lin, Wei Li, Zhijun Tu, Hailin Hu, Jie Hu, and Yunhe Wang. Genimage: A million-scale benchmark for detecting ai-generated image. In NeurIPS, 2023.

A. Full results on the cross-autoencoder setup

We refer to Tables 10 and 11 for the full results of HFI compared to AEROBLADE. B. Fu...

dataset. Bold denotes the best method.

AE: SDv1.4 [35]

| Method | ADM | BigGAN | GLIDE | Midj | SD1.4 | SD1.5 | VQDM | Wukong | Mean |
|---|---|---|---|---|---|---|---|---|---|
| AEROBLADE_LPIPS | 0.804/0.757 | 0.889/0.909 | 0.975/0.976 | 0.921/0.928 | 0.980/0.986 | 0.981/0.986 | 0.640/0.595 | 0.983/0.988 | 0.897/0.891 |
| AEROBLADE_LPIPS2 | 0.856/0.833 | 0.981/0.987 | 0.989/0.990 | 0.918/0.928 | 0.982/0.988 | 0.984/0.989 | 0.732/0.712 | 0.983/0.989 | 0.928/0.92... |

[1] [1]

Improving feature stability during upsampling – spectral artifacts and the importance of spatial context

Shashank Agnihotri, Julia Grabinski, and Margret Ke- uper. Improving feature stability during upsampling – spectral artifacts and the importance of spatial context. In ECCV, 2024. 8

work page 2024

[2] [2]

Baraniuk

Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, and Richard G. Baraniuk. Self- consuming generative models go mad. In ICLR, 2024. 1

work page 2024

[3] [3]

Synthbuster: Towards detection of diffusion model generated images

Quentin Bammey. Synthbuster: Towards detection of diffusion model generated images. IEEE Open Journal of Signal Processing, 5:1–9, 2024. doi: 10.1109/OJSP. 2023.3337714. 5

work page doi:10.1109/ojsp 2024

[4] [4]

Google sued by us artists over ai image generator, 2024

Blake Brittain. Google sued by us artists over ai image generator, 2024. 1

work page 2024

[5] [5]

Large Scale GAN Training for High Fidelity Natural Image Synthesis

Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018. 5

work page internal anchor Pith review Pith/arXiv arXiv 2018

[6] [6]

What makes fake images detectable? understanding properties that generalize

Lucy Chai, David Bau, Ser-Nam Lim, and Phillip Isola. What makes fake images detectable? understanding properties that generalize. In ECCV, 2020. 8

work page 2020

[7] [7]

Drct: Diffusion reconstruction contrastive train- ing towards universal detection of diffusion generated images

Baoying Chen, Jishen Zeng, Jianquan Yang, and Rui Yang. Drct: Diffusion reconstruction contrastive train- ing towards universal detection of diffusion generated images. In ICML, 2024. 2, 5, 6, 11

work page 2024

[8] [8]

When semantic segmentation meets frequency aliasing

Linwei Chen, Lin Gu, and Ying Fu. When semantic segmentation meets frequency aliasing. In ICLR, 2024. 7

[9] Zhongxi Chen, Ke Sun, Ziyin Zhou, Xianming Lin, Xiaoshuai Sun, Liujuan Cao, and Rongrong Ji. Diffusionface: Towards a comprehensive dataset for diffusion-based face forgery analysis. arXiv preprint arXiv:2403.18471, 2024. 2, 5, 6, 13

[10] Duc-Tien Dang-Nguyen, Cecilia Pasquini, Valentina Conotter, and Giulia Boato. Raise: a raw images dataset for digital image forensics. In Proceedings of the 6th ACM Multimedia Systems Conference, MMSys ’15, pp. 219–224, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450333511. doi: 10.1145/2713168.2713194. URL https://doi.org/1...

[11] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009. 3, 4, 5

[12] Prafulla Dhariwal and Alex Nichol. Diffusion models beat gans on image synthesis. In NeurIPS, 2021. 4, 5

[13] Keyan Ding, Kede Ma, Shiqi Wang, and Eero P. Simoncelli. Image quality assessment: Unifying structure and texture similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2020. ISSN 1939-3539. doi: 10.1109/tpami.2020.3045810. URL http://dx.doi.org/10.1109/TPAMI.2020.3045810. 7

[14] Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, and Teddy Furon. The stable signature: Rooting watermarks in latent diffusion models. In ICCV, 2023. 3

[15] Joel Frank, Thorsten Eisenhofer, Lea Schönherr, Asja Fischer, Dorothea Kolossa, and Thorsten Holz. Leveraging frequency analysis for deep fake image recognition. In ICML, 2020. 8

[16] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing (3rd Edition). Prentice-Hall, Inc., USA, 2006. ISBN 013168728X. 3

[17] Jiaxi Gu, Xiaojun Meng, Guansong Lu, Lu Hou, Minzhe Niu, Xiaodan Liang, Lewei Yao, Runhui Huang, Wei Zhang, Xin Jiang, Chunjing Xu, and Hang Xu. Wukong: A 100 million large-scale chinese cross-modal pre-training benchmark. In Advances in Neural Information Processing Systems, Datasets and Benchmarks Track, 2022. 5

[18] Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, and Baining Guo. Vector quantized diffusion model for text-to-image synthesis. In CVPR, 2022. 5

[19] Zhiyuan He, Pin-Yu Chen, and Tsung-Yi Ho. Rigid: A training-free and model-agnostic framework for robust ai-generated image detection. arXiv preprint arXiv:2405.20112, 2024. 1, 5, 6, 8

[20] Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Alias-free generative adversarial networks. In NeurIPS, 2021. 8

[21] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. In ICLR, 2014. 1, 3

[22] Lambda Labs. miniSD-diffusers: A Text-to-Image Model based on Stable Diffusion, 2022. URL https://huggingface.co/lambdalabs/miniSD-diffusers. 4, 5, 12, 13

[23] Jiaming Li, Hongtao Xie, Jiahong Li, Zhongyuan Wang, and Yongdong Zhang. Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection. In CVPR, 2021. 8

[24] RuiPeng Ma, Jinhao Duan, Fei Kong, Xiaoshuang Shi, and Kaidi Xu. Exposing the fake: Effective diffusion-generated images detection. In The Second Workshop on New Frontiers in Adversarial Machine Learning, 2023. URL https://openreview.net/forum?id=7R62e4Wgim. 8

[25] MidJourney. MidJourney AI Art Generator, 2022. URL https://www.midjourney.com/home/. 1, 5

[26] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. In ICML, 2022. 1, 5

[27] Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. Towards universal fake image detectors that generalize across generative models. In CVPR, 2023. 8

[28] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patri...

[29] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-perf...

[30] Alin C. Popescu and Hany Farid. Exposing digital forgeries by detecting traces of resampling. IEEE Transactions on Signal Processing, 53(2):758–767, 2005. 8

[31] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In ICML, 2021. 1

[32] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 2022. URL https://arxiv.org/abs/2204.06125. 5

[33] Anton Razzhigaev, Arseniy Shakhmatov, Anastasia Maltseva, Vladimir Arkhipkin, Igor Pavlov, Ilya Ryabov, Angelina Kuts, Alexander Panchenko, Andrey Kuznetsov, and Denis Dimitrov. Kandinsky: An improved text-to-image synthesis with image prior and latent diffusion. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing...

[34] Jonas Ricker, Denis Lukovnikov, and Asja Fischer. Aeroblade: Training-free detection of latent diffusion images using autoencoder reconstruction error. In CVPR, 2024. 1, 2, 3, 4, 5, 6, 8, 12, 13

[36] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022. 1, 2, 3, 4, 5, 12, 13

[38] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. Laion-5b: An open large-scale dataset for training next generation image-text model...

[39] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015. 5

[40] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In NeurIPS, 2019. 1

[41] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In ICLR, 2021. 1

[42] Chuangchuang Tan, Huan Liu, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Rethinking the up-sampling operations in cnn-based generative network for generalizable deepfake detection. In CVPR,

[43] Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Frequency-aware deepfake detection: Improving generalizability through frequency space learning. In AAAI, 2024. 8

[44] The Guardian. ‘we definitely messed up’: Why did google’s ai tool make offensive historical images?,

[45] Cristina Vasconcelos, Hugo Larochelle, Vincent Dumoulin, Rob Romijnders, Nicolas Le Roux, and Ross Goroshin. Impact of aliasing on generalization in deep convolutional networks. In ICCV, 2021. 8

[46] Gregory K. Wallace. The jpeg still picture compression standard. IEEE Transactions on Consumer Electronics, 38(1):xviii–xxxiv, 1992. doi: 10.1109/30.125072. 7

[47] Yuan Wang, Kun Yu, Chen Chen, Xiyuan Hu, and Silong Peng. Dynamic graph learning with content-guided spatial-frequency relation reasoning for deepfake detection. In CVPR, 2023. 8

[48] Zhenting Wang, Chen Chen, Yi Zeng, Lingjuan Lyu, and Shiqing Ma. Where did i come from? origin attribution of ai-generated images. In NeurIPS, 2023. 3

[49] Zhenting Wang, Vikash Sehwag, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas, and Shiqing Ma. How to trace latent generative model generated images without artificial watermark? In ICML, 2024. 2, 3, 7, 8, 11

[50] Yuxin Wen, John Kirchenbauer, Jonas Geiping, and Tom Goldstein. Tree-ring watermarks: Fingerprints for diffusion images that are invisible and robust. In NeurIPS, 2023. 3

[51] Weihao Xia, Yujiu Yang, Jing-Hao Xue, and Baoyuan Wu. Tedigan: Text-guided diverse face image generation and manipulation. In CVPR, 2021. 5

[52] Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, and Weidi Xie. A sanity check for ai-generated image detection. arXiv preprint arXiv:2406.19435, 2024. 8

[53] Hyunsu Yim. South korea to criminalise watching or possessing sexually explicit deepfakes, 2024. 1

[54] Richard Zhang. Making convolutional networks shift-invariant again. In ICML, 2019. 8

[55] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018. 3, 5

[56] Mingjian Zhu, Hanting Chen, Qiangyu Yan, Xudong Huang, Guanyu Lin, Wei Li, Zhijun Tu, Hailin Hu, Jie Hu, and Yunhe Wang. Genimage: A million-scale benchmark for detecting ai-generated image. In NeurIPS, 2023. 1, 2, 5, 6, 12

A. Full results on the cross-autoencoder setup

We refer to Tables 10 and 11 for the full results of HFI compared to AEROBLADE.

B. Fu...

dataset. Bold denotes the best method.

| Method | ADM | BigGAN | GLIDE | Midj | SD1.4 | SD1.5 | VQDM | Wukong | Mean |
|---|---|---|---|---|---|---|---|---|---|
| AE: SDv1.4 [35] | | | | | | | | | |
| AEROBLADE_LPIPS | 0.804/0.757 | 0.889/0.909 | 0.975/0.976 | 0.921/0.928 | 0.980/0.986 | 0.981/0.986 | 0.640/0.595 | 0.983/0.988 | 0.897/0.891 |
| AEROBLADE_LPIPS2 | 0.856/0.833 | 0.981/0.987 | 0.989/0.990 | 0.918/0.928 | 0.982/0.988 | 0.984/0.989 | 0.732/0.712 | 0.983/0.989 | 0.928/0.92... |