Spectral Tail Auxiliary Learning for AI-Generated Image Detection

Jiahui Zhang; Wenhao Wang; Xingyi Li; Yiheng Li; Yun Cao

arxiv: 2605.22751 · v1 · pith:RMP537M7new · submitted 2026-05-21 · 💻 cs.CV

Xingyi Li, Jiahui Zhang, Yiheng Li, Yun Cao, Wenhao Wang

Pith reviewed 2026-05-22 06:10 UTC · model grok-4.3

classification 💻 cs.CV

keywords AI-generated image detection · frequency domain · spectral analysis · power-law decay · auxiliary learning · generalization · high-frequency artifacts · nonlinear harmonics


The pith

Generated images deviate from power-law spectral decay by showing an anomalous uplift in the ultra-high-frequency tail, which transfers via auxiliary training to improve spatial detectors without added inference cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes one-dimensional radial log-power spectra and finds that generated images do not simply have more or less energy in high frequencies overall. Instead they break from the expected power-law decay with a distinct uplift in the ultra-high-frequency tail. This uplift is traced to nonlinear harmonic accumulation inside trained generative models and is presented as a recurring structural cue. The authors build Spectral Tail Auxiliary Learning to let a frequency-domain teacher pass these tail cues to a conventional spatial detector only during training. At inference the frequency components are removed entirely, leaving a detector that runs at normal speed yet generalizes better across generators and datasets.

Core claim

Generated images deviate from power-law decay in their one-dimensional radial log-power spectra and exhibit an anomalous uplift in the ultra-high-frequency tail; this uplift arises from nonlinear harmonic accumulation and functions as a structural cue that can be transferred from a tail-aware frequency teacher to a spatial detector during training, with all frequency modules discarded at inference time.
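
The one-dimensional radial log-power spectrum that the core claim refers to can be sketched as follows. This is a minimal illustration, not the authors' procedure: the Hann window, mean removal, linear binning, per-bin mean, and the function name `radial_log_power_spectrum` are all assumptions made here, since the page does not state those details.

```python
import numpy as np

def radial_log_power_spectrum(img, n_bins=64, eps=1e-12):
    """Return (rho, log10 P) for a 2-D grayscale image.

    rho holds bin centres of the radial frequency, normalised so that
    rho = 1 is the Nyquist frequency along an image axis. The window,
    binning, and averaging choices are assumptions, not the paper's recipe.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    img = img - img.mean()                           # remove the DC offset
    window = np.outer(np.hanning(h), np.hanning(w))  # reduce edge leakage
    power = np.abs(np.fft.fft2(img * window)) ** 2

    fy = np.fft.fftfreq(h)[:, None]                  # cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]
    rho = (np.sqrt(fx ** 2 + fy ** 2) / 0.5).ravel() # 1.0 at axis Nyquist

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    keep = rho <= 1.0                                # drop the spectrum corners
    idx = np.minimum(np.digitize(rho[keep], edges) - 1, n_bins - 1)

    sums = np.bincount(idx, weights=power.ravel()[keep], minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    mean_power = sums / np.maximum(counts, 1)        # average over each annulus

    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, np.log10(mean_power + eps)
```

Converting colour images to a single channel is left to the caller. With few pixels and many bins, some annuli can be empty and will report `log10(eps)`, so `n_bins` should be chosen relative to image size.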

What carries the argument

Spectral Tail Auxiliary Learning (STAL), a training-time auxiliary supervision framework that transfers ultra-high-frequency tail cues from a frequency-domain teacher to a spatial detector.
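
The separation between training-time supervision and the inference path can be illustrated with a toy model. Everything below is hypothetical: the class `ToyStudent`, the linear layers, the MSE alignment term, and the weight `lam` are stand-ins chosen for illustration, because the page does not give the actual STAL loss or architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyStudent:
    """Stand-in spatial detector with a real/fake head and an auxiliary head
    that only exists to receive training-time supervision (hypothetical)."""

    def __init__(self, in_dim, feat_dim, cue_dim):
        self.w_backbone = rng.normal(size=(in_dim, feat_dim)) * 0.1
        self.w_cls = rng.normal(size=(feat_dim,)) * 0.1
        self.w_aux = rng.normal(size=(feat_dim, cue_dim)) * 0.1  # training only

    def features(self, x):
        return np.tanh(x @ self.w_backbone)

    def predict(self, x):
        """Inference path: no frequency input and no auxiliary head."""
        return 1.0 / (1.0 + np.exp(-(self.features(x) @ self.w_cls)))

    def training_loss(self, x, y, teacher_cue, lam=0.5):
        """Binary cross-entropy plus an assumed MSE alignment to the teacher cue."""
        feats = self.features(x)
        p = 1.0 / (1.0 + np.exp(-(feats @ self.w_cls)))
        bce = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        aux = np.mean((feats @ self.w_aux - teacher_cue) ** 2)
        return bce + lam * aux
```

Because `predict` never touches `w_aux` or `teacher_cue`, the auxiliary head and the teacher can be deleted after training without changing the detector's outputs, which is the sense in which such a design adds no inference cost.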

If this is right

The detector achieves stronger generalization across generators and data distributions while introducing zero inference overhead.
The same auxiliary supervision can be applied in real-world scenarios with mixed real and generated images.
Frequency-domain analysis is used only for training and does not affect deployment speed or memory.
The structural cue is claimed to hold across multiple public generative architectures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the tail uplift proves architecture-agnostic, the same teacher signal could be tested on emerging diffusion or autoregressive models not seen in the current experiments.
The approach separates training-time frequency supervision from inference-time spatial processing, suggesting a template for other detection tasks where heavy analysis is acceptable only during learning.
Connecting the observed harmonic accumulation to known nonlinearities in neural network activations offers a possible route to predict the uplift strength from model architecture details alone.
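
The general link between activation nonlinearities and new frequency content can be seen in a one-dimensional toy case. The sketch below is not the paper's SD-VAE experiment; the tone frequency, signal length, and the helper `harmonic_power` are illustrative assumptions. It only shows that a linear map leaves a pure tone at its original frequency, while SiLU moves some energy to the second harmonic.

```python
import numpy as np

def silu(x):
    """SiLU activation, x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def harmonic_power(signal, k):
    """Power in the k-th DFT bin of a real 1-D signal."""
    return np.abs(np.fft.rfft(signal)[k]) ** 2

n, f0 = 256, 8                        # toy length and base frequency bin
t = np.arange(n)
x = np.sin(2 * np.pi * f0 * t / n)    # single tone at bin f0

linear_out = 0.5 * x                  # linear map: no new frequencies
silu_out = silu(x)                    # nonlinearity: energy appears at 2 * f0

for name, y in [("linear", linear_out), ("SiLU", silu_out)]:
    ratio = harmonic_power(y, 2 * f0) / harmonic_power(y, f0)
    print(f"{name}: second-harmonic / fundamental power = {ratio:.2e}")
```

Stacking many such layers gives repeated opportunities to push energy toward higher frequencies, which is one plausible reading of "accumulation"; predicting uplift strength from an architecture would need far more than this toy.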

Load-bearing premise

The spectral tail uplift must be consistent enough across generative architectures for frequency-teacher signals to reliably improve a spatial detector's generalization.

What would settle it

A new generative model whose images follow clean power-law decay in the one-dimensional radial log-power spectrum with no uplift in the ultra-high-frequency tail would falsify the core spectral observation.
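
A check of this kind could be sketched as below, given a radial profile `(rho, log_p)` with `rho` sorted in ascending order and strictly positive. The uplift follows the page's description of tail uplift as the rise from the tail's minimum to ρ = 1 over ρ ∈ [0.7, 1]; the power-law fit band of [0.1, 0.7) and both function names are assumptions, and the arrays at the bottom are synthetic, not data from the paper.

```python
import numpy as np

def tail_uplift(rho, log_p, tail_start=0.7):
    """Rise of log10 P from the tail's minimum to its last bin (rho near 1)."""
    rho = np.asarray(rho, dtype=float)
    log_p = np.asarray(log_p, dtype=float)
    tail = log_p[rho >= tail_start]
    return float(tail[-1] - tail.min())

def power_law_residual(rho, log_p, fit_band=(0.1, 0.7)):
    """Fit log10 P = a + b * log10(rho) on fit_band; return log_p minus the fit."""
    rho = np.asarray(rho, dtype=float)
    log_p = np.asarray(log_p, dtype=float)
    in_band = (rho >= fit_band[0]) & (rho < fit_band[1])
    slope, intercept = np.polyfit(np.log10(rho[in_band]), log_p[in_band], 1)
    return log_p - (intercept + slope * np.log10(rho))

# Synthetic illustration only: an exact power law and a copy with a bump near rho = 1.
rho = np.linspace(0.02, 1.0, 50)
clean = -2.0 * np.log10(rho)
lifted = clean + 0.5 * np.clip(rho - 0.9, 0.0, None) / 0.1

print("clean uplift :", tail_uplift(rho, clean))
print("lifted uplift:", tail_uplift(rho, lifted))
print("lifted residual at rho = 1:", power_law_residual(rho, lifted)[-1])
```

A generator whose images gave near-zero uplift and near-zero tail residual across many samples would be the kind of counterexample described above.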

Figures

Figures reproduced from arXiv: 2605.22751 by Jiahui Zhang, Wenhao Wang, Xingyi Li, Yiheng Li, Yun Cao.

**Figure 1.** Radial FFT [10] power spectra of real images and fakes from BigGAN [4], SD-v1.5 [34], SDXL [31], Midjourney [29], FLUX [22], and SD-VAE [34] reconstructions. Left: spectra over the full radial frequency range. Middle: spectral tail over the local frequency range ρ ∈ [0.7, 1]. Right: normalized tail curves anchored at ρ = 0.7 to expose shape differences. Across generators, fakes show consistent spectral-tai…

**Figure 2.** Activation nonlinearity drives the spectral tail uplift. We replace every SiLU in SD-VAE …

**Figure 3.** Overview of STAL. A tail-aware frequency teacher extracts spectral-tail cues from a …

**Figure 4.** Robustness analysis on GenImage. We evaluate STAL and competing methods under JPEG …

**Figure 5.** Effect of JPEG compression on the spectral tail. We apply JPEG compression with different …

**Figure 6.** Comparison of spectral-tail under trained and random VAE weights. We keep the SD-VAE …

**Figure 7.** Visualization of model attention. From top to bottom, the three rows show the original …

Original abstract

As generative image models evolve rapidly, the perceptual gap between generated and real images continues to narrow, making AI-generated image detection increasingly challenging. Many existing methods exploit frequency-domain cues for detection, typically described as frequency-domain artifacts or high-frequency discrepancies. However, the specific and recurring spectral regularities remain insufficiently understood and characterized. In this paper, we systematically analyze the one-dimensional radial log-power spectra of real and generated images. We find that generated images do not necessarily exhibit higher or lower energy across the entire spectrum or high-band range. Instead, their spectra deviate from the power-law decay and show an anomalous uplift in the ultra-high-frequency tail. We term this phenomenon spectral tail uplift. We further attribute this phenomenon to nonlinear harmonic accumulation in trained generative models, suggesting that it can serve as a structural cue across generative architectures. Based on this observation, we propose Spectral Tail Auxiliary Learning (STAL), a frequency-domain auxiliary supervision framework for generalizable AI-generated image detection. STAL transfers spectral-tail cues from a tail-aware frequency teacher to a spatial detector during training, while all frequency-domain modules are discarded at inference time. Consequently, STAL introduces no inference overhead. Extensive experiments on 9 public datasets show that STAL achieves strong generalization and stability across generators, data distributions, and real-world scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper pins down a consistent ultra-high-frequency tail uplift in generated images' radial spectra and shows it can be transferred via auxiliary training to improve a spatial detector with zero inference cost.


The main point is that generated images deviate from power-law decay by lifting up in the ultra-high-frequency tail of their one-dimensional radial log-power spectra, and the authors treat this as a structural cue from nonlinear harmonic accumulation. They turn the observation into Spectral Tail Auxiliary Learning, where a frequency teacher supervises a spatial model during training and all frequency modules are removed at inference. That setup is practical and avoids the usual overhead of frequency methods at test time. The full manuscript backs the spectral pattern with numbers across multiple generators and shows generalization gains on nine datasets that hold under distribution shifts and real-world conditions. The auxiliary loss formulation looks straightforward and the experiments report stability without obvious post-hoc fitting. The central claim is not circular; it starts from an empirical regularity and tests whether the cue transfers. One soft spot is that the harmonic-accumulation story is plausible but remains more interpretive than mechanistically proven, though that does not undermine the detection results. Dataset details and spectrum computation steps are present but could be spelled out even more explicitly for someone trying to replicate the exact radial averaging. This work is aimed at computer-vision forensics groups that need detectors to stay ahead of newer generators. A reader already running frequency or artifact-based baselines would find the tail characterization and the teacher-student transfer useful to try. The evidence is sharp enough and the method clean enough that it deserves a serious referee rather than a desk reject.

Referee Report

1 major / 2 minor

Summary. The manuscript analyzes the one-dimensional radial log-power spectra of real and AI-generated images, identifying that generated images deviate from power-law decay via an anomalous uplift in the ultra-high-frequency tail, which the authors attribute to nonlinear harmonic accumulation and treat as an architecture-agnostic structural cue. Building on this, they propose Spectral Tail Auxiliary Learning (STAL), a training framework that transfers spectral-tail cues from a frequency-domain teacher network to a spatial-domain detector via auxiliary supervision; frequency modules are discarded at inference, yielding zero overhead. Experiments across 9 public datasets spanning multiple generators and real-world scenarios are reported to demonstrate improved generalization and stability.

Significance. If the spectral tail uplift observation and the auxiliary transfer mechanism hold under the reported conditions, the work offers a concrete, low-cost route to strengthening generalization in AI-generated image detectors without altering inference latency. The multi-generator, multi-dataset empirical backing and explicit validation of inference-time module removal are strengths that could make the approach practically relevant for forensic and content-authenticity applications.

major comments (1)

[Methods / Spectral Analysis] The central claim that the spectral tail uplift serves as a reliable, transferable cue across generative architectures rests on the consistency of the one-dimensional radial log-power spectrum computation; the manuscript should supply the precise radial-averaging procedure, frequency binning, and normalization steps (including any windowing or log-transform details) in the methods section so that the uplift can be independently reproduced and its statistical significance quantified.
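
One way the statistical significance mentioned in this comment could be quantified is a permutation test on per-image tail-uplift values for real and generated images. This is a sketch of a standard test, not something the manuscript is stated to use; the function name, the one-sided alternative, and the default number of permutations are assumptions.

```python
import numpy as np

def permutation_p_value(real_uplift, fake_uplift, n_perm=10_000, seed=0):
    """One-sided p-value for mean(fake) > mean(real) under label shuffling."""
    rng = np.random.default_rng(seed)
    real = np.asarray(real_uplift, dtype=float)
    fake = np.asarray(fake_uplift, dtype=float)
    observed = fake.mean() - real.mean()

    pooled = np.concatenate([real, fake])
    n_fake = fake.size
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)             # shuffle the real/fake labels
        diff = perm[:n_fake].mean() - perm[n_fake:].mean()
        hits += int(diff >= observed)
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

The inputs would be one uplift value per image, computed with whatever radial-averaging procedure the revised Methods section specifies, so the test is only as reproducible as that procedure.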

minor comments (2)

[Figures] Figure captions for the spectral plots should explicitly list the exact datasets, generators, and number of images used in each panel to facilitate direct comparison with the quantitative tables.
[STAL Framework] The auxiliary loss formulation would benefit from an explicit equation showing how the frequency-teacher output is aligned with the spatial detector's intermediate features (e.g., via MSE or KL divergence on the tail region).

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment and the recommendation for minor revision. We address the point below.


Referee: [Methods / Spectral Analysis] The central claim that the spectral tail uplift serves as a reliable, transferable cue across generative architectures rests on the consistency of the one-dimensional radial log-power spectrum computation; the manuscript should supply the precise radial-averaging procedure, frequency binning, and normalization steps (including any windowing or log-transform details) in the methods section so that the uplift can be independently reproduced and its statistical significance quantified.

Authors: We agree that explicit implementation details are necessary for independent reproduction and statistical assessment. The manuscript describes the computation of one-dimensional radial log-power spectra and the observed uplift but does not enumerate every procedural step. In the revised manuscript we will add a dedicated subsection in Methods that specifies the radial-averaging procedure (including the exact definition of radial bins and averaging kernel), frequency binning scheme, normalization (e.g., per-image or global), any windowing function applied prior to the FFT, and the precise log-transform formulation. These additions will allow readers to reproduce the spectra and quantify the statistical significance of the tail uplift across generators.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is empirical and self-contained

full rationale

The paper's chain begins with direct empirical measurement of one-dimensional radial log-power spectra on real and generated images, identifies the spectral tail uplift as an observed deviation from power-law decay, attributes it to nonlinear harmonic accumulation based on that observation, and then introduces STAL as an auxiliary training technique that transfers the cue without retaining frequency modules at inference. No equations, fitted parameters, or self-citations are shown to reduce the central claim or final detection metric to the inputs by construction. The approach is validated through experiments across multiple datasets and generators, remaining independent of any internal redefinition or load-bearing self-reference.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the empirical observation that real-image spectra follow power-law decay while generated images exhibit tail uplift due to nonlinear harmonic accumulation; no explicit free parameters or invented entities are stated in the abstract.

axioms (1)

Domain assumption: real images exhibit power-law decay in their one-dimensional radial log-power spectra.
Role: used as the baseline against which generated-image deviations are identified.



Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 2 internal anchors

[1] N. Ahmed, T. Natarajan, and K. R. Rao. Discrete cosine transform. IEEE Transactions on Computers, C-23(1):90–93, 1974. doi: 10.1109/T-C.1974.223784

[2] Quentin Bammey. Synthbuster: Towards detection of diffusion model generated images. IEEE Open Journal of Signal Processing, 5:1–9, 2024. doi: 10.1109/OJSP.2023.3337714

[3] Black Forest Labs. Flux.1 [dev]. Hugging Face model card, 2024. URL https://huggingface.co/black-forest-labs/FLUX.1-dev. Model card, accessed 2026-03-30

[4] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=B1xsqj09Fm

[5] Bar Cavia, Eliahu Horwitz, Tal Reiss, and Yedid Hoshen. Real-time deepfake detection in the real world, … URL https://openreview.net/forum?id=kkE7jlqKae

[6] Baoying Chen, Jishen Zeng, Jianquan Yang, and Rui Yang. DRCT: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 7621–7639. PMLR, 21–27 Jul 2024

[7] Ruoxin Chen, Junwei Xi, Zhiyuan Yan, Ke-Yue Zhang, Shuang Wu, Jingyi Xie, Xu Chen, Lei Xu, Isabel Guan, Taiping Yao, and Shouhong Ding. Dual data alignment makes AI-generated image detector easier generalizable. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=C39ShJwtD5

[8] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8789–8797, 2018

[9] Beilin Chu, Xuan Xu, Xin Wang, Yufei Zhang, Weike You, and Linna Zhou. Fire: Robust detection of diffusion-generated images via frequency-guided reconstruction error. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12830–12839, 2025

[10] James W. Cooley and John W. Tukey. An algorithm for the machine calculation of complex fourier series. Mathematics of Computation, 19(90):297–301, 1965. doi: 10.1090/S0025-5718-1965-0178586-1

[11] Davide Cozzolino, Giovanni Poggi, Riccardo Corvi, Matthias Nießner, and Luisa Verdoliva. Raising the bar of ai-generated image detection with clip. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 4356–4366, June 2024

[12] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, and Robin Rombach. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learning, 2024

[13] David J. Field. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A, 4(12):2379–2394, Dec 1987. doi: 10.1364/JOSAA.4.002379. URL https://opg.optica.org/josaa/abstract.cfm?URI=josaa-4-12-2379

[14] Joel Frank, Thorsten Eisenhofer, Lea Schönherr, Asja Fischer, Dorothea Kolossa, and Thorsten Holz. Leveraging frequency analysis for deep fake image recognition. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 3247–3258. PMLR, 13–1...

[15] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, volume 27, 2014

[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015

[17] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840–6851, 2020

[18] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=nZeVKeeFYf9

[19] ITU-R. Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios. Recommendation ITU-R BT.601, 2011. Formerly CCIR Recommendation 601

[20] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019

[21] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. In 2nd International Conference on Learning Representations (ICLR), 2014. URL http://arxiv.org/abs/1312.6114

[22] Black Forest Labs. Flux. https://github.com/black-forest-labs/flux, 2024

[23] Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, and Fuli Feng. Improving synthetic image detection towards generalization: An image transformation perspective. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, KDD ’25, page 2405–2414. Association for Computing Machinery, 2025. ISBN 9798400712456

[24] Yanhao Li, Quentin Bammey, Marina Gardella, Tina Nikoukhah, Jean-Michel Morel, Miguel Colom, and Rafael Grompone von Gioi. Masksim: Detection of synthetic images by masked spectrum similarity analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 3855–3865, June 2024

[25] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft coco: Common objects in context. In David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars, editors, Computer Vision – ECCV 2014, pages 740–755, Cham, 2014. Springer International Publishing

[26] Huan Liu, Zichang Tan, Chuangchuang Tan, Yunchao Wei, Jingdong Wang, and Yao Zhao. Forgery-aware adaptive transformer for generalizable synthetic image detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10770–10780, June 2024

[27] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7

[28] Yunpeng Luo, Junlong Du, Ke Yan, and Shouhong Ding. Lare^2: Latent reconstruction error based method for diffusion-generated image detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17006–17015, 2024

[29] Midjourney, Inc. Midjourney. https://www.midjourney.com/home

[30] Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. Towards universal fake image detectors that generalize across generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24480–24489, 2023

[31] Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. Sdxl: Improving latent diffusion models for high-resolution image synthesis. In B. Kim, Y. Yue, S. Chaudhuri, K. Fragkiadaki, M. Khan, and Y. Sun, editors, International Conference on Learning Representations, volume 2024, pages 1862–1874...

[32] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of ...

[33] Anirudh Sundara Rajan, Utkarsh Ojha, Jedidiah Schloesser, and Yong Jae Lee. Aligned datasets improve detection of latent diffusion-generated images. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=doBkiqESYq

[34] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022

[35] Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julie...

[36] Eero P Simoncelli and Bruno A Olshausen. Natural image statistics and neural representation. Annual review of neuroscience, 24(1):1193–1216, 2001

[37] Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5):5052–5060, Mar. 2024

[38] Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Rethinking the up-sampling operations in cnn-based generative network for generalizable deepfake detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 28130–28139, June 2024

[39] Chuangchuang Tan, Renshuai Tao, Huan Liu, Guanghua Gu, Baoyuan Wu, Yao Zhao, and Yunchao Wei. C2p-clip: Injecting category common prompt in clip to enhance generalization in deepfake detection. Proceedings of the AAAI Conference on Artificial Intelligence, 39(7):7184–7192, Apr. 2025. doi: 10.1609/aaai.v39i7.32772

[40] Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A. Efros. Cnn-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

[41] Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, and Weidi Xie. A sanity check for AI-generated image detection. In The Thirteenth International Conference on Learning Representations, 2025

[42] Xu Zhang, Svebor Karaman, and Shih-Fu Chang. Detecting and simulating artifacts in gan fake images. In 2019 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–6, 2019

[43] Nan Zhong, Yiran Xu, Zhenxing Qian, and Xinpeng Zhang. Patchcraft: Exploring texture patch for efficient ai-generated image detection. arXiv preprint arXiv:2311.12397, 2023

[44] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 2223–2232, 2017

[45] Mingjian Zhu, Hanting Chen, Qiangyu YAN, Xudong Huang, Guanyu Lin, Wei Li, Zhijun Tu, Hailin Hu, Jie Hu, and Yunhe Wang. Genimage: A million-scale benchmark for detecting ai-generated image. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 77771–77782. Curra...
… architecture fixed and compare trained weights with random-initialized weights using pink noise (left) and real images (middle) as inputs. Normalized curves show spectra on ρ ∈ [0.7, 1]. Right: tail uplift Δ log₁₀ P, the rise from the tail's minimum to ρ = 1.

A.1 Spectral Tail Uplift under JPEG Compression

Due to the loss of high-frequency information caused b...

[1] [1]

Ahmed, T

N. Ahmed, T. Natarajan, and K.R. Rao. Discrete cosine transform.IEEE Transactions on Computers, C-23(1):90–93, 1974. doi: 10.1109/T-C.1974.223784

work page doi:10.1109/t-c.1974.223784 1974

[2] [2]

Synthbuster: Towards detection of diffusion model generated images.IEEE Open Journal of Signal Processing, 5:1–9, 2024

Quentin Bammey. Synthbuster: Towards detection of diffusion model generated images.IEEE Open Journal of Signal Processing, 5:1–9, 2024. doi: 10.1109/OJSP.2023.3337714

work page doi:10.1109/ojsp.2023.3337714 2024

[3] [3]

Flux.1 [dev]

Black Forest Labs. Flux.1 [dev]. Hugging Face model card, 2024. URL https://huggingface.co/ black-forest-labs/FLUX.1-dev. Model card, accessed 2026-03-30

work page 2024

[4] [4]

Large scale GAN training for high fidelity natural image synthesis

Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. InInternational Conference on Learning Representations, 2019. URL https:// openreview.net/forum?id=B1xsqj09Fm

work page 2019

[5] [5]

Real-time deepfake detection in the real world,

Bar Cavia, Eliahu Horwitz, Tal Reiss, and Yedid Hoshen. Real-time deepfake detection in the real world,

work page

[6] [6]

URLhttps://openreview.net/forum?id=kkE7jlqKae

work page

[7] Baoying Chen, Jishen Zeng, Jianquan Yang, and Rui Yang. DRCT: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 7621–7639. PMLR, 21–27 Jul 2024

[8] Ruoxin Chen, Junwei Xi, Zhiyuan Yan, Ke-Yue Zhang, Shuang Wu, Jingyi Xie, Xu Chen, Lei Xu, Isabel Guan, Taiping Yao, and Shouhong Ding. Dual data alignment makes AI-generated image detector easier generalizable. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=C39ShJwtD5

[9] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8789–8797, 2018

[10] Beilin Chu, Xuan Xu, Xin Wang, Yufei Zhang, Weike You, and Linna Zhou. Fire: Robust detection of diffusion-generated images via frequency-guided reconstruction error. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12830–12839, 2025

[11] James W. Cooley and John W. Tukey. An algorithm for the machine calculation of complex fourier series. Mathematics of Computation, 19(90):297–301, 1965. doi: 10.1090/S0025-5718-1965-0178586-1

[12] Davide Cozzolino, Giovanni Poggi, Riccardo Corvi, Matthias Nießner, and Luisa Verdoliva. Raising the bar of ai-generated image detection with clip. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 4356–4366, June 2024

[13] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, and Robin Rombach. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learning, 2024

[14] David J. Field. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A, 4(12):2379–2394, Dec 1987. doi: 10.1364/JOSAA.4.002379. URL https://opg.optica.org/josaa/abstract.cfm?URI=josaa-4-12-2379

[15] Joel Frank, Thorsten Eisenhofer, Lea Schönherr, Asja Fischer, Dorothea Kolossa, and Thorsten Holz. Leveraging frequency analysis for deep fake image recognition. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 3247–3258. PMLR, 13–1...

[16] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, volume 27, 2014

[17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015

[18] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840–6851, 2020

[19] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=nZeVKeeFYf9

[20] ITU-R. Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios. Recommendation ITU-R BT.601, 2011. Formerly CCIR Recommendation 601

[21] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019

[22] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. In 2nd International Conference on Learning Representations (ICLR), 2014. URL http://arxiv.org/abs/1312.6114

[23] Black Forest Labs. Flux. https://github.com/black-forest-labs/flux, 2024

[24] Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, and Fuli Feng. Improving synthetic image detection towards generalization: An image transformation perspective. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, KDD ’25, page 2405–2414. Association for Computing Machinery, 2025. ISBN 9798400712456

[25] Yanhao Li, Quentin Bammey, Marina Gardella, Tina Nikoukhah, Jean-Michel Morel, Miguel Colom, and Rafael Grompone von Gioi. Masksim: Detection of synthetic images by masked spectrum similarity analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 3855–3865, June 2024

[26] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft coco: Common objects in context. In David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars, editors, Computer Vision – ECCV 2014, pages 740–755, Cham, 2014. Springer International Publishing

[27] Huan Liu, Zichang Tan, Chuangchuang Tan, Yunchao Wei, Jingdong Wang, and Yao Zhao. Forgery-aware adaptive transformer for generalizable synthetic image detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10770–10780, June 2024

[28] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7

[29] Yunpeng Luo, Junlong Du, Ke Yan, and Shouhong Ding. Lare^2: Latent reconstruction error based method for diffusion-generated image detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17006–17015, 2024

[30] Midjourney, Inc. Midjourney. https://www.midjourney.com/home

[31] Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. Towards universal fake image detectors that generalize across generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24480–24489, 2023

[32] Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. Sdxl: Improving latent diffusion models for high-resolution image synthesis. In B. Kim, Y. Yue, S. Chaudhuri, K. Fragkiadaki, M. Khan, and Y. Sun, editors, International Conference on Learning Representations, volume 2024, pages 1862–1874...

[33] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of ...

[34] Anirudh Sundara Rajan, Utkarsh Ojha, Jedidiah Schloesser, and Yong Jae Lee. Aligned datasets improve detection of latent diffusion-generated images. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=doBkiqESYq

[35] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022

[36] Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julie...

[37] Eero P Simoncelli and Bruno A Olshausen. Natural image statistics and neural representation. Annual review of neuroscience, 24(1):1193–1216, 2001

[38] Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5):5052–5060, Mar. 2024

[39] Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Rethinking the up-sampling operations in cnn-based generative network for generalizable deepfake detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 28130–28139, June 2024

[40] Chuangchuang Tan, Renshuai Tao, Huan Liu, Guanghua Gu, Baoyuan Wu, Yao Zhao, and Yunchao Wei. C2p-clip: Injecting category common prompt in clip to enhance generalization in deepfake detection. Proceedings of the AAAI Conference on Artificial Intelligence, 39(7):7184–7192, Apr. 2025. doi: 10.1609/aaai.v39i7.32772

[41] Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A. Efros. Cnn-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

[42] Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, and Weidi Xie. A sanity check for AI-generated image detection. In The Thirteenth International Conference on Learning Representations, 2025

[43] Xu Zhang, Svebor Karaman, and Shih-Fu Chang. Detecting and simulating artifacts in gan fake images. In 2019 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–6, 2019

[44] Nan Zhong, Yiran Xu, Zhenxing Qian, and Xinpeng Zhang. Patchcraft: Exploring texture patch for efficient ai-generated image detection. arXiv preprint arXiv:2311.12397, 2023

[45] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 2223–2232, 2017

[46] Mingjian Zhu, Hanting Chen, Qiangyu Yan, Xudong Huang, Guanyu Lin, Wei Li, Zhijun Tu, Hailin Hu, Jie Hu, and Yunhe Wang. Genimage: A million-scale benchmark for detecting ai-generated image. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 77771–77782. Curra...
