Deepfake Detection Generalization with Diffusion Noise
Pith reviewed 2026-05-10 11:19 UTC · model grok-4.3
The pith
A frozen diffusion model teaches deepfake detectors to predict noise, exposing generalizable artifacts across forgery types.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Training a detector to predict the noise added by a frozen diffusion model at a given timestep, combined with an attention map extracted from the predicted noise, forces the network to capture discrepancies that generalize beyond the training forgery distribution and yields state-of-the-art detection accuracy on diffusion-generated images.
What carries the argument
Attention-guided Noise Learning (ANL): the detector is trained to regress the noise residual produced by a frozen diffusion model at a fixed timestep, and an attention map computed from that residual encourages focus on globally distributed rather than local artifacts.
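The review gives no equations, but the training signal can be sketched with the standard DDPM forward process (an assumption; the paper may use a different schedule, and every name below is illustrative rather than taken from the paper):

```python
import numpy as np

def cosine_alpha_bar(t, T=1000):
    # Cumulative signal retention alpha_bar(t) under the common cosine
    # schedule (assumed here; the review does not name the schedule).
    f = lambda s: np.cos((s / T + 0.008) / 1.008 * np.pi / 2) ** 2
    return f(t) / f(0)

def add_diffusion_noise(x0, t, rng, T=1000):
    # Forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps.
    a_bar = cosine_alpha_bar(t, T)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps, eps

def noise_regression_loss(eps_pred, eps_true):
    # Auxiliary objective: the detector regresses the injected noise,
    # with a frozen diffusion model supplying the reference prediction.
    return float(np.mean((eps_pred - eps_true) ** 2))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 32, 32))              # stand-in image tensor
xt, eps = add_diffusion_noise(x0, t=200, rng=rng)  # noised input at t = 200
oracle_loss = noise_regression_loss(eps, eps)      # 0.0 for a perfect predictor
```

The detector never sees a forgery label in this auxiliary term; any generalization benefit would come from how far real versus synthetic inputs sit from the natural-image manifold the frozen prior encodes.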
If this is right
- Detectors achieve higher accuracy on diffusion-generated deepfakes without retraining the diffusion model itself.
- Generalization improves across multiple public benchmarks while inference cost stays identical to a baseline detector.
- The regularization effect arises solely from the frozen diffusion distribution rather than from additional learnable parameters.
- The same noise-prediction objective can be applied at different diffusion timesteps to tune the sensitivity of the detector.
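The last point can be made concrete through the signal-to-noise ratio of the noised input: under a cosine schedule (again an assumption, since the review names none), the chosen timestep acts as a sensitivity knob for the detector.

```python
import numpy as np

def snr(t, T=1000):
    # SNR of x_t under a cosine schedule: alpha_bar / (1 - alpha_bar).
    # Small t: the input is nearly clean, so prediction hinges on fine
    # artifacts; large t: the input is mostly noise and fine cues wash out.
    f = lambda s: np.cos((s / T + 0.008) / 1.008 * np.pi / 2) ** 2
    a_bar = f(t) / f(0)
    return a_bar / (1.0 - a_bar)

ratios = [snr(t) for t in (50, 200, 500, 950)]  # decreases monotonically in t
```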
Where Pith is reading between the lines
- The approach could be tested on video deepfakes by extending the noise prediction to temporal diffusion models.
- Combining ANL with other frozen generative priors, such as autoregressive models, might further enlarge the set of detectable forgery families.
- Because the method requires no extra inference compute, it could be deployed as a drop-in upgrade for existing detector pipelines.
Load-bearing premise
The noise residuals predicted by the diffusion model expose discrepancies that remain consistent across GAN and diffusion forgeries, and the attention derived from those residuals reliably selects global rather than local cues.
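The second half of this premise can be illustrated with one plausible construction of the attention map; the review does not give the actual derivation, so the softmax-over-magnitude form below is hypothetical:

```python
import numpy as np

def attention_from_noise(eps_pred, temperature=1.0):
    # Hypothetical attention derivation: per-pixel magnitude of the
    # predicted noise across channels, softmax-normalized over the
    # spatial grid so large-residual regions receive more weight.
    mag = np.linalg.norm(eps_pred, axis=0)       # (H, W) magnitude map
    logits = (mag / temperature).ravel()
    w = np.exp(logits - logits.max())            # numerically stable softmax
    return (w / w.sum()).reshape(mag.shape)

def reweight(features, attn):
    # Attention-guided reweighting: broadcast the map across channels.
    return features * attn[None, :, :]

rng = np.random.default_rng(1)
eps_pred = rng.standard_normal((3, 8, 8))        # toy predicted residual
attn = attention_from_noise(eps_pred)
feats = reweight(rng.standard_normal((16, 8, 8)), attn)
```

Because the map is normalized over the whole grid rather than a local window, weight can spread across spatially distant regions, which is consistent with the claim that the mechanism favors global over local cues.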
What would settle it
A controlled experiment in which ANL-trained detectors are evaluated on a held-out diffusion generator and show no improvement in accuracy or AP over standard detectors trained only on GAN data would falsify the central claim.
Original abstract
Deepfake detectors face growing challenges in generalization as new image synthesis techniques emerge. In particular, deepfakes generated by diffusion models are highly photorealistic and often evade detectors trained on GAN-based forgeries. This paper addresses the generalization problem in deepfake detection by leveraging diffusion noise characteristics. We propose an Attention-guided Noise Learning (ANL) framework that integrates a pre-trained diffusion model into the deepfake detection pipeline to guide the learning of more robust features. Specifically, our method uses the diffusion model's denoising process to expose subtle artifacts: the detector is trained to predict the noise contained in an input image at a given diffusion step, forcing it to capture discrepancies between real and synthetic images, while an attention-guided mechanism derived from the predicted noise is introduced to encourage the model to focus on globally distributed discrepancies rather than local patterns. By harnessing the frozen diffusion model's learned distribution of natural images, the ANL method acts as a form of regularization, improving the detector's generalization to unseen forgery types. Extensive experiments demonstrate that ANL significantly outperforms existing methods on multiple benchmarks, achieving state-of-the-art accuracy in detecting diffusion-generated deepfakes. Notably, the proposed framework boosts generalization performance (e.g., improving ACC/AP by a substantial margin on unseen models) without introducing additional overhead during inference. Our results highlight that diffusion noise provides a powerful signal for generalizable deepfake detection.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an Attention-guided Noise Learning (ANL) framework for deepfake detection generalization. It integrates a frozen pre-trained diffusion model by training the detector to predict noise at a chosen diffusion timestep on input images; the resulting noise residual is used to derive an attention map that steers the detector toward globally distributed discrepancies rather than local artifacts. The approach is presented as a regularization technique that leverages the diffusion model's learned natural-image distribution to improve robustness to unseen forgery generators (including diffusion-based ones), with reported SOTA accuracy on multiple benchmarks and no added inference cost.
Significance. If the empirical claims hold, the work offers a practical regularization strategy that repurposes a generative prior for discriminative robustness, addressing a core challenge in media forensics as synthesis methods diversify. The zero-inference-overhead property is a clear practical advantage.
Major comments (2)
- [method description and experimental analysis] The central generalization claim rests on the assumption that noise residuals from the fixed diffusion model encode forgery signals that are consistent across GAN, diffusion, and other unseen generators rather than reflecting proximity to that specific model's manifold. No ablation varies the diffusion backbone or tests mismatched priors, leaving the regularization mechanism's generality unverified (see the method description and experimental analysis sections).
- [Abstract and results] The abstract and results claim substantial ACC/AP gains on unseen models, yet the provided text supplies no equations for the noise-prediction loss, no training hyperparameters, no data-split details, no baseline implementations, and no error bars or statistical tests. This prevents assessment of whether the reported improvements are robust or sensitive to post-hoc choices.
Minor comments (2)
- [method] Notation for the diffusion timestep and the attention map derivation is introduced without a clear equation or diagram, making the pipeline hard to reproduce from the text alone.
- [figures and tables] Figure captions and table headers could more explicitly state the exact forgery generators used in each cross-model split.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions we will make to improve the manuscript.
Point-by-point responses
Referee: The central generalization claim rests on the assumption that noise residuals from the fixed diffusion model encode forgery signals that are consistent across GAN, diffusion, and other unseen generators rather than reflecting proximity to that specific model's manifold. No ablation varies the diffusion backbone or tests mismatched priors, leaving the regularization mechanism's generality unverified (see the method description and experimental analysis sections).
Authors: We agree that the generality of the regularization would be more convincingly demonstrated with additional ablations. The pre-trained diffusion model is trained exclusively on natural images, so its noise prediction is designed to surface deviations from the natural-image distribution rather than any specific generative manifold; this is consistent with the strong results we observe on both GAN- and diffusion-based unseen forgeries. Nevertheless, to directly address the concern we will add a new ablation subsection that evaluates an alternative diffusion backbone and briefly discusses the rationale for the chosen prior in the method section. revision: yes
Referee: The abstract and results claim substantial ACC/AP gains on unseen models, yet the provided text supplies no equations for the noise-prediction loss, no training hyperparameters, no data-split details, no baseline implementations, and no error bars or statistical tests. This prevents assessment of whether the reported improvements are robust or sensitive to post-hoc choices.
Authors: We acknowledge that the current presentation lacks sufficient detail for a full reproducibility assessment. In the revised manuscript we will (1) state the noise-prediction loss equation explicitly in the method section, (2) add a dedicated table listing all training hyperparameters, data splits, and baseline implementations, and (3) report error bars from multiple random seeds together with statistical significance tests in the results tables. revision: yes
Circularity Check
No significant circularity; the external pre-trained model and empirical validation prevent the derivation from forming a closed loop
Full rationale
The ANL framework trains a detector to predict noise from a frozen external diffusion model at a chosen step, then derives an attention map from that prediction to regularize toward global artifacts. This training objective and the claimed generalization improvement are not equivalent to the inputs by construction: the diffusion prior is independently pre-trained on natural images, the regularization hypothesis is tested via cross-forgery benchmarks, and no equations reduce the final performance claim to a fitted parameter or self-citation. No self-definitional, fitted-input-as-prediction, or load-bearing self-citation patterns appear. The result is therefore an empirical claim supported by external components rather than a closed loop.