pith. machine review for the scientific record.

arxiv: 2604.14570 · v1 · submitted 2026-04-16 · 💻 cs.CV


Deepfake Detection Generalization with Diffusion Noise


Pith reviewed 2026-05-10 11:19 UTC · model grok-4.3

classification 💻 cs.CV
keywords deepfake detection · diffusion models · generalization · noise prediction · attention mechanism · image forgery · synthetic media detection · cross-domain detection

The pith

A frozen diffusion model teaches deepfake detectors to predict noise, exposing generalizable artifacts across forgery types.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Deepfake detectors trained on one type of forgery often fail on new synthesis methods such as diffusion models. The paper introduces the Attention-guided Noise Learning framework that inserts a pre-trained diffusion model into the training loop. The detector must predict the noise that the diffusion process would add at a chosen step, and an attention map derived from that prediction steers the model toward globally distributed discrepancies instead of localized patterns. By freezing the diffusion model, its learned distribution of natural images acts as a regularizer that pushes the detector toward features that transfer to unseen generators. Experiments show large gains in accuracy and average precision on cross-forgery benchmarks while adding no cost at inference time.

Core claim

Training a detector to predict the noise added by a frozen diffusion model at a given timestep, combined with an attention map extracted from the predicted noise, forces the network to capture discrepancies that generalize beyond the training forgery distribution and yields state-of-the-art detection accuracy on diffusion-generated images.

What carries the argument

Attention-guided Noise Learning (ANL): the detector is trained to regress the noise residual produced by a frozen diffusion model at a fixed timestep, and an attention map computed from that residual encourages focus on globally distributed rather than local artifacts.
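The noise-prediction objective can be made concrete with a toy sketch. The DDPM forward process and the MSE regression below are standard; the linear schedule, shapes, and function names are illustrative assumptions, not the paper's implementation.

```python
import math
import random

# Toy sketch of the auxiliary noise-prediction objective (illustrative, not
# the paper's code). A frozen DDPM forward process noises the input at
# timestep t; the detector is trained to regress that injected noise.

def alpha_bar(t, T=1000):
    """Cumulative product of (1 - beta_i) under a linear beta schedule."""
    prod = 1.0
    for i in range(t + 1):
        beta = 1e-4 + (0.02 - 1e-4) * i / (T - 1)
        prod *= 1.0 - beta
    return prod

def noise_image(x0, t, rng):
    """DDPM forward process: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    abar = alpha_bar(t)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(abar) * p + math.sqrt(1.0 - abar) * e
          for p, e in zip(x0, eps)]
    return xt, eps

def noise_prediction_loss(pred_eps, eps):
    """MSE between the detector's predicted noise and the injected noise."""
    return sum((p - e) ** 2 for p, e in zip(pred_eps, eps)) / len(eps)

rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # toy flattened "image"
xt, eps = noise_image(x0, t=200, rng=rng)
# A perfect noise predictor drives the auxiliary loss to zero:
print(noise_prediction_loss(eps, eps))  # 0.0
```

Because the diffusion model is frozen, the only trainable component is the detector; the auxiliary loss acts purely as a regularizer at training time, which is consistent with the zero-inference-overhead claim.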

If this is right

  • Detectors achieve higher accuracy on diffusion-generated deepfakes without retraining the diffusion model itself.
  • Generalization improves across multiple public benchmarks while inference cost stays identical to a baseline detector.
  • The regularization effect arises solely from the frozen diffusion distribution rather than from additional learnable parameters.
  • The same noise-prediction objective can be applied at different diffusion timesteps to tune the sensitivity of the detector.
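One plausible reading of the attention mechanism, sketched under assumptions (the normalization choice and the `attention_from_noise` / `reweight` names are mine, not the paper's): normalize the predicted-noise magnitude into a map that redistributes the detector's spatial focus.

```python
# Hedged illustration of the attention-guided step: turn the predicted-noise
# map into a non-negative attention map that sums to one, then use it to
# reweight spatial features.

def attention_from_noise(pred_eps):
    """Normalize per-position |noise| into an attention distribution."""
    mags = [abs(e) for e in pred_eps]
    total = sum(mags) or 1.0  # guard against an all-zero prediction
    return [m / total for m in mags]

def reweight(features, attn):
    """Scale features by n * attention; a uniform map leaves them unchanged."""
    n = len(features)
    return [f * a * n for f, a in zip(features, attn)]

attn = attention_from_noise([0.1, -0.3, 0.2, -0.4])
# The position with the largest |noise| receives the most attention mass:
print(attn.index(max(attn)))  # 3
```

One way to read the design: an image whose noise residual is anomalous everywhere spreads attention mass widely, so the classifier is pushed toward globally distributed cues rather than a single local artifact.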

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The approach could be tested on video deepfakes by extending the noise prediction to temporal diffusion models.
  • Combining ANL with other frozen generative priors, such as autoregressive models, might further enlarge the set of detectable forgery families.
  • Because the method requires no extra inference compute, it could be deployed as a drop-in upgrade for existing detector pipelines.

Load-bearing premise

The noise residuals predicted by the diffusion model expose discrepancies that remain consistent across GAN and diffusion forgeries, and the attention derived from those residuals reliably selects global rather than local cues.

What would settle it

A controlled experiment in which ANL-trained detectors are evaluated on a held-out diffusion generator and show no improvement in accuracy or AP over standard detectors trained only on GAN data would falsify the central claim.
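The falsification test implies a cross-generator protocol: train on GAN forgeries only, score images from a held-out diffusion generator, and compare ACC/AP against the ANL-trained detector. A minimal sketch of the two metrics (standard definitions, not code from the paper; the example scores are hypothetical):

```python
# Minimal sketch of the cross-generator check. Labels use 1 = fake, 0 = real.

def accuracy(labels, scores, thresh=0.5):
    """Fraction of correct calls at a fixed score threshold."""
    preds = [1 if s >= thresh else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def average_precision(labels, scores):
    """AP: mean precision at the rank of each true forgery, by score order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            ap += tp / rank
    return ap / sum(labels)

# Hypothetical detector scores on a held-out diffusion generator:
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
print(accuracy(labels, scores))                      # 0.6
print(round(average_precision(labels, scores), 4))   # 0.8056
```

Falsification under this protocol would mean the ANL detector's ACC/AP on the held-out generator match, within noise, a baseline trained only on GAN data.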

Figures

Figures reproduced from arXiv: 2604.14570 by Hehe Fan, Hongyuan Qi, Jun Xiao, Wenjin Hou.

Figure 1: Power Spectral Density (PSD) analysis of noise do…
Figure 2: Pipeline of Attention-guided Noise Learning (ANL). ANL implements noise estimation using the diffusion noise…
Figure 3: Evaluation comparison under standard, cross-dataset, and cross-model scenarios.
Figure 4: Cross-model evaluation results (ACC) on DiffFace. Labels on the left indicate the generative models used for training…
Figure 5: Effect of the timestep t on cross-model generalization performance evaluated on DiFF.
Figure 6: Visualization of predicted noise and corresponding…
Figure 7: Cross-model evaluation results (ACC and AP) on DiffFace. Vertical comparison of results.
Figure 8: Cross-model evaluation results (ACC and AP) on DiFF. Vertical comparison of results.
Original abstract

Deepfake detectors face growing challenges in generalization as new image synthesis techniques emerge. In particular, deepfakes generated by diffusion models are highly photorealistic and often evade detectors trained on GAN-based forgeries. This paper addresses the generalization problem in deepfake detection by leveraging diffusion noise characteristics. We propose an Attention-guided Noise Learning (ANL) framework that integrates a pre-trained diffusion model into the deepfake detection pipeline to guide the learning of more robust features. Specifically, our method uses the diffusion model's denoising process to expose subtle artifacts: the detector is trained to predict the noise contained in an input image at a given diffusion step, forcing it to capture discrepancies between real and synthetic images, while an attention-guided mechanism derived from the predicted noise is introduced to encourage the model to focus on globally distributed discrepancies rather than local patterns. By harnessing the frozen diffusion model's learned distribution of natural images, the ANL method acts as a form of regularization, improving the detector's generalization to unseen forgery types. Extensive experiments demonstrate that ANL significantly outperforms existing methods on multiple benchmarks, achieving state-of-the-art accuracy in detecting diffusion-generated deepfakes. Notably, the proposed framework boosts generalization performance (e.g., improving ACC/AP by a substantial margin on unseen models) without introducing additional overhead during inference. Our results highlight that diffusion noise provides a powerful signal for generalizable deepfake detection.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an Attention-guided Noise Learning (ANL) framework for deepfake detection generalization. It integrates a frozen pre-trained diffusion model by training the detector to predict noise at a chosen diffusion timestep on input images; the resulting noise residual is used to derive an attention map that steers the detector toward globally distributed discrepancies rather than local artifacts. The approach is presented as a regularization technique that leverages the diffusion model's learned natural-image distribution to improve robustness to unseen forgery generators (including diffusion-based ones), with reported SOTA accuracy on multiple benchmarks and no added inference cost.

Significance. If the empirical claims hold, the work offers a practical regularization strategy that repurposes a generative prior for discriminative robustness, addressing a core challenge in media forensics as synthesis methods diversify. The zero-inference-overhead property is a clear practical advantage.

major comments (2)
  1. [method description and experimental analysis] The central generalization claim rests on the assumption that noise residuals from the fixed diffusion model encode forgery signals that are consistent across GAN, diffusion, and other unseen generators rather than reflecting proximity to that specific model's manifold. No ablation varies the diffusion backbone or tests mismatched priors, leaving the regularization mechanism's generality unverified (see the method description and experimental analysis sections).
  2. [Abstract and results] The abstract and results claim substantial ACC/AP gains on unseen models, yet the provided text supplies no equations for the noise-prediction loss, no training hyperparameters, no data-split details, no baseline implementations, and no error bars or statistical tests. This prevents assessment of whether the reported improvements are robust or sensitive to post-hoc choices.
minor comments (2)
  1. [method] Notation for the diffusion timestep and the attention map derivation is introduced without a clear equation or diagram, making the pipeline hard to reproduce from the text alone.
  2. [figures and tables] Figure captions and table headers could more explicitly state the exact forgery generators used in each cross-model split.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions we will make to improve the manuscript.

Point-by-point responses
  1. Referee: The central generalization claim rests on the assumption that noise residuals from the fixed diffusion model encode forgery signals that are consistent across GAN, diffusion, and other unseen generators rather than reflecting proximity to that specific model's manifold. No ablation varies the diffusion backbone or tests mismatched priors, leaving the regularization mechanism's generality unverified (see the method description and experimental analysis sections).

    Authors: We agree that the generality of the regularization would be more convincingly demonstrated with additional ablations. The pre-trained diffusion model is trained exclusively on natural images, so its noise prediction is designed to surface deviations from the natural-image distribution rather than any specific generative manifold; this is consistent with the strong results we observe on both GAN- and diffusion-based unseen forgeries. Nevertheless, to directly address the concern we will add a new ablation subsection that evaluates an alternative diffusion backbone and briefly discusses the rationale for the chosen prior in the method section. revision: yes

  2. Referee: The abstract and results claim substantial ACC/AP gains on unseen models, yet the provided text supplies no equations for the noise-prediction loss, no training hyperparameters, no data-split details, no baseline implementations, and no error bars or statistical tests. This prevents assessment of whether the reported improvements are robust or sensitive to post-hoc choices.

    Authors: We acknowledge that the current presentation lacks sufficient detail for full reproducibility assessment. In the revised manuscript we will (1) state the noise-prediction loss equation explicitly in the method section, (2) add a dedicated table listing all training hyperparameters, data splits, and baseline implementations, and (3) report error bars from multiple random seeds together with statistical significance tests in the results tables. revision: yes

Circularity Check

0 steps flagged

No significant circularity; the external pre-trained diffusion prior and independent empirical validation keep the argument from closing on itself.

full rationale

The ANL framework trains a detector to predict noise from a frozen external diffusion model at a chosen step, then derives an attention map from that prediction to regularize toward global artifacts. This training objective and the claimed generalization improvement are not equivalent to the inputs by construction: the diffusion prior is independently pre-trained on natural images, the regularization hypothesis is tested via cross-forgery benchmarks, and no equations reduce the final performance claim to a fitted parameter or self-citation. No self-definitional, fitted-input-as-prediction, or load-bearing self-citation patterns appear. The result is therefore an empirical claim supported by external components rather than a closed loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities. The framework implicitly assumes the pre-trained diffusion model encodes a useful natural-image distribution and that noise prediction reveals generalizable artifacts, but these are not formalized or evidenced here.

pith-pipeline@v0.9.0 · 5542 in / 1197 out tokens · 29687 ms · 2026-05-10T11:19:08.501745+00:00 · methodology


Reference graph

Works this paper leans on

82 extracted references · 55 canonical work pages · 9 internal anchors

  1. [1]

[n. d.]. improved-diffusion. https://github.com/openai/improved-diffusion. Accessed 2025.

  2. [2]

    Fan Bao, Chongxuan Li, Jun Zhu, and Bo Zhang. 2022. Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models. arXiv:2201.06503 [cs.LG] https://arxiv.org/abs/2201.06503

  3. [3]

    Jon Bateman. 2022. Deepfakes and synthetic media in the financial system: Assessing threat scenarios. Carnegie Endowment for International Peace.

  4. [4]

Chaitali Bhattacharyya, Hanxiao Wang, Feng Zhang, Sungho Kim, and Xiatian Zhu. 2024. Diffusion Deepfake. arXiv:2404.01579 [cs.CV] https://arxiv.org/abs/2404.01579

  5. [5]

Zhongxi Chen, Ke Sun, Ziyin Zhou, Xianming Lin, Xiaoshuai Sun, Liujuan Cao, and Rongrong Ji. 2024. DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis. arXiv preprint arXiv:2403.18471 (2024).

  6. [6]

    Harry Cheng, Yangyang Guo, Tianyi Wang, Liqiang Nie, and Mohan Kankanhalli

  7. [7]

    Diffusion Facial Forgery Detection. arXiv:2401.15859 [cs.CV]

  8. [8]

    Jongwook Choi, Taehoon Kim, Yonghyun Jeong, Seungryul Baek, and Jongwon Choi. 2024. Exploiting Style Latent Flows for Generalizing Deepfake Video Detection. arXiv:2403.06592 [cs.CV] https://arxiv.org/abs/2403.06592

  9. [9]

    François Chollet. 2017. Xception: Deep Learning with Depthwise Separable Convolutions. arXiv:1610.02357 [cs.CV] https://arxiv.org/abs/1610.02357

  10. [10]

    Hyungjin Chung and Jong Chul Ye. 2022. Score-based diffusion models for accelerated MRI. arXiv:2110.05243 [eess.IV] https://arxiv.org/abs/2110.05243

  11. [11]

    Riccardo Corvi, Davide Cozzolino, Giada Zingarini, Giovanni Poggi, Koki Nagano, and Luisa Verdoliva. 2022. On the detection of synthetic images generated by diffusion models. arXiv:2211.00680 [cs.CV] https://arxiv.org/abs/2211.00680

  12. [12]

Riccardo Corvi, Davide Cozzolino, Giada Zingarini, Giovanni Poggi, Koki Nagano, and Luisa Verdoliva. 2023. On The Detection of Synthetic Images Generated by Diffusion Models. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 1–5. doi:10.1109/ICASSP49357.2023.10095167

  13. [13]

    Prafulla Dhariwal and Alex Nichol. 2021. Diffusion Models Beat GANs on Image Synthesis. arXiv:2105.05233 [cs.LG] https://arxiv.org/abs/2105.05233

  14. [14]

    Joel Frank, Thorsten Eisenhofer, Lea Schönherr, Asja Fischer, Dorothea Kolossa, and Thorsten Holz. 2020. Leveraging Frequency Analysis for Deep Fake Image Recognition. arXiv:2003.08685 [cs.CV] https://arxiv.org/abs/2003.08685

  15. [15]

[n. d.]. guided-diffusion. https://github.com/openai/guided-diffusion. Accessed 2025.

  16. [16]

    Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, and Bo Dai. 2023. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

  17. [17]

Gourav Gupta, Kiran Raja, Manish Gupta, Tony Jan, Scott Thompson Whiteside, and Mukesh Prasad. 2024. A Comprehensive Review of DeepFake Detection Using Advanced Machine Learning and Fusion Methods. Electronics (Switzerland) 13, 1 (Jan. 2024). doi:10.3390/electronics13010095

  18. [18]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. arXiv:1512.03385 [cs.CV] https://arxiv.org/abs/1512.03385

  19. [19]

Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. In Proceedings of the 34th International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA.

  20. [20]

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations. https://openreview.net/forum?id=nZeVKeeFYf9

  21. [21]

    Ziqi Huang, Kelvin C.K. Chan, Yuming Jiang, and Ziwei Liu. 2023. Collaborative Diffusion for Multi-Modal Face Generation and Editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  22. [22]

    Tero Karras, Samuli Laine, and Timo Aila. 2019. A Style-Based Generator Architecture for Generative Adversarial Networks. arXiv:1812.04948 [cs.NE] https://arxiv.org/abs/1812.04948

  23. [23]

    Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. 2023. Imagic: Text-Based Real Image Editing with Diffusion Models. arXiv:2210.09276 [cs.CV] https://arxiv.org/abs/2210.09276

  24. [24]

K Kim, Y Kim, S Cho, J Seo, J Nam, K Lee, S Kim, and K Lee. 2022. DiffFace: Diffusion-based face swapping with facial guidance. arXiv preprint arXiv:2212.13344 (2022).

  25. [25]

    Minchul Kim, Feng Liu, Anil Jain, and Xiaoming Liu. 2023. DCFace: Synthetic Face Generation with Dual Condition Diffusion Model. arXiv:2304.07060 [cs.CV] https://arxiv.org/abs/2304.07060

  26. [26]

    Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and Baining Guo. 2020. Face X-ray for More General Face Forgery Detection. arXiv:1912.13458 [cs.CV] https://arxiv.org/abs/1912.13458

  27. [27]

Li Lin, Neeraj Gupta, Yue Zhang, Hainan Ren, Chun-Hao Liu, Feng Ding, Xin Wang, Xin Li, Luisa Verdoliva, and Shu Hu. 2024. Detecting Multimedia Generated by Large AI Models: A Survey. arXiv:2402.00045 [cs.MM] https://arxiv.org/abs/2402.00045

  28. [28]

Li Lin, Xinan He, Yan Ju, Xin Wang, Feng Ding, and Shu Hu. 2024. Preserving Fairness Generalization in Deepfake Detection. arXiv:2402.17229 [cs.CV] https://arxiv.org/abs/2402.17229

  29. [29]

Luping Liu, Yi Ren, Zhijie Lin, and Zhou Zhao. 2022. Pseudo Numerical Methods for Diffusion Models on Manifolds. In International Conference on Learning Representations. https://openreview.net/forum?id=PlKWVd2yBkY

  30. [30]

Ping Liu, Qiqi Tao, and Joey Tianyi Zhou. 2025. Evolving from Single-modal to Multi-modal Facial Deepfake Detection: Progress and Challenges. arXiv:2406.06965 [cs.CV] https://arxiv.org/abs/2406.06965

  31. [31]

Peter Lorenz, Ricard Durall, and Janis Keuper. 2023. Detecting Images Generated by Deep Diffusion Models using their Local Intrinsic Dimensionality. arXiv:2307.02347 [cs.CV] https://arxiv.org/abs/2307.02347

  32. [32]

    Peter Lorenz, Ricard L. Durall, and Janis Keuper. 2023. Detecting Images Generated by Deep Diffusion Models using their Local Intrinsic Dimensionality. In 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). IEEE Computer Society, 448–459.

  33. [33]

    Yunpeng Luo, Junlong Du, Ke Yan, and Shouhong Ding. 2024. LaRE: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection. arXiv:2403.17465 [cs.CV]

  34. [34]

    Ruipeng Ma, Jinhao Duan, Fei Kong, Xiaoshuang Shi, and Kaidi Xu

  35. [35]

    Exposing the Fake: Effective Diffusion-Generated Images Detection. arXiv:2307.06272 [cs.CV]

  36. [36]

    Asad Malik, Minoru Kuribayashi, Sani M. Abdullahi, and Ahmad Neyaz Khan.

  37. [37]

    DeepFake Detection for Human Face Images and Videos: A Survey. IEEE Access 10 (2022), 18757–18775. doi:10.1109/ACCESS.2022.3151186

  38. [38]

    Scott McCloskey and Michael Albright. 2018. Detecting GAN-generated Imagery using Color Cues. arXiv:1812.08247 [cs.CV] https://arxiv.org/abs/1812.08247

  39. [39]

Midjourney. [n. d.]. Midjourney. https://www.midjourney.com/. Accessed 2025.

  40. [40]

    Lakshmanan Nataraj, Tajuddin Manhar Mohammed, Shivkumar Chandrasekaran, Arjuna Flenner, Jawadul H. Bappy, Amit K. Roy-Chowdhury, and B. S. Manjunath.

  41. [41]

    Detecting GAN generated Fake Images using Co-occurrence Matrices. arXiv:1903.06836 [cs.CV] https://arxiv.org/abs/1903.06836

  42. [42]

    Dat Nguyen, Nesryne Mejri, Inder Pal Singh, Polina Kuleshova, Marcella Astrid, Anis Kacem, Enjie Ghorbel, and Djamila Aouada. 2024. LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection. arXiv:2401.13856 [cs.CV] https://arxiv.org/abs/2401.13856

  43. [43]

Alex Nichol and Prafulla Dhariwal. 2021. Improved Denoising Diffusion Probabilistic Models. arXiv:2102.09672 [cs.LG] https://arxiv.org/abs/2102.09672

  44. [44]

    Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. 2024. Towards Universal Fake Image Detectors that Generalize Across Generative Models. arXiv:2302.10174 [cs.CV] https://arxiv.org/abs/2302.10174

  45. [45]

    Lorenzo Papa, Lorenzo Faiella, Luca Corvitto, Luca Maiano, and Irene Amerini

  46. [46]

    On the use of Stable Diffusion for creating realistic faces: from generation to detection. In 2023 11th International Workshop on Biometrics and Forensics (IWBF). 1–6. doi:10.1109/IWBF57495.2023.10156981

  47. [47]

    Or Patashnik, Daniel Garibi, Idan Azuri, Hadar Averbuch-Elor, and Daniel Cohen- Or. 2023. Localizing Object-level Shape Variations with Text-to-Image Diffusion Models. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

  48. [48]

    Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. 2023. SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. arXiv:2307.01952 [cs.CV] https://arxiv.org/abs/2307.01952

  49. [49]

    Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. 2022. DreamFusion: Text-to-3D using 2D Diffusion. arXiv:2209.14988 [cs.CV] https://arxiv.org/abs/2209.14988

  50. [50]

Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, and Jing Shao. 2020. Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues. arXiv:2007.09355 [cs.CV] https://arxiv.org/abs/2007.09355

  51. [51]

    Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen

  52. [52]

    Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv:2204.06125 [cs.CV] https://arxiv.org/abs/2204.06125

  53. [53]

    Jonas Ricker, Simon Damm, Thorsten Holz, and Asja Fischer. 2024. Towards the Detection of Diffusion Model Deepfakes. arXiv:2210.14571 [cs.CV]

  54. [55]

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv:2112.10752 [cs.CV] https://arxiv.org/abs/2112.10752

  55. [56]

Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. 2023. DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  56. [57]

    Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. 2015. ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575 [cs.CV] https://arxiv.org/abs/1409.0575

  57. [58]

    Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner. 2019. FaceForensics++: Learning to Detect Manipulated Facial Images. arXiv:1901.08971 [cs.CV] https://arxiv.org/abs/1901.08971

  58. [59]

    Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. arXiv:2205.11487 [cs.CV]

  59. [60]

    Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, and Yaniv Taigman. 2022. Make-A-Video: Text-to-Video Generation without Text-Video Data. arXiv:2209.14792 [cs.CV] https://arxiv.org/abs/2209.14792

  60. [61]

    Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli.

  61. [62]

    Deep Unsupervised Learning using Nonequilibrium Thermodynamics. arXiv:1503.03585 [cs.LG] https://arxiv.org/abs/1503.03585

  62. [63]

    Haixu Song, Shiyu Huang, Yinpeng Dong, and Wei-Wei Tu. 2023. Robustness and Generalizability of Deepfake Detection: A Study with Diffusion Models. arXiv:2309.02218 [cs.CV] https://arxiv.org/abs/2309.02218

  63. [64]

    Jiaming Song, Chenlin Meng, and Stefano Ermon. 2022. Denoising Diffusion Implicit Models. arXiv:2010.02502 [cs.LG]

  64. [65]

Ke Sun, Shen Chen, Taiping Yao, Hong Liu, Xiaoshuai Sun, Shouhong Ding, and Rongrong Ji. 2024. DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion. arXiv:2410.04372 [cs.CV] https://arxiv.org/abs/2410.04372

  65. [66]

Chuangchuang Tan, Huan Liu, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. 2023. Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection. arXiv:2312.10461 [cs.CV] https://arxiv.org/abs/2312.10461

  66. [67]

    Andrey Voynov, Qinghao Chu, Daniel Cohen-Or, and Kfir Aberman. 2023. P+: Extended Textual Conditioning in Text-to-Image Generation. (2023)

  67. [68]

    Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A. Efros. 2020. CNN-generated images are surprisingly easy to spot... for now. arXiv:1912.11035 [cs.CV] https://arxiv.org/abs/1912.11035

  68. [69]

    Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, and Houqiang Li. 2023. DIRE for Diffusion-Generated Image Detection

  69. [70]

Jun Wei, Shuhui Wang, and Qingming Huang. 2019. F3Net: Fusion, Feedback and Focus for Salient Object Detection. arXiv:1911.11445 [cs.CV] https://arxiv.org/abs/1911.11445

  70. [71]

Mika Westerlund. 2019. The emergence of deepfake technology: A review. Technology Innovation Management Review 9, 11 (2019).

  71. [72]

Chen Henry Wu and Fernando De la Torre. 2023. A Latent Space of Stochastic Diffusion Models for Zero-Shot Image Editing and Guidance. In ICCV.

  72. [73]

    Haiwei Wu, Jiantao Zhou, and Shile Zhang. 2025. Generalizable Synthetic Image Detection via Language-guided Contrastive Learning. arXiv:2305.13800 [cs.CV] https://arxiv.org/abs/2305.13800

  73. [74]

    Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, and Hongsheng Li. 2023. Better Aligning Text-to-Image Models with Human Preference. arXiv:2303.14420 [cs.CV]

  74. [75]

    Yilun Xu, Ziming Liu, Yonglong Tian, Shangyuan Tong, Max Tegmark, and Tommi Jaakkola. 2023. PFGM++: Unlocking the Potential of Physics-Inspired Generative Models. arXiv:2302.04265 [cs.LG] https://arxiv.org/abs/2302.04265

  75. [76]

    Zhiyuan Yan, Yong Zhang, Yanbo Fan, and Baoyuan Wu. 2023. UCF: Uncovering Common Features for Generalizable Deepfake Detection. arXiv:2304.13949 [cs.CV] https://arxiv.org/abs/2304.13949

  76. [77]

Zhiyuan Yan, Yong Zhang, Xinhang Yuan, Siwei Lyu, and Baoyuan Wu. 2023. DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection. In Advances in Neural Information Processing Systems, A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.), Vol. 36. Curran Associates, Inc., 4534–4565. https://proceedings.neurips.cc/paper_files/...

  77. [78]

Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, and Jian Zhang. 2023. FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2023).

  78. [79]

Daichi Zhang, Chenyu Li, Fanzhao Lin, Dan Zeng, and Shiming Ge. 2021. Detecting Deepfake Videos with Temporal Dropout 3DCNN. In IJCAI. 1288–1294.

  79. [80]

    Daichi Zhang, Zihao Xiao, Shikun Li, Fanzhao Lin, Jianmin Li, and Shiming Ge. 2024. Learning Natural Consistency Representation for Face Forgery Video Detection. arXiv:2407.10550 [cs.CV] https://arxiv.org/abs/2407.10550

  80. [81]

Wenliang Zhao, Yongming Rao, Weikang Shi, Zuyan Liu, Jie Zhou, and Jiwen Lu. 2023. DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion. CVPR (2023).

Showing first 80 references.