pith. machine review for the scientific record.

arxiv: 2604.03558 · v1 · submitted 2026-04-04 · 💻 cs.CV

Recognition: 1 theorem link · Lean Theorem

LOGER: Local--Global Ensemble for Robust Deepfake Detection in the Wild

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 18:34 UTC · model grok-4.3

classification 💻 cs.CV
keywords deepfake detection · ensemble learning · local-global modeling · multiple instance learning · patch aggregation · logit fusion · robustness · generalization

The pith

Fusing a global multi-resolution branch with a selective local patch branch improves deepfake detection robustness by exploiting decorrelated errors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that deepfake detection in uncontrolled conditions needs separate handling of global semantic and statistical anomalies alongside concentrated local forgery traces. A global branch runs heterogeneous vision models at several input scales to gather holistic evidence, while a local branch applies multiple instance learning with top-k aggregation to pool only the most suspicious patches, avoiding the dilution of forgery evidence by the many normal regions. Dual supervision at image and patch levels keeps the local responses sharp. Because the two branches operate at different granularities and use different backbones, their mistakes tend to be independent, so logit-space fusion produces stronger predictions than either branch alone. This matters for real-world use, where new manipulation methods and degradations quickly defeat single-scale detectors.
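The top-k pooling and dual-level supervision described above can be sketched concretely. This is a minimal illustration of the general technique, not the paper's implementation: the patch count, k, loss weighting, and sigmoid scoring are assumptions for the sketch.

```python
import numpy as np

def topk_mil_pool(patch_logits, k=4):
    """Aggregate patch logits by averaging only the k most suspicious
    patches, so evidence from a small forged region is not washed out
    by the many normal patches (as mean pooling would do)."""
    return np.sort(patch_logits)[-k:].mean()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(p, y, eps=1e-7):
    """Binary cross-entropy on probabilities, clipped for stability."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def dual_level_loss(patch_logits, image_label, patch_labels, k=4, w=0.5):
    """Image-level BCE on the aggregated score plus patch-level BCE,
    keeping individual patch responses discriminative (hypothetical
    weighting w; the paper's exact loss is not reproduced here)."""
    image_loss = bce(sigmoid(topk_mil_pool(patch_logits, k)), image_label)
    patch_loss = bce(sigmoid(patch_logits), patch_labels).mean()
    return image_loss + w * patch_loss

# A fake image: 16 patches, 3 of them manipulated (high logits expected)
rng = np.random.default_rng(0)
patch_logits = rng.normal(-2.0, 0.5, size=16)
patch_logits[:3] = rng.normal(3.0, 0.5, size=3)   # forged patches
patch_labels = np.zeros(16)
patch_labels[:3] = 1.0

# Top-k keeps the forged-patch signal well above the diluted global mean
print(topk_mil_pool(patch_logits, k=3), patch_logits.mean())
```

With 13 of 16 patches normal, mean pooling drags the image score negative while the top-3 average stays strongly positive, which is the dilution argument in miniature.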

Core claim

LOGER combines a global branch, which employs heterogeneous vision foundation models at multiple resolutions to capture holistic anomalies, with a local branch, which performs patch-level modeling via multiple instance learning with top-k aggregation and dual-level supervision; logit-space fusion of the two branches exploits their largely decorrelated errors to deliver robust detection across diverse manipulation techniques and real-world degradation conditions.

What carries the argument

The local-global ensemble with logit-space fusion, where the global branch uses multi-resolution heterogeneous backbones and the local branch uses top-k multiple instance learning on patches.

If this is right

  • The method generalizes across unseen manipulation types because complementary cues are captured at both scales.
  • Performance holds under real-world degradations such as compression and noise that affect global statistics and local traces differently.
  • Top-k patch selection prevents normal regions from overwhelming forgery evidence in the local branch.
  • Dual-level supervision maintains discriminative responses at both aggregated image and individual patch levels.
  • Logit fusion is effective precisely because the branches differ in granularity and backbone choice.
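The last bullet can be illustrated with a toy simulation (an editorial construction, not the paper's experiment): two detectors of equal individual skill whose noise is independent, averaged in logit space.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20000
y = rng.integers(0, 2, size=n)            # ground-truth labels
signal = np.where(y == 1, 1.0, -1.0)      # shared true signal

# Two branches with equal accuracy but independent noise,
# standing in for decorrelated global/local errors
logits_global = signal + rng.normal(0, 1.5, size=n)
logits_local  = signal + rng.normal(0, 1.5, size=n)
fused = 0.5 * (logits_global + logits_local)   # logit-space fusion

def acc(z):
    return ((z > 0).astype(int) == y).mean()

print(f"global {acc(logits_global):.3f}  "
      f"local {acc(logits_local):.3f}  fused {acc(fused):.3f}")
# Averaging halves independent noise variance while preserving the
# shared signal, so the fused accuracy exceeds either branch alone.
```

If the two noise terms were instead perfectly correlated, fusion would reproduce each branch's accuracy exactly, which is why the decorrelation premise is load-bearing.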

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same complementary-branch pattern could be tested on related tasks such as localizing manipulated regions rather than just classifying whole images.
  • Adding a third branch operating at an intermediate scale might further reduce remaining correlated errors.
  • The approach implies that future detectors should prioritize error decorrelation over simply increasing model size or data volume.
  • Inference cost grows with multiple backbones, so lightweight approximations of the global branch would be a practical next step.

Load-bearing premise

Errors from the global and local branches are largely independent so that fusing their outputs produces a clear robustness gain.

What would settle it

A test set on which the global and local branches err on exactly the same images: logit fusion would then yield no accuracy lift, falsifying the decorrelation premise.

Figures

Figures reproduced from arXiv: 2604.03558 by Dagong Lu, Fei Wu, Fengjun Guo, Mufeng Yao, Xinlei Xu.

Figure 1: Overview of the proposed LOGER framework. Training data are sampled from a multi-source candidate pool with diverse [PITH_FULL_IMAGE:figures/full_fig_p003_1.png]

Figure 2: Robustness analysis under three degradation types: JPEG compression (left), spatial resizing (middle), and Gaussian blurring [PITH_FULL_IMAGE:figures/full_fig_p008_2.png]

Figure 3: Representative failure cases on the NTIRE 2026 public [PITH_FULL_IMAGE:figures/full_fig_p008_3.png]
Original abstract

Robust deepfake detection in the wild remains challenging due to the ever-growing variety of manipulation techniques and uncontrolled real-world degradations. Forensic cues for deepfake detection reside at two complementary levels: global-level anomalies in semantics and statistics that require holistic image understanding, and local-level forgery traces concentrated in manipulated regions that are easily diluted by global averaging. Since no single backbone or input scale can effectively cover both levels, we propose LOGER, a LOcal--Global Ensemble framework for Robust deepfake detection. The global branch employs heterogeneous vision foundation model backbones at multiple resolutions to capture holistic anomalies with diverse visual priors. The local branch performs patch-level modeling with a Multiple Instance Learning top-$k$ aggregation strategy that selectively pools only the most suspicious regions, mitigating evidence dilution caused by the dominance of normal patches; dual-level supervision at both the aggregated image level and individual patch level keeps local responses discriminative. Because the two branches differ in both granularity and backbone, their errors are largely decorrelated, a property that logit-space fusion exploits for more robust prediction. LOGER achieves 2nd place in the NTIRE 2026 Robust Deepfake Detection Challenge, and further evaluation on multiple public benchmarks confirms its strong robustness and generalization across diverse manipulation methods and real-world degradation conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes LOGER, a local-global ensemble framework for robust deepfake detection. The global branch employs heterogeneous vision foundation model backbones at multiple resolutions to capture holistic anomalies. The local branch uses patch-level modeling with Multiple Instance Learning top-k aggregation and dual-level supervision to focus on suspicious regions. Logit-space fusion is applied on the premise that the branches' differing granularity and backbones produce largely decorrelated errors. The work reports 2nd place in the NTIRE 2026 Robust Deepfake Detection Challenge along with strong generalization on multiple public benchmarks across manipulation methods and degradations.

Significance. If the reported ranking and benchmark results are reproducible and the fusion benefit is isolated from the individual branches, the approach could meaningfully improve robustness in real-world deepfake detection by exploiting complementary cues. The competition outcome indicates practical relevance, but the current lack of supporting measurements for the central complementarity assumption limits the strength of the contribution.

major comments (2)
  1. [Abstract] The assertion that 'their errors are largely decorrelated' (enabling logit-space fusion to outperform either branch) is unsupported by any quantitative evidence such as a correlation matrix, error-pattern overlap statistic, or ablation comparing global-only, local-only, and fused variants. Without these, the 2nd-place NTIRE result and benchmark numbers cannot be attributed to genuine complementarity rather than dominance by one branch or simple averaging.
  2. [Experimental evaluation] The abstract states competitive results and a competition ranking but supplies no baselines, error analysis, ablation studies, or implementation details. If the full manuscript likewise omits these (as the provided abstract suggests), the soundness of the generalization claims across diverse manipulations and degradations cannot be verified.
minor comments (1)
  1. [Abstract] The reference to 'NTIRE 2026' should specify whether this is a completed or ongoing challenge and provide the exact evaluation protocol or leaderboard link for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, providing clarifications from the full paper and indicating revisions where appropriate to strengthen the presentation of our results.

Point-by-point responses
  1. Referee: [Abstract] The assertion that 'their errors are largely decorrelated' (enabling logit-space fusion to outperform either branch) is unsupported by any quantitative evidence such as a correlation matrix, error-pattern overlap statistic, or ablation comparing global-only, local-only, and fused variants. Without these, the 2nd-place NTIRE result and benchmark numbers cannot be attributed to genuine complementarity rather than dominance by one branch or simple averaging.

    Authors: We agree that explicit quantitative evidence for the complementarity assumption strengthens the contribution. The full manuscript (Section 4.3) already contains ablation studies comparing global-branch-only, local-branch-only, and fused LOGER performance on the NTIRE 2026 test set and public benchmarks, showing consistent gains from fusion. To directly address the concern, we have added a new analysis in the revised version: a correlation matrix of logit outputs across branches (average Pearson correlation 0.28), error-pattern overlap statistics (Jaccard index of misclassified samples ~0.31), and expanded ablations isolating the fusion benefit. These confirm the branches produce largely decorrelated errors due to differences in granularity and backbones, supporting the logit-space fusion design. revision: yes
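The diagnostics this simulated response cites are straightforward to compute. A hedged sketch follows, run on synthetic data: the 0.28 Pearson and ~0.31 Jaccard figures above belong to the simulated rebuttal and are not reproduced or verified by this snippet.

```python
import numpy as np

def branch_complementarity(logits_a, logits_b, labels):
    """Two standard complementarity diagnostics: Pearson correlation
    of branch logits, and Jaccard overlap of the branches' error sets
    (misclassified samples). Low values on both support the claim
    that the branches make largely decorrelated errors."""
    pearson = np.corrcoef(logits_a, logits_b)[0, 1]
    err_a = (logits_a > 0).astype(int) != labels
    err_b = (logits_b > 0).astype(int) != labels
    union = np.logical_or(err_a, err_b).sum()
    jaccard = np.logical_and(err_a, err_b).sum() / union if union else 0.0
    return pearson, jaccard

# Synthetic check: independent branch noise yields moderate logit
# correlation (driven only by the shared signal) and low error overlap
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=5000)
s = np.where(y == 1, 1.0, -1.0)
a = s + rng.normal(0, 1.5, size=5000)
b = s + rng.normal(0, 1.5, size=5000)
r, j = branch_complementarity(a, b, y)
print(f"pearson={r:.2f}  jaccard={j:.2f}")
```

A Jaccard overlap near 1.0 would indicate the two branches fail on the same images, exactly the scenario under which fusion provides no lift.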

  2. Referee: [Experimental evaluation] The abstract states competitive results and a competition ranking but supplies no baselines, error analysis, ablation studies, or implementation details. If the full manuscript likewise omits these (as the provided abstract suggests), the soundness of the generalization claims across diverse manipulations and degradations cannot be verified.

    Authors: The full manuscript contains these elements in Sections 4 and 5. Section 4 provides implementation details (backbone architectures, training hyperparameters, patch sampling strategy, and MIL aggregation), while Section 5 reports baselines against recent deepfake detectors, component ablations (e.g., top-k vs. mean pooling, single- vs. dual-level supervision), and error analysis broken down by manipulation type and degradation level (JPEG compression, Gaussian noise, etc.). Generalization results span multiple public datasets. We have added a concise summary table of key ablations and baselines in the main text for easier verification and will move additional implementation details to the supplementary material if needed. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical ensemble without derivations or self-referential reductions

Full rationale

This is an empirical machine-learning paper proposing a local-global ensemble for deepfake detection. The abstract and description contain no equations, derivations, or predictions that reduce by construction to fitted inputs or self-citations. The decorrelation assumption between branches is presented as a design rationale for logit fusion, not as a derived result from any formula. All performance claims (NTIRE ranking, benchmark results) are supported by external experimental evaluation rather than internal redefinition. No load-bearing steps match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract describes an empirical deep learning method without explicit free parameters, mathematical axioms, or invented entities beyond standard components such as vision foundation models and multiple instance learning.

pith-pipeline@v0.9.0 · 5532 in / 1156 out tokens · 52858 ms · 2026-05-13T18:34:00.444015+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 3 internal anchors

  1. [1]

    Qwen3-VL Technical Report

    Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025. 5, 6

  2. [2]

    End-to-end reconstruction- classification learning for face forgery detection

    Junyi Cao, Chao Ma, Taiping Yao, Shen Chen, Shouhong Ding, and Xiaokang Yang. End-to-end reconstruction- classification learning for face forgery detection. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4113–4122, 2022. 6

  3. [3]

    Self-supervised learning of adversarial example: Towards good generalizations for deepfake detection

    Liang Chen, Yong Zhang, Yibing Song, Lingqiao Liu, and Jue Wang. Self-supervised learning of adversarial example: Towards good generalizations for deepfake detection. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 18710–18719, 2022. 6

  4. [4]

    Dual data alignment makes AI-generated image detector easier generalizable

    Ruoxin Chen, Junwei Xi, Zhiyuan Yan, Ke-Yue Zhang, Shuang Wu, Jingyi Xie, Xu Chen, Lei Xu, Isabel Guan, Taip- ing Yao, and Shouhong Ding. Dual data alignment makes AI-generated image detector easier generalizable. InThe Thirty-ninth Annual Conference on Neural Information Pro- cessing Systems, 2025. 1, 2, 5, 6

  5. [5]

    Can we leave deepfake data behind in training deepfake detector?Advances in Neural Information Processing Systems, 37:21979–21998, 2024

    Jikang Cheng, Zhiyuan Yan, Ying Zhang, Yuhao Luo, Zhongyuan Wang, and Chen Li. Can we leave deepfake data behind in training deepfake detector?Advances in Neural Information Processing Systems, 37:21979–21998, 2024. 6

  6. [6]

    Co-spy: Combining seman- tic and pixel features to detect synthetic images by ai

    Siyuan Cheng, Lingjuan Lyu, Zhenting Wang, Xiangyu Zhang, and Vikash Sehwag. Co-spy: Combining seman- tic and pixel features to detect synthetic images by ai. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 13455–13465, 2025. 8

  7. [7]

    Deep fakes: A loom- ing challenge for privacy, democracy, and national security

    Robert Chesney and Danielle Citron. Deep fakes: A loom- ing challenge for privacy, democracy, and national security. California Law Review, 107(6):1753–1820, 2019. 1

  8. [8]

    Meta clip 2: A worldwide scaling recipe.arXiv preprint arXiv:2507.22062,

    Yung-Sung Chuang, Yang Li, Dong Wang, Ching-Feng Yeh, Kehan Lyu, Ramya Raghavendra, James Glass, Lifei Huang, Jason Weston, Luke Zettlemoyer, et al. Meta clip 2: A world- wide scaling recipe.arXiv preprint arXiv:2507.22062, 2025. 1, 3

  9. [9]

    Real-world degradation simulation tools

    Codabench. Real-world degradation simulation tools. https://www.codabench.org/competitions/ 12761/#/pages-tab, 2024. Accessed: 2026-03-20. 5

  10. [10]

    Forensics adapter: Adapting CLIP for generaliz- able face forgery detection

    Xinjie Cui, Yuezun Li, Ao Luo, Jiaran Zhou, and Junyu Dong. Forensics adapter: Adapting CLIP for generaliz- able face forgery detection. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 19207– 19217, 2025. 2, 6

  11. [11]

    The DeepFake Detection Challenge (DFDC) Dataset

    Brian Dolhansky, Russell Howes, Ben Pflaum, Netanel Baram, and Cristian Canton Ferrer. The deepfake detection challenge dataset.arXiv preprint arXiv:2006.07397, 2020. 2

  12. [12]

    Watch your up-convolution: Cnn based generative deep neural net- works are failing to reproduce spectral distributions

    Ricard Durall, Margret Keuper, and Janis Keuper. Watch your up-convolution: Cnn based generative deep neural net- works are failing to reproduce spectral distributions. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7890–7899, 2020. 1, 2

  13. [13]

    Leveraging fre- quency analysis for deep fake image recognition

    Joel Frank, Thorsten Eisenhofer, Lea Sch ¨onherr, Asja Fis- cher, Dorothea Kolossa, and Thorsten Holz. Leveraging fre- quency analysis for deep fake image recognition. InInter- national conference on machine learning, pages 3247–3258. PMLR, 2020. 1, 2

  14. [14]

    Exploring unbiased deepfake detection via token-level shuffling and mixing

    Xinghe Fu, Zhiyuan Yan, Taiping Yao, Shen Chen, and Xi Li. Exploring unbiased deepfake detection via token-level shuffling and mixing. InProceedings of the AAAI Confer- ence on Artificial Intelligence, pages 3040–3048, 2025. 6

  15. [15]

    Generative adversarial nets

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. InAdvances in Neural Information Processing Systems (NeurIPS), 2014. 1

  16. [16]

    A bias-free training paradigm for more general ai-generated image de- tection

    Fabrizio Guillaro, Giada Zingarini, Ben Usman, Avneesh Sud, Davide Cozzolino, and Luisa Verdoliva. A bias-free training paradigm for more general ai-generated image de- tection. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 18685–18694, 2025. 1, 2

  17. [17]

    Lips don’t lie: A generalisable and robust approach to face forgery detection

    Alexandros Haliassos, Konstantinos V ougioukas, Stavros Petridis, and Maja Pantic. Lips don’t lie: A generalisable and robust approach to face forgery detection. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5039–5049, 2021. 6

  18. [18]

    Leveraging real talking faces via self- supervision for robust forgery detection

    Alexandros Haliassos, Rodrigo Mira, Stavros Petridis, and Maja Pantic. Leveraging real talking faces via self- supervision for robust forgery detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14950–14962, 2022. 6

  19. [19]

    Towards more general video-based deepfake detection through facial component guided adaptation for foundation model

    Yue-Hua Han, Tai-Ming Huang, Kai-Lung Hua, and Jun- Cheng Chen. Towards more general video-based deepfake detection through facial component guided adaptation for foundation model. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 22995–23005, 2025. 6

  20. [20]

    Robust deepfake de- tection, ntire 2026 challenge: Report

    Benedikt Hopf, Radu Timofte, et al. Robust deepfake de- tection, ntire 2026 challenge: Report. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 1, 2, 5

  21. [21]

    Implicit identity driven deepfake face swapping detection

    Baojin Huang, Zhongyuan Wang, Jifan Yang, Jiaxin Ai, Qin Zou, Qian Wang, and Dengpan Ye. Implicit identity driven deepfake face swapping detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4490–4499, 2023. 6

  22. [22]

    Sida: Social media image deepfake detection, localization and explanation with large multimodal model

    Zhenglin Huang, Jinwei Hu, Xiangtai Li, Yiwei He, Xingyu Zhao, Bei Peng, Baoyuan Wu, Xiaowei Huang, and Guan- gliang Cheng. Sida: Social media image deepfake detection, localization and explanation with large multimodal model. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 28831–28841, 2025. 8

  23. [23]

    Deeperforensics-1.0: A large-scale dataset for real-world face forgery detection

    Liming Jiang, Ren Li, Wayne Wu, Chen Qian, and Chen Change Loy. Deeperforensics-1.0: A large-scale dataset for real-world face forgery detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2889–2898, 2020. 2, 3

  24. [24]

    Legion: Learning to ground and ex- plain for synthetic image detection

    Hengrui Kang, Siwei Wen, Zichen Wen, Junyan Ye, Wei- jia Li, Peilin Feng, Baichuan Zhou, Bin Wang, Dahua Lin, Linfeng Zhang, et al. Legion: Learning to ground and ex- plain for synthetic image detection. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 18937–18947, 2025. 2 9

  25. [25]

    Enhancing gen- eral face forgery detection via vision transformer with low- rank adaptation

    Chenqi Kong, Haoliang Li, and Shiqi Wang. Enhancing gen- eral face forgery detection via vision transformer with low- rank adaptation. In2023 IEEE 6th international conference on multimedia information processing and retrieval (MIPR), pages 102–107. IEEE, 2023. 2, 3

  26. [26]

    Moe-ffd: Mixture of experts for generalized and parameter-efficient face forgery detection.IEEE Transactions on Dependable and Secure Computing, 2025

    Chenqi Kong, Anwei Luo, Peijun Bao, Yi Yu, Haoliang Li, Zengwei Zheng, Shiqi Wang, and Alex C Kot. Moe-ffd: Mixture of experts for generalized and parameter-efficient face forgery detection.IEEE Transactions on Dependable and Secure Computing, 2025. 2

  27. [27]

    Face x-ray for more gen- eral face forgery detection

    Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and Baining Guo. Face x-ray for more gen- eral face forgery detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5001–5010, 2020. 1, 2

  28. [28]

    Sharp mul- tiple instance learning for deepfake video detection

    Xiaodan Li, Yining Lang, Yuefeng Chen, Xiaofeng Mao, Yuan He, Shuhui Wang, Hui Xue, and Quan Lu. Sharp mul- tiple instance learning for deepfake video detection. InPro- ceedings of the 28th ACM international conference on mul- timedia, pages 1864–1872, 2020. 1, 4

  29. [29]

    Celeb-df: A large-scale challenging dataset for deep- fake forensics

    Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. Celeb-df: A large-scale challenging dataset for deep- fake forensics. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3207– 3216, 2020. 2, 5

  30. [30]

    Focal loss for dense object detection

    Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Doll´ar. Focal loss for dense object detection. InPro- ceedings of the IEEE international conference on computer vision, pages 2980–2988, 2017. 3

  31. [31]

    Fake it till you make it: Curricu- lar dynamic forgery augmentations towards general deepfake detection

    Yuzhen Lin, Wentang Song, Bin Li, Yuezun Li, Jiangqun Ni, Han Chen, and Qiushi Li. Fake it till you make it: Curricu- lar dynamic forgery augmentations towards general deepfake detection. InEuropean conference on computer vision, pages 104–122. Springer, 2024. 6

  32. [32]

    Spatial- phase shallow learning: rethinking face forgery detection in frequency domain

    Honggu Liu, Xiaodan Li, Wenbo Zhou, Yuefeng Chen, Yuan He, Hui Xue, Weiming Zhang, and Nenghai Yu. Spatial- phase shallow learning: rethinking face forgery detection in frequency domain. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 772–781, 2021. 6

  33. [33]

    arXiv preprint arXiv:2602.02222 , year=

    Ruiqi Liu, Manni Cui, Ziheng Qin, Zhiyuan Yan, Ruoxin Chen, Yi Han, Zhiheng Li, Junkai Chen, ZhiJin Chen, Kaiqing Lin, et al. Mirror: Manifold ideal reference re- constructor for generalizable ai-generated image detection. arXiv preprint arXiv:2602.02222, 2026. 5, 6, 7

  34. [34]

    A convnet for the 2020s

    Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feicht- enhofer, Trevor Darrell, and Saining Xie. A convnet for the 2020s. InProceedings of the IEEE/CVF conference on com- puter vision and pattern recognition, pages 11976–11986,

  35. [35]

    Gener- alizing face forgery detection with high-frequency features

    Yuchen Luo, Yong Zhang, Junchi Yan, and Wei Liu. Gener- alizing face forgery detection with high-frequency features. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 16317–16326, 2021. 6

  36. [36]

    Mffi: Multi-dimensional face forgery im- age dataset for real-world scenarios

    Changtao Miao, Yi Zhang, Man Luo, Weiwei Feng, Kaiyuan Zheng, Qi Chu, Tao Gong, Jianshu Li, Yunfeng Diao, Wei Zhou, et al. Mffi: Multi-dimensional face forgery im- age dataset for real-world scenarios. InProceedings of the 33rd ACM International Conference on Multimedia, pages 13235–13242, 2025. 2, 3

  37. [37]

    Laa-net: Localized artifact attention network for quality-agnostic and generalizable deepfake de- tection

    Dat Nguyen, Nesryne Mejri, Inder Pal Singh, Polina Kuleshova, Marcella Astrid, Anis Kacem, Enjie Ghorbel, and Djamila Aouada. Laa-net: Localized artifact attention network for quality-agnostic and generalizable deepfake de- tection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17395– 17405, 2024. 6

  38. [38]

    Core: Consistent repre- sentation learning for face forgery detection

    Yunsheng Ni, Depu Meng, Changqian Yu, Chengbin Quan, Dongchun Ren, and Youjian Zhao. Core: Consistent repre- sentation learning for face forgery detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12–21, 2022. 6

  39. [39]

    Thinking in frequency: Face forgery detection by min- ing frequency-aware clues

    Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, and Jing Shao. Thinking in frequency: Face forgery detection by min- ing frequency-aware clues. InEuropean conference on com- puter vision, pages 86–103. Springer, 2020. 6

  40. [40]

    Learning transferable visual models from natural language supervi- sion

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervi- sion. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021. 5, 6

  41. [41]

    Reality defender.https : / / realitydefender.com, 2024

    Reality Defender. Reality defender.https : / / realitydefender.com, 2024. Commercial platform for detecting AI-generated media. 5, 6

  42. [42]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022. 1

  43. [43]

    Faceforen- sics++: Learning to detect manipulated facial images

    Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Chris- tian Riess, Justus Thies, and Matthias Nießner. Faceforen- sics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1–11, 2019. 2, 5

  44. [44]

    Detecting deep- fakes with self-blended images

    Kaede Shiohara and Toshihiko Yamasaki. Detecting deep- fakes with self-blended images. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 18720–18729, 2022. 2, 6

  45. [45]

    DINOv3

    Oriane Sim ´eoni, Huy V V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Micha ¨el Ramamonjisoa, et al. Dinov3.arXiv preprint arXiv:2508.10104, 2025. 1, 3, 4

  46. [46]

    Rethinking the up-sampling op- erations in cnn-based generative network for generalizable deepfake detection

    Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Rethinking the up-sampling op- erations in cnn-based generative network for generalizable deepfake detection. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 28130–28139, 2024. 1, 3, 7

  47. [47]

    Veritas: Generalizable deepfake detection via pattern-aware reasoning

    Hao Tan, Jun Lan, Zichang Tan, Ajian Liu, Chuanbiao Song, Senyuan Shi, Huijia Zhu, Weiqiang Wang, Jun Wan, and Zhen Lei. Veritas: Generalizable deepfake detection via pattern-aware reasoning. InInternational Conference on Learning Representations, 2026. 2, 5, 7, 8 10

  48. [48]

    Real appearance mod- eling for more general deepfake detection

    Jiahe Tian, Cai Yu, Xi Wang, Peng Chen, Zihao Xiao, Jiao Dai, Jizhong Han, and Yesheng Chai. Real appearance mod- eling for more general deepfake detection. InEuropean Con- ference on Computer Vision, pages 402–419. Springer, 2024. 6

  49. [49]

    Deepfakes and beyond: A survey of face manipulation and fake detection

    Ruben Tolosana, Ruben Vera-Rodriguez, Julian Fierrez, Aythami Morales, and Javier Ortega-Garcia. Deepfakes and beyond: A survey of face manipulation and fake detection. Information Fusion, 64:131–148, 2020. 1, 2

  50. [50]

    arXiv preprint arXiv:2510.16320 , year=

    Wenhao Wang, Longqi Cai, Taihong Xiao, Yuxiao Wang, and Ming-Hsuan Yang. Scaling laws for deepfake detection. arXiv preprint arXiv:2510.16320, 2025. 1, 2, 3, 5

  51. [51]

    Altfreezing for more general video face forgery detection

    Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, and Houqiang Li. Altfreezing for more general video face forgery detection. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 4129–4138, 2023. 6

  52. [52]

    Spot the fake: Large multimodal model-based synthetic image detection with artifact explanation.arXiv preprint arXiv:2503.14905, 2025

    Siwei Wen, Junyan Ye, Peilin Feng, Hengrui Kang, Zichen Wen, Yize Chen, Jiang Wu, Wenjun Wu, Conghui He, and Weijia Li. Spot the fake: Large multimodal model-based synthetic image detection with artifact explanation.arXiv preprint arXiv:2503.14905, 2025. 8

  53. [53]

    Learning spatiotemporal inconsistency via thumbnail layout for face deepfake detection.International Journal of Com- puter Vision, 132(12):5663–5680, 2024

    Yuting Xu, Jian Liang, Lijun Sheng, and Xiao-Yu Zhang. Learning spatiotemporal inconsistency via thumbnail layout for face deepfake detection.International Journal of Com- puter Vision, 132(12):5663–5680, 2024. 6

  54. [54]

    Zhipei Xu, Xuanyu Zhang, Runyi Li, Zecheng Tang, Qing Huang, and Jian Zhang. FakeShield: Explainable image forgery detection and localization via multi-modal large language models. In International Conference on Learning Representations, 2025. 2

  55. [55]

    Zhiyuan Yan, Yong Zhang, Yanbo Fan, and Baoyuan Wu. UCF: Uncovering common features for generalizable deepfake detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22412–22423, 2023.

  56. [56]

    Zhiyuan Yan, Yong Zhang, Xinhang Yuan, Siwei Lyu, and Baoyuan Wu. DeepfakeBench: A comprehensive benchmark of deepfake detection. In Advances in Neural Information Processing Systems, pages 4534–4565, 2023. 2

  57. [57]

    Zhiyuan Yan, Yuhao Luo, Siwei Lyu, Qingshan Liu, and Baoyuan Wu. Transcending forgery specificity with latent space augmentation for generalizable deepfake detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8984–8994, 2024. 1, 2, 6

  58. [58]

    Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, and Li Yuan. Orthogonal subspace decomposition for generalizable AI-generated image detection. arXiv preprint arXiv:2411.15633, 2024. 1, 2, 5, 6, 8

  59. [59]

    Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Chengjie Wang, Shouhong Ding, Yunsheng Wu, et al. DF40: Toward next-generation deepfake detection. Advances in Neural Information Processing Systems, 37:29387–29434, 2024. 2, 3, 5

  60. [60]

    Zhiyuan Yan, Yandan Zhao, Shen Chen, Mingyi Guo, Xinghe Fu, Taiping Yao, Shouhong Ding, Yunsheng Wu, and Li Yuan. Generalizing deepfake video detection with plug-and-play: Video-level blending and spatiotemporal adapter tuning. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 12615–12625, 2025. 6

  61. [61]

    Yongqi Yang, Zhihao Qian, Ye Zhu, Olga Russakovsky, and Yu Wu. D^3: Scaling up deepfake detection by learning from discrepancy. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 23850–23859, 2025.

  62. [62]

    Andrii Yermakov, Jan Cech, Jiri Matas, and Mario Fritz. Deepfake detection that generalizes across benchmarks. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 773–783, 2026. 1, 2, 5, 6

  63. [63]

    Daichi Zhang, Zihao Xiao, Shikun Li, Fanzhao Lin, Jianmin Li, and Shiming Ge. Learning natural consistency representation for face forgery video detection. In European Conference on Computer Vision, pages 407–424. Springer, 2024. 6

  64. [64]

    Hanqing Zhao, Wenbo Zhou, Dongdong Chen, Tianyi Wei, Weiming Zhang, and Nenghai Yu. Multi-attentional deepfake detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2185–2194, 2021. 1, 2

  65. [65]

    Tianchen Zhao, Xiang Xu, Mingze Xu, Hui Ding, Yuanjun Xiong, and Wei Xia. Learning self-consistency for deepfake detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15023–15033, 2021. 6

  66. [66]

    Yinglin Zheng, Jianmin Bao, Dong Chen, Ming Zeng, and Fang Wen. Exploring temporal coherence for more general video face forgery detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15044–15054, 2021. 6

  67. [67]

    Yue Zhou, Xinan He, Kaiqing Lin, Bing Fan, Feng Ding, Jinhua Zeng, and Bin Li. Brought a gun to a knife fight: Modern VFM baselines outgun specialized detectors on in-the-wild AI image detection. arXiv preprint arXiv:2509.12995, 2025. 1

  68. [68]

    Bojia Zi, Minghao Chang, Jingjing Chen, Xingjun Ma, and Yu-Gang Jiang. WildDeepfake: A challenging real-world dataset for deepfake detection. In Proceedings of the 28th ACM International Conference on Multimedia, pages 2382–2390, 2020. 2