pith. machine review for the scientific record.

arxiv: 2604.08159 · v1 · submitted 2026-04-09 · 💻 cs.CV · cs.AI


Face-D(²)CL: Multi-Domain Synergistic Representation with Dual Continual Learning for Facial DeepFake Detection


Pith reviewed 2026-05-10 18:05 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords deepfake detection · continual learning · facial forgery · spatial frequency fusion · anti-forgetting · adapter updates · multi-domain representation

The pith

A framework fuses spatial and frequency features with dual continual learning to let deepfake detectors adapt to new forgeries without forgetting or replaying old data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to overcome two limits in facial deepfake detection under continual learning: weak feature representation and catastrophic forgetting when forgery methods change. It proposes fusing spatial and frequency-domain information to catch more forgery traces and pairs elastic weight consolidation with orthogonal gradient constraints on task adapters. This combination is meant to keep old knowledge intact while allowing quick updates to new forgery patterns, all without storing or replaying past images. If the approach holds, detectors could remain effective as forgers advance without the usual costs of full retraining or data retention.

Core claim

Face-D²CL uses a multi-domain synergistic representation that fuses spatial and frequency-domain features to capture diverse forgery traces comprehensively. It pairs this with a dual continual learning mechanism: elastic weight consolidation (EWC) distinguishes parameter importance for real versus fake samples, while an orthogonal gradient constraint (OGC) ensures that task-specific adapter updates do not interfere with previously learned knowledge.

What carries the argument

Multi-domain synergistic representation that fuses spatial and frequency-domain features, combined with the dual continual learning mechanism of elastic weight consolidation and orthogonal gradient constraint on task adapters.
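The fusion idea the argument rests on can be sketched minimally. This is not the paper's architecture (which feeds spatial, wavelet, and Fourier branches into a shared CLIP encoder with LoRA adapters); it is a toy numpy illustration of concatenating spatial statistics with frequency-domain descriptors, with all function names invented here.

```python
import numpy as np

def multi_domain_features(img: np.ndarray) -> np.ndarray:
    """Fuse spatial- and frequency-domain descriptors of a grayscale face crop.

    Illustrative sketch only: hand-crafted statistics stand in for the
    learned branch features, to show the concatenation idea.
    """
    # Spatial branch: pixel statistics capture blending/texture artifacts.
    spatial = np.array([img.mean(), img.std()])

    # Frequency branch: log-magnitude spectrum; generative forgeries often
    # leave periodic artifacts visible as excess high-frequency energy.
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    log_mag = np.log1p(spectrum)
    h, w = img.shape
    # After fftshift the central block holds low frequencies, so the
    # remainder is a crude high-frequency energy cue.
    center = log_mag[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]
    freq = np.array([log_mag.mean(), log_mag.sum() - center.sum()])

    return np.concatenate([spatial, freq])  # fused 4-D descriptor

rng = np.random.default_rng(0)
face = rng.random((64, 64))
feat = multi_domain_features(face)
assert feat.shape == (4,)
```

A real detector would learn both branches end to end; the point here is only that the two domains contribute complementary coordinates to one fused vector.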

If this is right

  • The model reaches a dynamic balance between stability against forgetting and plasticity for new forgery types.
  • Average detection error rates fall substantially relative to prior state-of-the-art methods.
  • Detection performance on previously unseen forgery domains rises without requiring storage of past data.
  • Task-specific adapters can be updated orthogonally while real-versus-fake parameter importance is preserved.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fusion of spatial and frequency cues could be tested on non-facial image forgery tasks such as document or video manipulation.
  • Separately weighting parameters for real and fake classes might reduce forgetting in other binary continual-learning settings outside detection.
  • The orthogonal update rule on adapters suggests a general way to limit interference in replay-free continual learning for other computer-vision problems.
  • Extending the framework to video sequences would reveal whether the spatial-frequency synergy scales beyond static images.

Load-bearing premise

That fusing spatial and frequency-domain features will capture the full variety of forgery traces and that elastic weight consolidation paired with orthogonal gradient constraint will prevent forgetting while enabling adaptation without any replay of historical data.
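The EWC half of that premise is a quadratic penalty anchoring parameters to their previous values, weighted by estimated importance. A minimal sketch of the standard formula; the per-class (real/fake) importance merge shown is an illustrative assumption, not the paper's construction.

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    """Elastic Weight Consolidation penalty: (lam/2) * sum_i F_i (theta_i - theta*_i)^2.

    `fisher` is the diagonal Fisher-information estimate of per-parameter
    importance on earlier tasks; a large F_i pins theta_i near its old value.
    """
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

def combined_importance(fisher_real, fisher_fake):
    """Hypothetical merge of class-wise importances. The paper estimates
    importance separately for real and fake samples; the elementwise max
    here is one illustrative way to combine them, not the paper's formula."""
    return np.maximum(fisher_real, fisher_fake)

theta_old = np.zeros(2)
fisher = combined_importance(np.array([2.0, 1.0]), np.array([0.5, 5.0]))  # -> [2., 5.]
loss_reg = ewc_penalty(np.array([1.0, 0.0]), theta_old, fisher)  # 0.5 * 2 * 1^2 = 1.0
assert abs(loss_reg - 1.0) < 1e-12
```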

What would settle it

Sequential training on a series of new forgery domains followed by re-testing on the earliest domains to check whether accuracy on those early domains drops sharply despite the proposed mechanisms.
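That settling experiment amounts to filling an accuracy matrix over a domain sequence and measuring forgetting. A small sketch of the usual metric (best past accuracy minus final accuracy, averaged over all domains except the last), with the toy numbers invented here:

```python
import numpy as np

def average_forgetting(acc: np.ndarray) -> float:
    """acc[t, j] = accuracy on domain j after training through domain t.

    Forgetting of domain j = best accuracy achieved on j before the final
    stage, minus its accuracy after the final stage; averaged over all
    domains except the last.
    """
    T = acc.shape[0]
    drops = [acc[: T - 1, j].max() - acc[T - 1, j] for j in range(T - 1)]
    return float(np.mean(drops))

# Three sequential forgery domains; row t = accuracies after training on domain t.
acc = np.array([
    [0.95, 0.50, 0.50],
    [0.90, 0.93, 0.55],
    [0.85, 0.88, 0.94],
])
# Domain 0 drops 0.95 -> 0.85, domain 1 drops 0.93 -> 0.88: mean forgetting 0.075.
assert abs(average_forgetting(acc) - 0.075) < 1e-9
```

A sharp accuracy drop on the earliest domains after the final stage would falsify the anti-forgetting claim; a low value with good final-row accuracy would support the claimed stability-plasticity balance.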

Figures

Figures reproduced from arXiv: 2604.08159 by Jiawei Chen, Jiuan Zhou, Yongkang Hu, Yuan Xie, Yu Cheng, Yushuo Zhang, Zhaoxia Yin.

Figure 1
Figure 1. The pipeline of the proposed framework.
Figure 2
Figure 2. Overall architecture of Face-D²CL. The input face image is processed by three parallel branches (Spatial, Wavelet, Fourier) with domain alignment. The aligned features are encoded by a shared CLIP encoder with domain-specific LoRA adapters. The resulting features are fused for classification and contrastive alignment with text prompts. During training, a dual continual learning mechanism (EWC and OGC) regu…
Figure 3
Figure 3. Robustness comparison of different methods under unseen perturbations based on Protocol 1. Average AUC (%).
read the original abstract

The rapid advancement of facial forgery techniques poses severe threats to public trust and information security, making facial DeepFake detection a critical research priority. Continual learning provides an effective approach to adapt facial DeepFake detection models to evolving forgery patterns. However, existing methods face two key bottlenecks in real-world continual learning scenarios: insufficient feature representation and catastrophic forgetting. To address these issues, we propose Face-D(^2)CL, a framework for facial DeepFake detection. It leverages multi-domain synergistic representation to fuse spatial and frequency-domain features for the comprehensive capture of diverse forgery traces, and employs a dual continual learning mechanism that combines Elastic Weight Consolidation (EWC), which distinguishes parameter importance for real versus fake samples, and Orthogonal Gradient Constraint (OGC), which ensures updates to task-specific adapters do not interfere with previously learned knowledge. This synergy enables the model to achieve a dynamic balance between robust anti-forgetting capabilities and agile adaptability to emerging facial forgery paradigms, all without relying on historical data replay. Extensive experiments demonstrate that our method surpasses current SOTA approaches in both stability and plasticity, achieving 60.7% relative reduction in average detection error rate, respectively. On unseen forgery domains, it further improves the average detection AUC by 7.9% compared to the current SOTA method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Face-D²CL for continual facial DeepFake detection. It introduces multi-domain synergistic representation by fusing spatial and frequency-domain features to capture diverse forgery traces, paired with a dual continual learning mechanism: Elastic Weight Consolidation (EWC) that distinguishes parameter importance for real versus fake samples, and Orthogonal Gradient Constraint (OGC) on task-specific adapters to avoid interference with prior knowledge. The approach claims to enable adaptation to new forgery paradigms without historical data replay, achieving a 60.7% relative reduction in average detection error rate and a 7.9% average AUC improvement on unseen domains over current SOTA methods.

Significance. If the reported gains hold under rigorous validation, the work could meaningfully advance continual learning for security applications by offering a no-replay solution that balances stability and plasticity in the face of evolving forgery techniques. The specific pairing of EWC (real/fake importance) with OGC on adapters is a targeted contribution, but its effectiveness depends on empirical demonstration of low backward transfer across realistic domain sequences.

major comments (2)
  1. [Abstract] Abstract: The central empirical claims (60.7% relative error-rate reduction and 7.9% unseen-domain AUC lift) are presented without any reference to the datasets, the number or ordering of continual-learning domains, baseline implementations, ablation studies, or statistical tests. This absence prevents verification of whether the dual CL mechanism actually delivers the claimed synergy.
  2. [Method] Method (dual CL section): The combination of EWC (with real/fake importance weighting) and OGC (orthogonal updates on adapters) is asserted to prevent catastrophic forgetting without replay, yet no analysis addresses whether EWC's quadratic penalty remains valid when forgery artifacts shift between spatial and frequency domains; the skeptic concern that this may reduce plasticity for novel traces is not tested or bounded.
minor comments (2)
  1. [Abstract] The sentence ending 'achieving 60.7% relative reduction in average detection error rate, respectively' contains an extraneous 'respectively' with no antecedent list.
  2. The notation Face-D(^2)CL should be clarified in the title and introduction; it is unclear whether the superscript denotes 'dual' or another quantity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for highlighting areas where the presentation of our empirical claims and methodological analysis could be strengthened. We address each major comment point by point below, indicating where revisions to the manuscript are planned.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central empirical claims (60.7% relative error-rate reduction and 7.9% unseen-domain AUC lift) are presented without any reference to the datasets, the number or ordering of continual-learning domains, baseline implementations, ablation studies, or statistical tests. This absence prevents verification of whether the dual CL mechanism actually delivers the claimed synergy.

    Authors: We agree that the abstract's brevity omits these specifics, which are essential for immediate verification. The full manuscript details the datasets (including FaceForensics++, Celeb-DF, and additional forgery sources for the continual sequences), the exact ordering of domains in the learning protocol, baseline re-implementations, comprehensive ablation studies isolating each component of the dual CL mechanism, and statistical tests supporting the reported gains. To improve accessibility, we will revise the abstract to concisely reference the experimental setup and direct readers to the Experiments and Ablation sections for full verification of the claimed synergy. revision: yes

  2. Referee: [Method] Method (dual CL section): The combination of EWC (with real/fake importance weighting) and OGC (orthogonal updates on adapters) is asserted to prevent catastrophic forgetting without replay, yet no analysis addresses whether EWC's quadratic penalty remains valid when forgery artifacts shift between spatial and frequency domains; the skeptic concern that this may reduce plasticity for novel traces is not tested or bounded.

    Authors: We appreciate this insightful concern about the interaction between EWC's penalty and cross-domain artifact shifts. Our ablation studies and unseen-domain evaluations empirically demonstrate that the dual mechanism (EWC with real/fake weighting plus OGC) preserves plasticity, as shown by the 7.9% AUC improvement on novel forgeries without replay. However, we acknowledge that the current version lacks an explicit analysis or bound quantifying any potential plasticity reduction under spatial-frequency shifts. We will add a dedicated discussion subsection in the revised Method or Experiments section, supported by additional targeted experiments measuring backward transfer and plasticity metrics across domain sequences, to directly address and bound this aspect. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical combination of known techniques with experimental validation

full rationale

The paper proposes Face-D(^2)CL as a practical framework that fuses spatial/frequency features and applies EWC+OGC for continual learning without replay. No equations, derivations, or first-principles predictions appear in the provided text; performance gains (error-rate reduction, AUC improvement) are asserted solely via experiments on unseen domains. EWC and OGC are standard cited methods, not redefined or fitted in a self-referential loop within this work. The central claim therefore rests on empirical results rather than any reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities beyond the high-level framework description; standard computer-vision assumptions about feature complementarity are invoked implicitly.

pith-pipeline@v0.9.0 · 5554 in / 1274 out tokens · 53493 ms · 2026-05-10T18:05:01.510060+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

44 extracted references · 5 canonical work pages · 1 internal anchor

  1. [1] Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Lluis Castrejon, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, et al.
  2. [2] Imagen 3. arXiv preprint arXiv:2408.07009 (2024).
  3. [3] Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet K Dokania, Philip HS Torr, and Marc'Aurelio Ranzato. 2019. On tiny episodic memories in continual learning. arXiv preprint arXiv:1902.10486 (2019).
  4. [4] Jikang Cheng, Zhiyuan Yan, Ying Zhang, Li Hao, Jiaxin Ai, Qin Zou, Chen Li, and Zhongyuan Wang. 2025. Stacking brick by brick: Aligned feature isolation for incremental face forgery detection. In Proceedings of the Computer Vision and Pattern Recognition Conference. 13927–13936.
  5. [5] François Chollet. 2017. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1251–1258.
  6. [6] Brian Dolhansky, Russ Howes, Ben Pflaum, Nicole Baram, and Cristian Canton Ferrer. 2019. The deepfake detection challenge (DFDC) preview dataset. arXiv preprint arXiv:1910.08854 (2019).
  7. [7] Nick Dufour and Andrew Gully. 2019. Contributing data to deepfake detection research. Google AI Blog 1, 2 (2019), 3.
  8. [8] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. 2024. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learning.
  9. [9] Alexandros Haliassos, Konstantinos Vougioukas, Stavros Petridis, and Maja Pantic. 2021. Lips don't lie: A generalisable and robust approach to face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5039–5049.
  10. [10] Yue-Hua Han, Tai-Ming Huang, Kai-Lung Hua, and Jun-Cheng Chen. 2025. Towards more general video-based deepfake detection through facial component guided adaptation for foundation model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 22995–23005.
  11. [11] Fa-Ting Hong and Dan Xu. 2023. Implicit identity representation conditioned memory compensation network for talking head video generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 23062–23072.
  12. [12] Yongkang Hu, Yu Cheng, Yushuo Zhang, Yuan Xie, and Zhaoxia Yin. 2026. SAIDO: Generalizable Detection of AI-Generated Images via Scene-Aware and Importance-Guided Dynamic Optimization in Continual Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  13. [13] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2018. Progressive Growing of GANs for Improved Quality, Stability, and Variation. In Proceedings of the International Conference on Learning Representations.
  14. [14] Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2021. Alias-free generative adversarial networks. Advances in Neural Information Processing Systems 34 (2021), 852–863.
  15. [15] Minha Kim, Shahroz Tariq, and Simon S Woo. 2021. CoReD: Generalizing fake media detection with continual representation using distillation. In Proceedings of the 29th ACM International Conference on Multimedia. 337–346.
  16. [16] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences 114, 13 (2017), 3521–3526.
  17. [17] Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and Baining Guo. 2020. Face X-ray for more general face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5001–5010.
  18. [18] Yuezun Li, Ming-Ching Chang, and Siwei Lyu. 2018. In Ictu Oculi: Exposing AI created fake videos by detecting eye blinking. In 2018 IEEE International Workshop on Information Forensics and Security. 1–7.
  19. [19] Zhizhong Li and Derek Hoiem. 2018. Learning without Forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 12 (2018), 2935–2947.
  20. [20] Honggu Liu, Xiaodan Li, Wenbo Zhou, Yuefeng Chen, Yuan He, Hui Xue, Weiming Zhang, and Nenghai Yu. 2021. Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 772–781.
  21. [21] Zheda Mai, Ruiwen Li, Hyunwoo Kim, and Scott Sanner. 2021. Supervised contrastive replay: Revisiting the nearest class mean classifier in online class-incremental continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3589–3599.
  22. [22] Kun Pan, Yifang Yin, Yao Wei, Feng Lin, Zhongjie Ba, Zhenguang Liu, Zhibo Wang, Lorenzo Cavallaro, and Kui Ren. 2023. DFIL: Deepfake incremental learning by exploiting domain-invariant forgery clues. In Proceedings of the 31st ACM International Conference on Multimedia. 8035–8046.
  23. [23] Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, and Jing Shao. 2020. Thinking in frequency: Face forgery detection by mining frequency-aware clues. In European Conference on Computer Vision. 86–103.
  24. [24] Jingyang Qiao, Xin Tan, Chengwei Chen, Yanyun Qu, Yong Peng, Yuan Xie, et al. 2024. Prompt gradient projection for continual learning. In the Twelfth International Conference on Learning Representations.
  25. [25] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. 2017. iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2001–2010.
  26. [26] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10684–10695.
  27. [27] Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner. 2019. FaceForensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1–11.
  28. [28] Andrei A Rusu, Neil C Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. 2016. Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016).
  29. [29] Gobinda Saha, Isha Garg, and Kaushik Roy. 2021. Gradient Projection Memory for Continual Learning. In International Conference on Learning Representations.
  30. [30] Kaede Shiohara and Toshihiko Yamasaki. 2022. Detecting deepfakes with self-blended images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18720–18729.
  31. [31] Kaede Shiohara, Xingchao Yang, and Takafumi Taketomi. 2023. BlendFace: Re-designing identity encoders for face-swapping. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7634–7644.
  32. [32] Ke Sun, Shen Chen, Taiping Yao, Xiaoshuai Sun, Shouhong Ding, and Rongrong Ji. 2025. Continual face forgery detection via historical distribution preserving. International Journal of Computer Vision 133, 3 (2025), 1067–1084.
  33. [33] Ke Sun, Taiping Yao, Shen Chen, Shouhong Ding, Jilin Li, and Rongrong Ji. 2022. Dual contrastive learning for general face forgery detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 2316–2324.
  34. [34] Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning. 6105–6114.
  35. [35] Jiahe Tian, Cai Yu, Xi Wang, Peng Chen, Zihao Xiao, Jizhong Han, and Yesheng Chai. 2024. Dynamic mixed-prototype model for incremental deepfake detection. In Proceedings of the 32nd ACM International Conference on Multimedia. 8129–8138.
  36. [36] Ying Xu, Kiran Raja, and Marius Pedersen. 2022. Supervised contrastive learning for generalizable and explainable deepfakes detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 379–389.
  37. [37] Jiazhen Yan, Ziqiang Li, Fan Wang, Ziwen He, and Zhangjie Fu. 2026. Dual Frequency Branch Framework with Reconstructed Sliding Windows Attention for AI-Generated Image Detection. IEEE Transactions on Information Forensics and Security (2026).
  38. [38] Shipeng Yan, Jiangwei Xie, and Xuming He. 2021. DER: Dynamically expandable representation for class incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3014–3023.
  39. [39] Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Chengjie Wang, Shouhong Ding, Yunsheng Wu, et al. 2024. DF40: Toward next-generation deepfake detection. Advances in Neural Information Processing Systems 37 (2024), 29387–29434.
  40. [40] Xin Yang, Yuezun Li, and Siwei Lyu. 2019. Exposing deep fakes using inconsistent head poses. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. 8261–8265.
  41. [41] Nan Zhong, Yiran Xu, Sheng Li, Zhenxing Qian, and Xinpeng Zhang. 2023. PatchCraft: Exploring texture patch for efficient AI-generated image detection. arXiv preprint arXiv:2311.12397 (2023).
  42. [42] Nan Zhong, Mian Zou, Yiran Xu, Zhenxing Qian, Xinpeng Zhang, Baoyuan Wu, and Kede Ma. 2026. Self-Supervised AI-Generated Image Detection: A Camera Metadata Perspective. IEEE Transactions on Pattern Analysis and Machine Intelligence (2026), 1–16.
  43. [43] Dewei Zhou, You Li, Fan Ma, Xiaoting Zhang, and Yi Yang. 2024. MIGC: Multi-instance generation controller for text-to-image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6818–6828.
  44. [44] Bojia Zi, Minghao Chang, Jingjing Chen, Xingjun Ma, and Yu-Gang Jiang. 2020. WildDeepfake: A challenging real-world dataset for deepfake detection. In Proceedings of the 28th ACM International Conference on Multimedia. 2382–2390.