SpecSem-Net: Integrating Spectral and Semantic Features for Robust AI-generated Video Detection
Pith reviewed 2026-05-20 14:15 UTC · model grok-4.3
The pith
SpecSem-Net detects high-fidelity AI videos by guiding spectral denoising with semantic context.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SpecSem-Net is the first framework to introduce a semantic-guided spectral denoising mechanism specifically for high-fidelity AI-generated video detection. It extracts high-frequency features via a Fourier-Transform based spectral module and employs a Gated Merging Mechanism to adaptively fuse semantic context, effectively mitigating spectral noise. On a new benchmark with five state-of-the-art commercial generators, the method reaches 87.25 percent accuracy; on public datasets it reaches 95.59 percent. The paper reports that both figures outperform existing detectors.
What carries the argument
The semantic-guided spectral denoising mechanism that extracts high-frequency features via Fourier-Transform filtering and then uses gated merging to fuse those features with semantic context while suppressing noise.
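The two steps named above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the radial cutoff of 0.25 cycles per pixel, and the form of the gate (a sigmoid over the concatenated features) are all hypothetical choices made for the example.

```python
import numpy as np


def high_pass_fft(frame, cutoff=0.25):
    """Keep spatial frequencies at or above `cutoff` (cycles per pixel).

    `frame` is a 2D grayscale array. The cutoff value is an assumption;
    the paper's actual filter parameters are not given in this review.
    """
    h, w = frame.shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies in [-0.5, 0.5)
    mask = np.sqrt(fx**2 + fy**2) >= cutoff
    spectrum = np.fft.fft2(frame)
    # The mask is symmetric, so the inverse transform is real up to rounding.
    return np.fft.ifft2(spectrum * mask).real


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_merge(spectral, semantic, w_gate, b_gate):
    """Hypothetical gated fusion of two feature vectors of equal length d.

    The gate is computed from both feature vectors (w_gate has shape (2d, d));
    a gate near 1 passes spectral features, a gate near 0 falls back to
    the semantic features.
    """
    gate = sigmoid(np.concatenate([spectral, semantic]) @ w_gate + b_gate)
    return gate * spectral + (1.0 - gate) * semantic


# Usage with random stand-in data (no trained weights are implied).
rng = np.random.default_rng(0)
frame = rng.normal(size=(64, 64))
residual = high_pass_fft(frame)
fused = gated_merge(rng.normal(size=8), rng.normal(size=8),
                    rng.normal(size=(16, 8)), np.zeros(8))
```

In this form, a gate driven towards zero reduces the model to its semantic branch, which is the failure case the referee report raises.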
Load-bearing premise
High-frequency spectral artifacts remain reliably present and distinguishable even in videos produced by the latest commercial generators such as Sora and Veo.
What would settle it
Generate a test set of videos from a model that explicitly suppresses or randomizes high-frequency spectral content, then check whether SpecSem-Net accuracy falls to the level of semantic-only detectors.
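One cheap proxy for such a test set, assuming no generator with built-in spectral suppression is available, is to low-pass filter existing generated videos and confirm that the high-frequency energy is gone before re-running the detector. The sketch below is hypothetical: the function names and the 0.25 cycles-per-pixel cutoff are illustrative, and post-hoc filtering is not equivalent to a generator that suppresses these frequencies during synthesis.

```python
import numpy as np


def radial_frequency(h, w):
    """Radial spatial frequency (cycles per pixel) for an h x w FFT grid."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    return np.sqrt(fx**2 + fy**2)


def suppress_high_freq(video, cutoff=0.25):
    """Zero out spatial frequencies at or above `cutoff` in every frame.

    `video` is a float array of shape (T, H, W).
    """
    _, h, w = video.shape
    keep = radial_frequency(h, w) < cutoff
    spectrum = np.fft.fft2(video, axes=(-2, -1))
    return np.fft.ifft2(spectrum * keep, axes=(-2, -1)).real


def high_freq_energy_fraction(video, cutoff=0.25):
    """Share of total spectral power located at or above `cutoff`."""
    _, h, w = video.shape
    high = radial_frequency(h, w) >= cutoff
    power = np.abs(np.fft.fft2(video, axes=(-2, -1))) ** 2
    return power[:, high].sum() / power.sum()


# Usage with random stand-in data in place of real generated clips.
rng = np.random.default_rng(0)
video = rng.normal(size=(4, 32, 32))
before = high_freq_energy_fraction(video)
after = high_freq_energy_fraction(suppress_high_freq(video))
```

If detector accuracy on the filtered clips drops to the semantic-only baseline, that supports the reading that the spectral branch carries the reported gain.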
Original abstract
The remarkable visual fidelity of recent commercial video generative models, such as Sora and Veo, renders robust AI-generated video detection increasingly essential to prevent synthetic content from being indistinguishable from real videos and exploited for disinformation. However, existing detectors often fail due to an over-reliance on increasingly realistic semantic features, neglecting subtle spectral artifacts. In this paper, we propose SpecSem-Net, the first framework to introduce a semantic-guided spectral denoising mechanism specifically for high-fidelity AI-generated video detection. Specifically, we design a spectral module to extract high-frequency features via Fourier-Transform based filtering. Furthermore, to reduce misjudgments arising from spectral noise, we employ a Gated Merging Mechanism to adaptively fuse semantic context, effectively mitigating spectral noise. Additionally, to evaluate detector performance on the latest top-tier generative models, we construct a comprehensive benchmark comprising 5 SOTA commercial generators. Extensive experiments demonstrate that SpecSem-Net outperforms existing methods, achieving accuracies of 87.25% and 95.59% on our benchmark and public datasets, respectively.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SpecSem-Net, a framework that extracts high-frequency features via Fourier-Transform based filtering in a spectral module and adaptively fuses them with semantic context using a Gated Merging Mechanism for detecting AI-generated videos. It constructs a new benchmark with videos from 5 SOTA commercial generators (including Sora and Veo) and reports accuracies of 87.25% on this benchmark and 95.59% on public datasets, claiming to be the first to introduce a semantic-guided spectral denoising mechanism for high-fidelity video detection.
Significance. If the empirical results hold under rigorous verification, the work would contribute a practical detector that addresses the failure modes of purely semantic approaches as generative models improve in visual fidelity. The new benchmark covering latest commercial generators is a useful resource for the community. The architectural idea of gating spectral features with semantic context is a reasonable direction, though its advantage depends on the continued presence of detectable high-frequency artifacts.
major comments (2)
- [Abstract and §4] Abstract and §4 (Experimental Setup): the reported accuracies of 87.25% on the new benchmark and 95.59% on public datasets are presented without any description of the experimental protocol, number of samples per generator, train/test splits, baseline methods, or statistical significance tests. This information is load-bearing for the central claim that SpecSem-Net outperforms existing methods on high-fidelity generators.
- [§3.2 and §5] §3.2 (Spectral Module) and §5 (Results on Commercial Generators): the claim that the Fourier-based filtering reliably extracts distinguishable high-frequency artifacts rests on the untested assumption that such artifacts survive in videos from Sora and Veo. No ablation or visualization is provided showing that the spectral branch still contributes when these latest models are used; if the artifacts have been suppressed, the gated fusion reduces to a standard semantic classifier and the reported gains may reflect benchmark construction rather than the proposed mechanism.
minor comments (2)
- [§3.3] Notation for the Gated Merging Mechanism should be defined with explicit equations rather than descriptive text only.
- [Figure 3] Figure captions for spectral visualizations should include the exact frequency cutoff values used in the Fourier filtering.
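To illustrate the kind of explicit notation these two minor comments ask for, the block below writes out one common form of radial high-pass filtering and gated fusion. Every symbol here is an assumption made for illustration (X for an input frame, r_c for the cutoff radius, f_spec and f_sem for the two feature vectors, W_g and b_g for the gate parameters); the paper's own definitions are not available in this review.

```latex
% Hypothetical notation; none of these symbols are taken from the paper.
% \mathcal{F} is the 2D discrete Fourier transform, \sigma the logistic sigmoid.
\begin{align}
M(u, v) &= \mathbb{1}\!\left[\sqrt{u^2 + v^2} \ge r_c\right], \\
X_{\mathrm{hf}} &= \mathcal{F}^{-1}\!\big(M \odot \mathcal{F}(X)\big), \\
g &= \sigma\!\big(W_g\,[f_{\mathrm{spec}};\, f_{\mathrm{sem}}] + b_g\big), \\
f_{\mathrm{out}} &= g \odot f_{\mathrm{spec}} + (1 - g) \odot f_{\mathrm{sem}}.
\end{align}
```

Stating r_c numerically in the figure captions would make the spectral visualizations reproducible.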
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We have reviewed each major point carefully and provide point-by-point responses below. We agree that additional details and analyses will strengthen the paper and will incorporate revisions accordingly.
- Referee: [Abstract and §4] Abstract and §4 (Experimental Setup): the reported accuracies of 87.25% on the new benchmark and 95.59% on public datasets are presented without any description of the experimental protocol, number of samples per generator, train/test splits, baseline methods, or statistical significance tests. This information is load-bearing for the central claim that SpecSem-Net outperforms existing methods on high-fidelity generators.
Authors: We agree that a complete description of the experimental protocol is necessary to support our claims and ensure reproducibility. In the revised manuscript, we will expand §4 to explicitly detail the number of samples per generator in the new benchmark, the train/test split methodology and ratios, the complete list of baseline methods with implementation references, and the results of statistical significance tests (such as McNemar's test or paired t-tests with p-values) comparing SpecSem-Net against the baselines. These elements were part of our experimental design but were not fully elaborated in the original submission; we will now include them.
Revision: yes
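For reference, the McNemar comparison mentioned in this response can be computed from paired per-video correctness flags alone. The sketch below implements the exact (binomial) two-sided form using only the standard library; the function name and the toy data are hypothetical, and nothing here reflects results from the paper.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired per-sample correctness.

    correct_a, correct_b: equal-length sequences of booleans, one entry per
    test video, for detector A and detector B on the same samples.
    Returns (b, c, p_value), where b counts samples only A got right and
    c counts samples only B got right.
    """
    b = sum(1 for a_ok, b_ok in zip(correct_a, correct_b) if a_ok and not b_ok)
    c = sum(1 for a_ok, b_ok in zip(correct_a, correct_b) if not a_ok and b_ok)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the detectors never disagree
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return b, c, min(1.0, 2.0 * tail)


# Toy example: A is right and B wrong on 6 videos, both right on 4.
a_flags = [True] * 10
b_flags = [False] * 6 + [True] * 4
b_count, c_count, p_value = mcnemar_exact(a_flags, b_flags)
```

Because the test uses only the discordant pairs, it needs both detectors evaluated on the same test videos, which is one reason the per-generator sample counts and splits matter.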
- Referee: [§3.2 and §5] §3.2 (Spectral Module) and §5 (Results on Commercial Generators): the claim that the Fourier-based filtering reliably extracts distinguishable high-frequency artifacts rests on the untested assumption that such artifacts survive in videos from Sora and Veo. No ablation or visualization is provided showing that the spectral branch still contributes when these latest models are used; if the artifacts have been suppressed, the gated fusion reduces to a standard semantic classifier and the reported gains may reflect benchmark construction rather than the proposed mechanism.
Authors: We acknowledge the validity of this observation. The contribution of the spectral module on the latest high-fidelity generators requires explicit verification. In the revised manuscript, we will add an ablation study in §5 that isolates the performance of the full SpecSem-Net model versus a semantic-only variant on the commercial generators benchmark. We will also include visualizations of the frequency spectra and filtered features for representative samples from Sora and Veo to demonstrate that the high-frequency branch continues to provide distinguishable information. These additions will clarify the role of the gated merging mechanism.
Revision: yes
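A standard way to produce the promised spectral visualizations is a radially averaged power spectrum, which collapses a frame's 2D spectrum into a single curve that can be compared between real and generated footage. The sketch below is a generic version of that technique under assumed settings (the function name and the 32-bin default are hypothetical), not the authors' plotting code.

```python
import numpy as np


def radial_power_profile(frame, n_bins=32):
    """Radially averaged power spectrum of a 2D grayscale frame.

    Returns (edges, profile): `edges` holds the n_bins + 1 bin boundaries in
    cycles per pixel, `profile` the mean spectral power inside each bin.
    """
    h, w = frame.shape
    power = np.abs(np.fft.fft2(frame)) ** 2
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2).ravel()
    edges = np.linspace(0.0, radius.max(), n_bins + 1)
    # Map each frequency to a bin; the largest radius goes in the last bin.
    idx = np.clip(np.digitize(radius, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return edges, sums / np.maximum(counts, 1)  # empty bins report 0


# Usage with a random stand-in frame.
rng = np.random.default_rng(0)
edges, profile = radial_power_profile(rng.normal(size=(64, 64)))
```

Averaging such profiles over many frames per generator and plotting them against real footage would show directly whether the upper frequency bins still separate the two classes.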
Circularity Check
No circularity detected in architectural proposal or empirical claims
The paper presents SpecSem-Net as a new neural network architecture that extracts high-frequency features via Fourier-Transform filtering and fuses them with semantic context using a Gated Merging Mechanism. No equations, derivations, or first-principles results are described that reduce to fitted parameters or inputs by construction. Performance numbers (87.25% on custom benchmark, 95.59% on public datasets) are reported from direct empirical evaluation rather than any self-referential prediction. The 'first framework' claim and benchmark construction do not invoke self-citations or uniqueness theorems that would create a load-bearing circular chain. The work is self-contained as an empirical architecture proposal.
Axiom & Free-Parameter Ledger
free parameters (1)
- neural network hyperparameters and gating thresholds
axioms (1)
- domain assumption: detectable high-frequency spectral artifacts exist in outputs of current commercial video generators