Curvelet-Based Frequency-Aware Feature Enhancement for Deepfake Detection
Pith reviewed 2026-05-10 15:52 UTC · model grok-4.3
The pith
Curvelet transforms with attention mechanisms allow deepfake detectors to focus on compression-resistant frequency artifacts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Curvelet Transform is applied to input faces, after which wedge-level attention and scale-aware spatial masking are trained to selectively boost frequency components tied to forgery traces; the inverse transform then produces a spatially enhanced image that a pretrained Xception classifier uses to distinguish authentic from manipulated faces, yielding higher accuracy than spatial-only baselines on both low- and high-compression versions of the FaceForensics++ dataset.
What carries the argument
Curvelet Transform equipped with learned wedge-level attention and scale-aware spatial masking, which together select and amplify forgery-discriminative frequency components before spatial reconstruction for CNN input.
If this is right
- Detectors gain robustness to common video compression without retraining the entire classifier from scratch.
- Frequency cues become more interpretable, allowing inspection of which directional scales carry forgery signals.
- The same preprocessing pipeline can be attached to other pretrained CNN backbones for similar gains.
- High-compression performance remains competitive, suggesting the method preserves essential artifact information even when pixel data is degraded.
Where Pith is reading between the lines
- The same wedge-and-scale selection logic could be tested on wavelet or shearlet transforms to compare which frequency basis best isolates deepfake traces.
- Extending the masking to video sequences might improve detection of temporal inconsistencies introduced by generators.
- Forensic tools could visualize the attended frequency wedges to explain why a particular face was flagged as fake.
Load-bearing premise
The Curvelet Transform's directional properties combined with the attention and masking will reliably isolate forgery-related frequency artifacts without overfitting to the training set or creating new reconstruction errors.
What would settle it
Performance falling below a standard spatial Xception baseline on a held-out deepfake dataset generated by unseen methods or subjected to novel compression ratios would show the frequency selection does not generalize.
Figures
read the original abstract
The proliferation of sophisticated generative models has significantly advanced the realism of synthetic facial content, known as deepfakes, raising serious concerns about digital trust. Although modern deep learning-based detectors perform well, many rely on spatial-domain features that degrade under compression. This limitation has prompted a shift toward integrating frequency-domain representations with deep learning to improve robustness. Prior research has explored frequency transforms such as Discrete Cosine Transform (DCT), Fast Fourier Transform (FFT), and Wavelet Transform, among others. However, to the best of our knowledge, the Curvelet Transform, despite its superior directional and multiscale properties, remains entirely unexplored in the context of deepfake detection. In this work, we introduce a novel Curvelet-based detection approach that enhances feature quality through wedge-level attention and scale-aware spatial masking, both trained to selectively emphasize discriminative frequency components. The refined frequency cues are reconstructed and passed to a modified pretrained Xception network for classification. Evaluated on two compression qualities in the challenging FaceForensics++ dataset, our method achieves 98.48% accuracy and 99.96% AUC on FF++ low compression, while maintaining strong performance under high compression, demonstrating the efficacy and interpretability of Curvelet-informed forgery detection.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a Curvelet-based pipeline for deepfake detection that decomposes input images via the Curvelet Transform, applies learned wedge-level attention and scale-aware spatial masking to emphasize discriminative frequency components, reconstructs the enhanced representation, and feeds it into a modified pretrained Xception network for binary classification. On the FaceForensics++ benchmark it reports 98.48% accuracy and 99.96% AUC under low compression, with maintained performance under high compression, claiming improved robustness and interpretability relative to prior frequency-domain approaches (DCT, FFT, Wavelet).
Significance. If the reported metrics are reproducible and the ablations isolate the contribution of the Curvelet-specific modules, the work would be a useful incremental advance: it is the first application of Curvelets (with their directional and multiscale properties) to deepfake detection and supplies a concrete, trainable mechanism for frequency-component selection. The emphasis on compression robustness addresses a known practical limitation of spatial-domain detectors.
minor comments (2)
- [Abstract] Abstract: the performance numbers are presented without any baseline comparison or statistical significance test; adding at least the strongest competing frequency-domain method (e.g., the best reported Wavelet or DCT result on the same split) would strengthen the claim of efficacy.
- [Methods] The description of the reconstruction step after masking is too terse to assess whether inverse Curvelet artifacts are controlled; a short paragraph or diagram in §3 would clarify this.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our manuscript and the recommendation for minor revision. The recognition of our work as the first application of Curvelets to deepfake detection, along with the trainable frequency-component selection mechanism and focus on compression robustness, is appreciated. No specific major comments were provided in the report.
Circularity Check
No significant circularity
full rationale
The paper describes an empirical pipeline for deepfake detection that applies the Curvelet Transform followed by wedge-level attention and scale-aware masking before feeding into a modified Xception network. No equations, derivations, or fitted parameters are present in the provided text. All central claims consist of reported accuracy and AUC metrics on the FaceForensics++ benchmark under two compression levels. Because the work contains no mathematical reduction steps, self-definitional constructs, or load-bearing self-citations that collapse the result to its inputs, the derivation chain is self-contained and the circularity score is zero.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Abbas, F., & Taeihagh, A. (2024). Unmasking deepfakes: A systematic review of deepfake detection and generation techniques using artificial intelligence. Expert Systems with Applications, 252, 124260. https://doi.org/10.1016/J.ESWA.2024.124260 Afchar, D., Nozick, V., Yamagishi, J., & Echizen, I. (2018). MesoNet: A compact facial video forgery detection ne...
-
[2]
https://doi.org/10.3390/ELECTRONICS12163407 deepfakes/faceswap: Deepfakes Software For All . (n.d.). Retrieved August 15, 2025, from https://github.com/deepfakes/faceswap Durall, R., Keuper, M., & Keuper, J. (2020). Watch your up - convolution: CNN based generative deep neural networks are failing to reproduce spectral distributions. Proceedings of the IE...
-
[3]
Frank, J., Eisenhofer, T., Schönherr, L., Fischer, A., Kolossa, D., & Holz, T. (2020a). Leveraging Frequency Analysis for Deep Fake Image Recognition . https://doi.org/10.5555/3524938.3525242 Frank, J., Eisenhofer, T., Schönherr, L., Fischer, A., Kolossa, D., & Holz, T. (2020b). Leveraging Frequency Analysis for Deep Fake Image Recognition. Gao, J., Xia, ...
-
[4]
https://doi.org/10.1016/J.ESWA.2017.06.038 Ojha, U., Li, Y., & Lee, Y. J. (2023). Towards Universal Fake Image Detectors that Generalize Across Generative Models. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition , 2023-June, 24480–24489. https://doi.org/10.1109/CVPR52729.2023.02345 Qian, Y., Yin, G., Sheng, L....
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.