
arxiv: 2604.12028 · v1 · submitted 2026-04-13 · 💻 cs.CV · cs.AI

Curvelet-Based Frequency-Aware Feature Enhancement for Deepfake Detection

Pith reviewed 2026-05-10 15:52 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords deepfake detection, curvelet transform, frequency domain, feature enhancement, image compression, convolutional neural networks, FaceForensics++, forgery artifacts

The pith

Curvelet transforms with attention mechanisms allow deepfake detectors to focus on compression-resistant frequency artifacts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that the Curvelet Transform can capture directional and multiscale frequency details in face images that spatial-domain methods miss, especially after compression. It introduces wedge-level attention to highlight important frequency wedges and scale-aware spatial masking to emphasize relevant image regions. These refined cues are reconstructed into an enhanced image and passed to a modified Xception network for real-versus-fake classification. A sympathetic reader would care because current detectors lose accuracy on the compressed videos common on social media, and frequency-based cues might restore reliability. If the approach works, detectors could identify synthetic faces more consistently without needing entirely new network architectures.

Core claim

The Curvelet Transform is applied to input faces, after which wedge-level attention and scale-aware spatial masking are trained to selectively boost frequency components tied to forgery traces; the inverse transform then produces a spatially enhanced image that a pretrained Xception classifier uses to distinguish authentic from manipulated faces, yielding higher accuracy than spatial-only baselines on both low- and high-compression versions of the FaceForensics++ dataset.
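
The pipeline in this claim (transform, per-wedge reweighting, inverse transform) can be sketched in a few lines. The sketch below is not a curvelet transform: as a clearly labeled stand-in, it partitions the 2D FFT plane into hard radial bands and angular wedges. The names `wedge_masks` and `enhance`, the band and wedge counts, and the fixed `gates` vector are all hypothetical. In the paper the gains are learned and the classifier is Xception; neither is modeled here.

```python
import numpy as np

def wedge_masks(size, n_scales=2, n_angles=4):
    """Partition the centered frequency plane of a square image into
    n_scales radial bands x n_angles angular wedges (FFT stand-in, NOT curvelets)."""
    freqs = np.fft.fftshift(np.fft.fftfreq(size))
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    radius = np.hypot(fx, fy)
    # Fold angles into [0, pi) so opposite directions share a wedge.
    angle = np.mod(np.arctan2(fy, fx), np.pi)
    r_idx = np.minimum((radius / radius.max() * n_scales).astype(int), n_scales - 1)
    a_idx = np.minimum((angle / np.pi * n_angles).astype(int), n_angles - 1)
    # Scale-major ordering: all wedges of band 0, then all wedges of band 1, ...
    return np.stack([((r_idx == s) & (a_idx == a)).astype(float)
                     for s in range(n_scales) for a in range(n_angles)])

def enhance(image, gates, n_scales=2, n_angles=4):
    """Scale each wedge of the spectrum by its gate, then invert back to pixels."""
    masks = wedge_masks(image.shape[0], n_scales, n_angles)
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Weighted sum of the masks gives one multiplicative gain per frequency bin.
    gain = np.tensordot(gates, masks, axes=1)
    # Hard wedges need not be exactly conjugate-symmetric, so keep the real part.
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * gain)))

rng = np.random.default_rng(0)
face = rng.normal(size=(16, 16))   # placeholder for one image channel
gates = np.ones(8)
gates[4:] = 0.5                    # damp the outer (high-frequency) band
enhanced = enhance(face, gates)
```

With every gate set to 1 the masks sum to one at each frequency bin, so the round trip returns the input. A check of this kind is one way to see whether a reconstruction step introduces artifacts of its own.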

What carries the argument

Curvelet Transform equipped with learned wedge-level attention and scale-aware spatial masking, which together select and amplify forgery-discriminative frequency components before spatial reconstruction for CNN input.
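
The wedge-level attention is spelled out in Equation (4) of the Figure 2 caption: depthwise convolution, global average pooling, an MLP, then a squashing function that yields one weight per wedge. A minimal NumPy sketch of that chain follows, under several assumptions that are not confirmed by the text available here: a single 3x3 depthwise layer stands in for the "stack", the MLP has one ReLU hidden layer, the squashing function is a sigmoid (consistent with the stated range (0,1)), and the gate is applied by channel-wise multiplication as in squeeze-and-excitation blocks. All function names and shapes are hypothetical.

```python
import numpy as np

def depthwise_conv3x3(x, kernels):
    """One 3x3 kernel per channel, zero padding. x: (C, H, W), kernels: (C, 3, 3)."""
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros(x.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernels[:, dy, dx, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out

def wedge_se_gate(x, kernels, w1, w2):
    """g = sigma(MLP(f_pool(f_depthwise(x)))), returned with shape (C, 1, 1)."""
    feats = depthwise_conv3x3(x, kernels)    # f_depthwise (single layer, assumed)
    pooled = feats.mean(axis=(1, 2))         # f_pool: global average pooling -> (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)    # MLP hidden layer with ReLU (assumed)
    logits = w2 @ hidden                     # MLP output, one logit per wedge
    g = 1.0 / (1.0 + np.exp(-logits))        # sigma taken to be a sigmoid (assumed)
    return g.reshape(-1, 1, 1)

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16                          # 8 wedges treated as channels
x = rng.normal(size=(C, H, W))
kernels = rng.normal(size=(C, 3, 3))
w1 = rng.normal(size=(C // 2, C))            # bottleneck ratio of 2 is arbitrary
w2 = rng.normal(size=(C, C // 2))
g = wedge_se_gate(x, kernels, w1, w2)
reweighted = x * g                           # broadcast one weight per wedge
```

Because `g` has shape (C, 1, 1), it can be read off directly as a ranking of wedges, which is the kind of inspection the interpretability claim relies on.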

If this is right

  • Detectors gain robustness to common video compression without retraining the entire classifier from scratch.
  • Frequency cues become more interpretable, allowing inspection of which directional scales carry forgery signals.
  • The same preprocessing pipeline can be attached to other pretrained CNN backbones for similar gains.
  • High-compression performance remains competitive, suggesting the method preserves essential artifact information even when pixel data is degraded.
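
Attaching the pipeline to a pretrained backbone raises a practical detail that the summary does not cover: the Figure 4 caption mentions a 12-channel representation, while stems pretrained on RGB images expect 3 input channels. One common workaround, offered here as an assumption and not as the authors' documented modification, is to tile the pretrained RGB kernels across the extra channels and rescale them. The function names and the kernel shape below are hypothetical.

```python
import numpy as np

def inflate_stem_weights(w_rgb, in_channels=12):
    """Tile (out, 3, k, k) kernels to (out, in_channels, k, k) and rescale."""
    out_c, rgb, kh, kw = w_rgb.shape
    if in_channels % rgb != 0:
        raise ValueError("in_channels must be a multiple of the pretrained channel count")
    reps = in_channels // rgb
    # Dividing by reps keeps the summed response on the original layer's scale.
    return np.tile(w_rgb, (1, reps, 1, 1)) / reps

def response(weights, patch):
    """Response of every filter at one spatial location. patch: (C, k, k)."""
    return np.einsum("ockl,ckl->o", weights, patch)

rng = np.random.default_rng(0)
w_rgb = rng.normal(size=(32, 3, 3, 3))     # stand-in for pretrained stem kernels
w_12 = inflate_stem_weights(w_rgb, 12)
patch_rgb = rng.normal(size=(3, 3, 3))
patch_12 = np.tile(patch_rgb, (4, 1, 1))   # the same RGB patch repeated four times
```

When the 12 channels are four copies of the same RGB patch, the inflated layer reproduces the original layer's response, so fine-tuning starts from the pretrained behavior instead of from random weights.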

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same wedge-and-scale selection logic could be tested on wavelet or shearlet transforms to compare which frequency basis best isolates deepfake traces.
  • Extending the masking to video sequences might improve detection of temporal inconsistencies introduced by generators.
  • Forensic tools could visualize the attended frequency wedges to explain why a particular face was flagged as fake.

Load-bearing premise

The Curvelet Transform's directional properties combined with the attention and masking will reliably isolate forgery-related frequency artifacts without overfitting to the training set or creating new reconstruction errors.

What would settle it

Performance falling below a standard spatial Xception baseline on a held-out deepfake dataset generated by unseen methods or subjected to novel compression ratios would show the frequency selection does not generalize.

Figures

Figures reproduced from arXiv: 2604.12028 by Ramadhan J. Mstafa, Salar Adel Sabri.

Figure 2: The main steps of the WedgeSE module. Mathematically, given wedge coefficients $x \in \mathbb{R}^{C \times H \times W}$, where each wedge is treated as a separate channel, the gating vector $g \in \mathbb{R}^{C \times 1 \times 1}$ is computed as:
$$g = \sigma\big(\mathrm{MLP}(f_{\mathrm{pool}}(f_{\mathrm{depthwise}}(x)))\big) \tag{4}$$
where $g \in (0,1)^{C \times 1 \times 1}$ denotes the per-channel importance weights, $f_{\mathrm{depthwise}}$ denotes the stack of depthwise convolutions, $f_{\mathrm{pool}}$ is a global average pooling operation to $1 \times 1$, $\sigma$ is…
Figure 4: Result of Curvelet-FAFE for a single RGB color channel. The newly formed 12-channel feature representations are concatenated and passed to a modified, pretrained Xception network, which is fine-tuned end-to-end for binary deepfake classification. Experiments: In this section, we introduce the overall experimental setups. Then, we present a comprehensive evaluation of the proposed approach covering various …
Figure 5: Saliency maps for different deepfake methods using Grad-CAM (Selvaraju et al., 2016) visualization.
Figure 6: Training and validation curves for loss and accuracy. To further analyse the learning behaviour of the model, the training and validation curves for loss and accuracy on the FF++ (HQ) dataset are presented. The curves reflect a consistent and well-structured learning trajectory. Training accuracy improves while validation accuracy follows closely, indicating strong general…
Original abstract

The proliferation of sophisticated generative models has significantly advanced the realism of synthetic facial content, known as deepfakes, raising serious concerns about digital trust. Although modern deep learning-based detectors perform well, many rely on spatial-domain features that degrade under compression. This limitation has prompted a shift toward integrating frequency-domain representations with deep learning to improve robustness. Prior research has explored frequency transforms such as Discrete Cosine Transform (DCT), Fast Fourier Transform (FFT), and Wavelet Transform, among others. However, to the best of our knowledge, the Curvelet Transform, despite its superior directional and multiscale properties, remains entirely unexplored in the context of deepfake detection. In this work, we introduce a novel Curvelet-based detection approach that enhances feature quality through wedge-level attention and scale-aware spatial masking, both trained to selectively emphasize discriminative frequency components. The refined frequency cues are reconstructed and passed to a modified pretrained Xception network for classification. Evaluated on two compression qualities in the challenging FaceForensics++ dataset, our method achieves 98.48% accuracy and 99.96% AUC on FF++ low compression, while maintaining strong performance under high compression, demonstrating the efficacy and interpretability of Curvelet-informed forgery detection.
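
The abstract reports two different numbers, accuracy and AUC, and they measure different things: accuracy depends on a decision threshold, while AUC only depends on how fake and real samples are ranked. A small pure-Python sketch makes the distinction concrete. The labels and scores are invented toy values, not data from the paper, and the 0.5 threshold is an assumption.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label (1 = fake)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def roc_auc(labels, scores):
    """Probability that a random fake outscores a random real; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]                  # toy ground truth
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]      # toy detector outputs
acc = accuracy(labels, scores)               # 4 of 6 correct at threshold 0.5
auc = roc_auc(labels, scores)                # 8 of 9 fake/real pairs ranked correctly
```

In this toy case AUC is higher than accuracy because only one fake/real pair is misordered, yet two samples fall on the wrong side of the threshold. The same gap appears in the reported 98.48% accuracy against 99.96% AUC.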

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces a Curvelet-based pipeline for deepfake detection that decomposes input images via the Curvelet Transform, applies learned wedge-level attention and scale-aware spatial masking to emphasize discriminative frequency components, reconstructs the enhanced representation, and feeds it into a modified pretrained Xception network for binary classification. On the FaceForensics++ benchmark it reports 98.48% accuracy and 99.96% AUC under low compression, with maintained performance under high compression, claiming improved robustness and interpretability relative to prior frequency-domain approaches (DCT, FFT, Wavelet).

Significance. If the reported metrics are reproducible and the ablations isolate the contribution of the Curvelet-specific modules, the work would be a useful incremental advance: it is the first application of Curvelets (with their directional and multiscale properties) to deepfake detection and supplies a concrete, trainable mechanism for frequency-component selection. The emphasis on compression robustness addresses a known practical limitation of spatial-domain detectors.

minor comments (2)
  1. [Abstract] The performance numbers are presented without any baseline comparison or statistical significance test; adding at least the strongest competing frequency-domain method (e.g., the best reported Wavelet or DCT result on the same split) would strengthen the claim of efficacy.
  2. [Methods] The description of the reconstruction step after masking is too terse to assess whether inverse Curvelet artifacts are controlled; a short paragraph or diagram in §3 would clarify this.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript and the recommendation for minor revision. The recognition of our work as the first application of Curvelets to deepfake detection, along with the trainable frequency-component selection mechanism and focus on compression robustness, is appreciated. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity


The paper describes an empirical pipeline for deepfake detection that applies the Curvelet Transform followed by wedge-level attention and scale-aware masking before feeding into a modified Xception network. No equations, derivations, or fitted parameters are present in the provided text. All central claims consist of reported accuracy and AUC metrics on the FaceForensics++ benchmark under two compression levels. Because the work contains no mathematical reduction steps, self-definitional constructs, or load-bearing self-citations that collapse the result to its inputs, the derivation chain is self-contained and the circularity score is zero.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, free parameters, or new entities are described in the abstract. The work relies on the standard Curvelet Transform from prior literature and conventional deep learning training practices.

pith-pipeline@v0.9.0 · 5518 in / 1216 out tokens · 53212 ms · 2026-05-10T15:52:21.619344+00:00 · methodology

