Audio Deepfake Detection at the First Greeting: "Hi!"

2026 · eess.AS · arXiv 2601.19573

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

This paper focuses on audio deepfake detection under real-world communication degradations, with an emphasis on ultra-short inputs (0.5-2.0 s), targeting the capability to detect synthetic speech at a conversation opening, e.g., when a scammer says "Hi." We propose Short-MGAA (S-MGAA), a novel lightweight extension of Multi-Granularity Adaptive Time-Frequency Attention, designed to enhance discriminative representation learning for short, degraded inputs subjected to communication processing and perturbations. S-MGAA integrates two tailored modules: a Pixel-Channel Enhanced Module (PCEM) that amplifies fine-grained time-frequency saliency, and a Frequency Compensation Enhanced Module (FCEM) that supplements limited temporal evidence via multi-scale frequency modeling and adaptive frequency-temporal interaction. Extensive experiments demonstrate that S-MGAA consistently surpasses nine state-of-the-art baselines while achieving strong robustness to degradations and favorable efficiency-accuracy trade-offs, including low real-time factor (RTF), competitive GFLOPs, a compact parameter count, and reduced training cost, highlighting its strong potential for real-time deployment in communication systems and edge devices.
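The abstract reports efficiency partly in terms of RTF on 0.5-2.0 s inputs. As a rough illustration only, the sketch below crops a waveform to its opening segment and computes a real-time factor using the common definition (processing time divided by audio duration). The 16 kHz sampling rate, the helper names, and the `dummy_detector` stand-in are all assumptions made for this example; none of them come from the paper, and this is not the authors' evaluation code.

```python
import time

SAMPLE_RATE = 16_000  # assumed sampling rate; the abstract does not state one


def crop_opening(samples, duration_s, sample_rate=SAMPLE_RATE):
    """Keep only the first `duration_s` seconds of a waveform (a list of floats)."""
    n = int(round(duration_s * sample_rate))
    return samples[:n]


def real_time_factor(processing_s, audio_s):
    """RTF = processing time / audio duration; values below 1.0 are faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def dummy_detector(samples):
    """Hypothetical stand-in for a detector: returns the mean absolute amplitude."""
    return sum(abs(x) for x in samples) / max(len(samples), 1)


def measure_rtf(detector, samples, sample_rate=SAMPLE_RATE):
    """Time one call of `detector` on `samples` and return the resulting RTF."""
    start = time.perf_counter()
    detector(samples)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


if __name__ == "__main__":
    waveform = [0.0] * (3 * SAMPLE_RATE)  # 3 s of silence as placeholder audio
    for duration in (0.5, 1.0, 2.0):
        clip = crop_opening(waveform, duration)
        print(duration, len(clip), measure_rtf(dummy_detector, clip))
```

A single timed call like this is noisy; in practice one would average over many runs and fix the hardware and batch size before comparing RTF across models.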

representative citing papers

Audio Deepfake Detection at the First Greeting: "Hi!"

eess.AS · 2026-01-27 · unverdicted · novelty 5.0

S-MGAA adds pixel-channel enhancement and frequency compensation modules to improve audio deepfake detection on very short, degraded speech inputs.
