wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: eess.AS (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
VoxATtack: A Multimodal Attack on Voice Anonymization Systems
A dual-branch multimodal model combining ECAPA-TDNN on anonymized audio and BERT on transcripts outperforms prior attackers on five of seven VPAC benchmarks and reaches SOTA with augmentation.