XLS-R: Self-supervised cross-lingual speech representation learning at scale

2021 · arXiv 2111.09296

7 Pith papers cite this work.


representative citing papers

MixFake: Benchmarking and Enhancing Audio Deepfake Detection in Diverse Real-world Mixed Audio

cs.SD · 2026-05-22 · unverdicted · novelty 7.0

MixFake is a new benchmark for mixed-authenticity audio. Its multi-stream prompt tuning method achieves 0.95% EER on foreground deepfake detection and a 7.72% absolute gain on deepfake detection with complex backgrounds.

Profiling the Voice: Speaker-Specific Phoneme Fingerprinting for Speech Deepfake Detection

cs.SD · 2026-05-18 · unverdicted · novelty 7.0

PVP models speaker-specific phoneme acoustic distributions with lightweight GMMs trained only on real speech to detect deepfakes of persons of interest (POI). It outperforms generic detectors and introduces a new Chinese POI dataset.

Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation

cs.CL · 2026-04-06 · conditional · novelty 7.0

Multilingual ASR models show 39.7–297% zero-shot WER on public Pashto data, Whisper models output the correct script in under 0.8% of cases, and fine-tuned models degrade to 32.5–59% WER on out-of-domain sets.

A SUPERB-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection

eess.AS · 2026-03-02 · unverdicted · novelty 7.0

Spoof-SUPERB benchmark shows large-scale discriminative SSL models such as XLS-R, UniSpeech-SAT, and WavLM Large outperform others in audio deepfake detection and maintain robustness under acoustic degradations.

Forensic Similarity for Speech Deepfakes

cs.SD · 2025-10-03 · unverdicted · novelty 6.0

Introduces forensic similarity for speech deepfakes, using a Siamese feature extractor and a similarity network to verify whether two audio segments share forensic traces and a source model.

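Similarity Choice and Negative Scaling in Supervised Contrastive Learning for Deepfake Audio Detection

eess.AS · 2026-04-28 · unverdicted · novelty 4.0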

Using cosine similarity in SupCon (supervised contrastive learning) with a delayed negative queue on wav2vec2 XLS-R yields the lowest equal error rates for deepfake audio detection in both in-the-wild and pooled evaluations.

Giving Voice to the Constitution: Low-Resource Text-to-Speech for Quechua and Spanish Using a Bilingual Legal Corpus

cs.CL · 2026-04-14 · unverdicted · novelty 4.0

Develops a bilingual Quechua and Spanish TTS system for the Peruvian Constitution using XTTS v2, F5-TTS, and DiFlow-TTS, and releases checkpoints and audio to support low-resource speech synthesis.

citing papers explorer


MixFake: Benchmarking and Enhancing Audio Deepfake Detection in Diverse Real-world Mixed Audio · cs.SD · 2026-05-22 · unverdicted · none · ref 31
Profiling the Voice: Speaker-Specific Phoneme Fingerprinting for Speech Deepfake Detection · cs.SD · 2026-05-18 · unverdicted · none · ref 2
Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation · cs.CL · 2026-04-06 · conditional · none · ref 25
A SUPERB-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection · eess.AS · 2026-03-02 · unverdicted · none · ref 4
Forensic Similarity for Speech Deepfakes · cs.SD · 2025-10-03 · unverdicted · none · ref 11
Similarity Choice and Negative Scaling in Supervised Contrastive Learning for Deepfake Audio Detection · eess.AS · 2026-04-28 · unverdicted · none · ref 28
Giving Voice to the Constitution: Low-Resource Text-to-Speech for Quechua and Spanish Using a Bilingual Legal Corpus · cs.CL · 2026-04-14 · unverdicted · none · ref 7