wav2vec 2.0: A framework for self-supervised learning of speech representations

2020

8 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary: method (1)

citation-polarity summary: use method (1)

representative citing papers

Can Large Language Models Reliably Correct Errors in Low-Resource ASR? A Contamination-Aware Case Study on West Frisian

cs.CL · 2026-05-19 · conditional · novelty 7.0

LLM generative error correction improves low-resource Frisian ASR performance, with comparable gains on a contamination-controlled offline dataset confirming true correction ability.

Joint Fullband-Subband Modeling for High-Resolution SingFake Detection

cs.SD · 2026-04-06 · unverdicted · novelty 7.0

A joint fullband-subband model using high-resolution 44.1 kHz audio outperforms standard 16 kHz detectors for singing voice deepfake detection by exploiting spectrum-specific synthesis artifacts.

APEX: Audio Prototype EXplanations for Classification Tasks

cs.SD · 2026-05-11 · unverdicted · novelty 6.0

APEX generates four types of prototype-based explanations for pre-trained audio classifiers that preserve output invariance and target acoustic properties better than gradient methods applied to spectrograms.

Multimodal LLMs are not all you need for Pediatric Speech Language Pathology

cs.CL · 2026-04-29 · unverdicted · novelty 5.0

Fine-tuned speech representation models with hierarchical classification outperform multimodal LLMs on pediatric speech sound disorder tasks.

Who is Speaking or Who is Depressed? A Controlled Study of Speaker Leakage in Speech-Based Depression Detection

eess.AS · 2026-04-15 · unverdicted · novelty 5.0

Speech-based depression detection models primarily learn speaker identity rather than depression biomarkers, with performance dropping sharply on unseen speakers even under adversarial training.

Closing the Speech-Text Gap with Limited Audio for Effective Domain Adaptation in LLM-Based ASR

cs.CL · 2026-04-07 · unverdicted · novelty 5.0

Mixed batching with only 10% target-domain speech achieves word error rates that match or improve on conventional full-dataset ASR fine-tuning in LLM-based models.

IQRA 2026: Interspeech Challenge on Automatic Pronunciation Assessment for Modern Standard Arabic (MSA)

cs.SD · 2026-03-31 · unverdicted · novelty 5.0

The IQRA 2026 challenge on Arabic mispronunciation detection reports a 0.28 F1-score gain from new authentic human error data and diverse modeling approaches including self-supervised and audio-language models.

Contrastive Regularization for Accent-Robust ASR

cs.SD · 2026-05-05 · unverdicted · novelty 4.0

Supervised contrastive learning as an auxiliary loss during CTC fine-tuning improves accent robustness in ASR, yielding up to 29% relative WER reduction on unseen accents.

