Librispeech: An ASR corpus based on public domain audio books

Vassil Panayotov, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur · 2015 · arXiv 2015.71789

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 · dataset 1

citation-polarity summary

background 1 · use dataset 1

representative citing papers

Mechanistic Interpretability of ASR models using Sparse Autoencoders

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

Sparse autoencoders applied to Whisper ASR reveal monosemantic features across linguistic boundaries and demonstrate cross-lingual feature steering.

Hearing the Unspoken: Language Model Priors for Acoustic Adversarial Attacks

cs.LG · 2026-06-05 · unverdicted · novelty 6.0

Semantic Gambit attack uses real-time LLM priors to overcome causal constraints in ASR, tripling corpus word error rate to 35.6%.

Asymmetric Phase Coding Audio Watermarking

cs.CR · 2026-05-08 · unverdicted · novelty 6.0

APC embeds compact Ed25519 signatures into audio phase data with error correction to achieve 97.5-98.3% cryptographic verification under eight attack types at mean PESQ 3.02.

GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking

cs.SD · 2026-04-10 · unverdicted · novelty 6.0

GRM ranks Mel bands by attack contribution versus utility sensitivity, perturbs a subset, and learns a universal perturbation to reach 88.46% average jailbreak success rate with improved attack-utility trade-off on four audio LLMs.

GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot

cs.CL · 2024-12-03 · conditional · novelty 6.0

GLM-4-Voice builds an end-to-end spoken chatbot by deriving a 175 bps single-codebook tokenizer from ASR, synthesizing interleaved speech-text data, and continuing pre-training of GLM-4-9B on up to 1 trillion tokens before fine-tuning on conversational speech.

citing papers explorer

Showing 5 of 5 citing papers.

Mechanistic Interpretability of ASR models using Sparse Autoencoders

cs.CL · 2026-05-12 · unverdicted · none · ref 10

Hearing the Unspoken: Language Model Priors for Acoustic Adversarial Attacks

cs.LG · 2026-06-05 · unverdicted · none · ref 15

Asymmetric Phase Coding Audio Watermarking

cs.CR · 2026-05-08 · unverdicted · none · ref 14

GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking

cs.SD · 2026-04-10 · unverdicted · none · ref 25

GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot

cs.CL · 2024-12-03 · conditional · none · ref 35
