Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities
7 Pith papers cite this work.
citing papers explorer
- AppTek Call-Center Dialogues: A Multi-Accent Long-Form Benchmark for English ASR
  A new multi-accent long-form call-center dialogue dataset for English ASR evaluation shows substantial performance variation across accents and segmentation methods.
- Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition
  LLM decoders in speech recognition show no racial bias amplification and fewer repetition hallucinations under degradation than Whisper, with audio encoder design mattering more than model scale for fairness and robustness.
- Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs
  GLOW integrates a pre-trained GNN for candidate prediction with an LLM for joint symbolic-semantic reasoning over incomplete KGs, reporting up to 53.3% gains on standard benchmarks and a new GLOW-BENCH dataset.
- Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction
  Common-word acoustic cues and bias-word position prediction in speech LLMs cut rare-word transcription errors by 16.3% versus baselines, including out-of-domain cases.
- A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems
  A novel alignment algorithm using dynamic programming and beam search provides more accurate matching of individual errors between reference and model transcripts for improved speech recognition evaluation.
- In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word Level Timestamp Predictions
  Lightweight training strategies allow speech-aware LLMs to output accurate word timestamps alongside ASR transcripts while also improving recognition quality across datasets.
- LLMs and Speech: Integration vs. Combination
  Tight integration of acoustic models with LLMs for ASR is ablated against shallow fusion across label units, fine-tuning strategies, LLM sizes, and joint CTC decoding to mitigate hallucinations.