Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities

George Saon, Avihu Dekel, Alexander Brooks, Tohru Nagano, Abraham Daniels, Aharon Satt, Ashish Mittal, Brian Kingsbury, David Haws, Edmilson Morais, et al. · 2025 · arXiv 2505.08699

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

AppTek Call-Center Dialogues: A Multi-Accent Long-Form Benchmark for English ASR

cs.CL · 2026-04-30 · unverdicted · novelty 7.0

A new multi-accent long-form call-center dialogue dataset for English ASR evaluation shows substantial performance variation across accents and segmentation methods.

Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition

cs.CL · 2026-04-23 · unverdicted · novelty 7.0

LLM decoders in speech recognition show no racial bias amplification and fewer repetition hallucinations under degradation than Whisper, with audio encoder design mattering more than model scale for fairness and robustness.

Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs

cs.CL · 2026-04-15 · unverdicted · novelty 6.0

GLOW integrates a pre-trained GNN for candidate prediction with an LLM for joint symbolic-semantic reasoning over incomplete KGs, reporting up to 53.3% gains on standard benchmarks and a new GLOW-BENCH dataset.

Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction

eess.AS · 2026-04-14 · unverdicted · novelty 6.0

Common-word acoustic cues and bias-word position prediction in speech LLMs cut rare-word transcription errors by 16.3% versus baselines, including out-of-domain cases.

A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems

cs.CL · 2025-09-29 · unverdicted · novelty 5.0

A novel alignment algorithm using dynamic programming and beam search provides more accurate matching of individual errors between reference and model transcripts for improved speech recognition evaluation.

In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word Level Timestamp Predictions

eess.AS · 2026-04-14 · unverdicted · novelty 4.0

Lightweight training strategies allow speech-aware LLMs to output accurate word timestamps alongside ASR transcripts while also improving recognition quality across datasets.

LLMs and Speech: Integration vs. Combination

eess.AS · 2026-03-16 · unverdicted · novelty 4.0

Tight integration of acoustic models with LLMs for ASR is ablated against shallow fusion across label units, fine-tuning strategies, LLM sizes, and joint CTC decoding to mitigate hallucinations.

citing papers explorer

Showing 7 of 7 citing papers.

AppTek Call-Center Dialogues: A Multi-Accent Long-Form Benchmark for English ASR · cs.CL · 2026-04-30 · unverdicted · none · ref 19
Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition · cs.CL · 2026-04-23 · unverdicted · none · ref 42
Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs · cs.CL · 2026-04-15 · unverdicted · none · ref 2
Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction · eess.AS · 2026-04-14 · unverdicted · none · ref 35
A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems · cs.CL · 2025-09-29 · unverdicted · none · ref 9
In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word Level Timestamp Predictions · eess.AS · 2026-04-14 · unverdicted · none · ref 26
LLMs and Speech: Integration vs. Combination · eess.AS · 2026-03-16 · unverdicted · none · ref 20
