Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving

2025 · arXiv 2505.18644

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages

cs.SD · 2025-09-18 · unverdicted · novelty 5.0

Introduces XLSR-Thai encoder, U-Align alignment, and Thai-SUP data pipeline to enable multitask speech understanding SLLMs for Thai.

Rethinking Speech-LLM Integration for ASR: Effective Joint Speech-Text Training by Interleaving

cs.CL · 2026-07-02 · unverdicted · novelty 4.0

JSTIP interleaves speech and text sequences during pretraining on 38k hours of ASR data to improve entity accuracy over ASR-only and simple joint-training baselines while matching performance from domain text.

citing papers explorer

Rethinking Speech-LLM Integration for ASR: Effective Joint Speech-Text Training by Interleaving

cs.CL · 2026-07-02 · unverdicted · none · ref 32

JSTIP interleaves speech and text sequences during pretraining on 38k hours of ASR data to improve entity accuracy over ASR-only and simple joint-training baselines while matching performance from domain text.
