Speaker-Reasoner is an end-to-end speech LLM that iteratively analyzes audio structure, predicts temporal boundaries, and jointly models speaker identity, gender, timestamps, and transcription using a speaker-aware cache for long audio.
Speakerlm: End-to-end versa- tile speaker diarization and recognition with multimodal large lan- guage models
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
DM-ASR reformulates multi-speaker ASR as multi-turn dialogue generation conditioned on diarization results, achieving competitive benchmark performance with relatively small models and limited data.
Introduces TPI-Train dataset and TPI-Bench to mitigate semantic shortcut learning in SLMs by enforcing acoustic cue prioritization for third-party interruption handling.
PASK introduces the DD-MM-PAS paradigm for streaming proactive agents with intent-aware detection, hybrid memory modeling, and a new real-world benchmark where the IntentFlow model matches top LLMs on latency while finding deeper intents.
citing papers explorer
-
Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR
Speaker-Reasoner is an end-to-end speech LLM that iteratively analyzes audio structure, predicts temporal boundaries, and jointly models speaker identity, gender, timestamps, and transcription using a speaker-aware cache for long audio.
-
DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models
DM-ASR reformulates multi-speaker ASR as multi-turn dialogue generation conditioned on diarization results, achieving competitive benchmark performance with relatively small models and limited data.
-
Still Between Us? Evaluating and Improving Voice Assistant Robustness to Third-Party Interruptions
Introduces TPI-Train dataset and TPI-Bench to mitigate semantic shortcut learning in SLMs by enforcing acoustic cue prioritization for third-party interruption handling.
-
PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory
PASK introduces the DD-MM-PAS paradigm for streaming proactive agents with intent-aware detection, hybrid memory modeling, and a new real-world benchmark where the IntentFlow model matches top LLMs on latency while finding deeper intents.