pith. sign in

Speechgpt: Empowering large language models with intrinsic cross-modal conversational abilities

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

years

2026 4 2024 2

roles

background 1

polarities

background 1

clear filters

representative citing papers

VoiceBench: Benchmarking LLM-Based Voice Assistants

cs.CL · 2024-10-22 · unverdicted · novelty 7.0

VoiceBench is the first benchmark for multi-faceted evaluation of LLM voice assistants using real and synthetic spoken instructions with speaker, environmental, and content variations.

Moshi: a speech-text foundation model for real-time dialogue

eess.AS · 2024-09-17 · accept · novelty 7.0

Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.

Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models

cs.CL · 2026-06-09 · unverdicted · novelty 6.0

A multi-axis RL alignment technique improves pause handling, turn-taking, backchanneling, and interruption response in full-duplex spoken dialogue models by optimizing axis-specific rewards derived from human audio segments.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • VoiceBench: Benchmarking LLM-Based Voice Assistants cs.CL · 2024-10-22 · unverdicted · none · ref 107

    VoiceBench is the first benchmark for multi-faceted evaluation of LLM voice assistants using real and synthetic spoken instructions with speaker, environmental, and content variations.

  • Moshi: a speech-text foundation model for real-time dialogue eess.AS · 2024-09-17 · accept · none · ref 111

    Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.