VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

2 Pith papers cite this work. Polarity classification is still indexing.

fields: eess.AS (2)

years: 2024 (1), 2023 (1)

representative citing papers

Moshi: a speech-text foundation model for real-time dialogue

eess.AS · 2024-09-17 · accept · novelty 7.0

Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.
