Speech and gestures are jointly realized, making speech and gestures coordinated expressions of the same communicative process [1, 2, 3].

INTRODUCTION Human communication is inherently multimodal · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Gelina: Unified Speech and Gesture Synthesis via Interleaved Token Prediction

cs.SD · 2025-10-13 · unverdicted · novelty 6.0

A unified discrete autoregressive model for joint text-to-speech and co-speech gesture synthesis via interleaved token sequences and modality-specific decoders.

