DyaPlex: Full-Duplex Speech-Motion Model for Dyadic Interaction

Koki Nagano; Hongyu Liu; Seonwook Park; Tianye Li; Amrita Mazumdar; Christian Jacobsen; Shengze Wang; Michael Stengel; Rajarshi Roy; Ka Chun Cheung; Simon See; Shalini De Mello

arXiv: 2606.03874 · v1 · pith:XDVQVNZ7 · submitted 2026-06-02 · cs.CV · cs.RO


Keywords: model, interaction, dyadic, full-duplex, speech, motion, streaming, DyaPlex


Abstract

We present DyaPlex, a streaming, full-duplex speech-and-motion model designed for dyadic interaction. To capture the continuous and reciprocal nature of human communication, the model is full-duplex: the agent simultaneously perceives and generates both speech and physical motion in a streaming fashion. At its core, our method builds on the strong priors of a pretrained full-duplex speech foundation model and adds a novel motion pathway, yielding fully synchronized multimodal interaction. Specifically, we design a dual-tower Transformer architecture that keeps the base speech model frozen, preserving its zero-shot conversational reasoning, while building a deeply coupled, streaming motion pathway alongside it. By introducing a unified dyadic token interleaving mechanism and guiding cross-attention with a time-aligned speech-motion RoPE (rotary position embedding), our model aligns autoregressively generated motion with rich latent speech features. Trained on the 4,000-hour Seamless Interaction dataset, the model captures cross-speaker dependencies and achieves new state-of-the-art performance on both monadic and dyadic human interaction benchmarks.
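The abstract does not specify how the dyadic token interleaving or the time-aligned speech-motion RoPE are implemented. The following is a minimal sketch of one plausible reading, under explicit assumptions: speech tokens and motion frames run at different rates (the 12.5 Hz and 30 Hz values are hypothetical), interleaving merges the four (speaker, modality) streams by timestamp, and rotary positions are computed from time in seconds rather than from token indices. Every name here (`interleave_dyadic`, `attention_logit`, `SPEECH_HZ`, `MOTION_HZ`, `scale`) is hypothetical and not taken from the paper.

```python
import math

# Hypothetical rates for illustration only; the abstract does not state them.
SPEECH_HZ = 12.5  # assumed speech-token rate
MOTION_HZ = 30.0  # assumed motion-frame rate


def timestamps(num_frames, hz):
    """Start time (seconds) of each frame in a stream sampled at `hz`."""
    return [i / hz for i in range(num_frames)]


def interleave_dyadic(streams):
    """Merge per-speaker, per-modality streams into one time-ordered sequence.

    `streams` maps (speaker, modality) -> (rate in Hz, frame count).
    Ties at equal timestamps follow the dict's insertion order, a
    hypothetical ordering chosen only for this sketch.
    """
    tokens = []
    for order, ((speaker, modality), (hz, n)) in enumerate(streams.items()):
        for idx, t in enumerate(timestamps(n, hz)):
            tokens.append((t, order, speaker, modality, idx))
    tokens.sort(key=lambda tok: (tok[0], tok[1]))
    return [(t, spk, mod, idx) for t, _, spk, mod, idx in tokens]


def rope_rotate(vec, pos, base=10000.0):
    """Standard RoPE on an even-length vector at a real-valued position.

    Each pair (x[2j], x[2j+1]) is rotated by angle pos * base**(-2j/d).
    """
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):  # i = 2j
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_logit(q, k, q_time, k_time, scale=10.0):
    """Cross-attention logit between a motion query and a speech key.

    Positions are timestamps in seconds times a hypothetical `scale`, so the
    logit depends only on q_time - k_time, not on each token's index in its
    own stream.
    """
    return dot(rope_rotate(q, q_time * scale), rope_rotate(k, k_time * scale))


if __name__ == "__main__":
    seq = interleave_dyadic({
        ("A", "speech"): (SPEECH_HZ, 3),
        ("B", "speech"): (SPEECH_HZ, 3),
        ("A", "motion"): (MOTION_HZ, 6),
        ("B", "motion"): (MOTION_HZ, 6),
    })
    for tok in seq[:6]:
        print(tok)

    q, k = [0.3, -1.2, 0.8, 0.5], [1.0, 0.4, -0.7, 0.9]
    # Motion frame 30 (t = 1.0 s) vs speech token 12 (t = 0.96 s): shifting
    # both times by the same amount leaves the logit unchanged.
    a = attention_logit(q, k, 30 / MOTION_HZ, 12 / SPEECH_HZ)
    b = attention_logit(q, k, 30 / MOTION_HZ + 2.0, 12 / SPEECH_HZ + 2.0)
    print(abs(a - b) < 1e-9)
```

The point this illustrates is that RoPE makes the query-key score a function of the position difference only. If positions are timestamps rather than indices, a motion frame and a speech token that occur at the same moment have zero relative offset even though they sit at different indices in streams with different rates.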
