IRAF: Interference-Resilient Adaptive Fusion for Noise-Robust End-to-End Full-Duplex Spoken Dialogue Systems

Jiajun Deng; Nikita Kuzmin; Simon Lui; Tao Zhong; Tianxiang Cao; Tristan Tsoi; Xunying Liu; Yinke Zhu; Zhili Tan

arxiv: 2606.06559 · v1 · pith:4OM7HCWOnew · submitted 2026-06-04 · 💻 cs.SD · cs.AI · eess.AS

Tao Zhong, Jiajun Deng, Nikita Kuzmin, Yinke Zhu, Tianxiang Cao, Tristan Tsoi, Zhili Tan, Simon Lui, Xunying Liu

keywords: user · full-duplex · fusion · iraf · adaptive · agent · audio · dialogue

Full-duplex spoken dialogue models allow voice agents to listen and speak concurrently, enabling natural interaction with real-time overlap. However, end-to-end dual-channel models that jointly encode user and agent streams may degrade in realistic acoustic environments: interfering speakers leaking into the user microphone can be encoded as part of the user query, corrupting the LLM's conditioning and causing unstable turn-taking and reduced response quality. We propose Interference-Resilient Adaptive Fusion (IRAF), a lightweight, streaming-compatible module that modulates the contribution of user audio to the LLM frame by frame. IRAF predicts a scalar reliability gate from target-speaker and user audio embeddings and rescales user representations before fusion with agent embeddings. Experiments on MS-MARCO and InstructS2S-200K show consistent gains in response quality and full-duplex interaction under interfering-speaker conditions.
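
The gating step described in the abstract can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not the authors' implementation: the abstract does not specify the gate network or the fusion operator, so the single linear layer with a sigmoid, the additive fusion, the embedding size, and the names `ReliabilityGate` and `step` are all hypothetical. Each frame is processed on its own, which is what keeps the module compatible with streaming.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ReliabilityGate:
    """Hypothetical per-frame scalar gate (not the paper's architecture)."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: one linear layer over [target-speaker emb ; user frame].
        self.w = rng.normal(scale=0.1, size=2 * dim)
        self.b = 0.0

    def step(self, spk_emb, user_frame, agent_frame):
        # Scalar reliability in (0, 1) for the current user frame.
        g = sigmoid(self.w @ np.concatenate([spk_emb, user_frame]) + self.b)
        # Rescale the user representation, then fuse with the agent frame.
        # Assumption: additive fusion; the abstract does not name the operator.
        fused = g * user_frame + agent_frame
        return g, fused


dim, n_frames = 8, 5
rng = np.random.default_rng(1)
gate = ReliabilityGate(dim)
spk = rng.normal(size=dim)                # target-speaker embedding
user = rng.normal(size=(n_frames, dim))   # user-channel frame embeddings
agent = rng.normal(size=(n_frames, dim))  # agent-channel frame embeddings

gates, fused = [], []
for u, a in zip(user, agent):             # frame by frame, as in streaming
    g, f = gate.step(spk, u, a)
    gates.append(g)
    fused.append(f)
gates = np.array(gates)
fused = np.stack(fused)
```

In this sketch, a gate value near 0 leaves the fused frame dominated by the agent embedding, while a value near 1 passes the user frame through almost unchanged. In a trained system the gate parameters would be learned; here they are random and serve only to show the data flow.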
