IRAF: Interference-Resilient Adaptive Fusion for Noise-Robust End-to-End Full-Duplex Spoken Dialogue Systems

2026 · cs.SD · arXiv 2606.06559

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Full-duplex spoken dialogue models allow voice agents to listen and speak concurrently, enabling natural interaction with real-time overlap. However, end-to-end dual-channel models that jointly encode user and agent streams may degrade in realistic acoustic environments: interfering speakers leaking into the user microphone can be encoded as part of the user query, corrupting the LLM's conditioning and causing unstable turn-taking and reduced response quality. We propose Interference-Resilient Adaptive Fusion (IRAF), a lightweight, streaming-compatible module that modulates the contribution of user audio to the LLM frame by frame. IRAF predicts a scalar reliability gate from target-speaker and user audio embeddings and rescales user representations before fusion with agent embeddings. Experiments on MS-MARCO and InstructS2S-200K show consistent gains in response quality and full-duplex interaction under interfering-speaker conditions.
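The gating step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the gate network or the fusion operator, so the single linear layer with a sigmoid, the element-wise additive fusion, and every name here (reliability_gate, gated_fusion, weights, bias) are assumptions made for the example.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def reliability_gate(speaker: Sequence[float], user_frame: Sequence[float],
                     weights: Sequence[float], bias: float) -> float:
    """Scalar gate computed from the concatenated [speaker; user_frame] features.

    Assumption: one linear layer followed by a sigmoid. The abstract only
    says the gate is predicted from these two embeddings.
    """
    features = list(speaker) + list(user_frame)
    if len(features) != len(weights):
        raise ValueError("weights must match len(speaker) + len(user_frame)")
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)


def gated_fusion(speaker: Sequence[float],
                 user_frames: Sequence[Sequence[float]],
                 agent_frames: Sequence[Sequence[float]],
                 weights: Sequence[float], bias: float) -> List[Vector]:
    """Rescale each user frame by its gate, then fuse it with the agent frame.

    Assumption: fusion is element-wise addition. Each frame depends only on
    the current inputs, so the loop can run in a streaming fashion.
    """
    fused = []
    for user, agent in zip(user_frames, agent_frames):
        gate = reliability_gate(speaker, user, weights, bias)
        fused.append([gate * u + a for u, a in zip(user, agent)])
    return fused
```

With a gate near 1 the user frame passes through unchanged, while a gate near 0 (for instance, when the user channel does not match the target speaker) leaves the fused representation dominated by the agent stream.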

representative citing papers

IRAF: Interference-Resilient Adaptive Fusion for Noise-Robust End-to-End Full-Duplex Spoken Dialogue Systems

cs.SD · 2026-06-04 · unverdicted · novelty 4.0

IRAF introduces an adaptive fusion module that uses a predicted scalar reliability gate to reduce the impact of interfering speakers on user audio representations in end-to-end full-duplex spoken dialogue systems, with reported gains on MS-MARCO and InstructS2S-200K.

