SCA: Streaming Cross-attention Alignment for Echo Cancellation

Kaustubh Kalgaonkar; Sriram Srinivasan; Xin Lei; Yang Liu; Yangyang Shi; Yun Li

arxiv: 2211.00589 · v1 · pith:YFA2FKXWnew · submitted 2022-11-01 · eess.AS · cs.SD · eess.SP

Yang Liu, Yangyang Shi, Yun Li, Kaustubh Kalgaonkar, Sriram Srinivasan, Xin Lei

classification eess.AS cs.SD eess.SP

keywords echo cancellation end-to-end alignment deep method speech algorithms

abstract

End-to-end deep learning has shown promising results for speech enhancement tasks such as noise suppression, dereverberation, and speech separation. However, most state-of-the-art methods for echo cancellation are either classical DSP-based or hybrid DSP-ML algorithms. Components such as the delay estimator and the adaptive linear filter are based on traditional signal processing concepts, and deep learning algorithms typically serve only to replace the non-linear residual echo suppressor. This paper introduces an end-to-end echo cancellation network with streaming cross-attention alignment (SCA). The proposed method can handle unaligned inputs without requiring external alignment and generates high-quality speech without echoes. At the same time, the end-to-end algorithm simplifies the current echo cancellation pipeline for time-variant echo path cases. The method is tested on the ICASSP 2022 and Interspeech 2021 Microsoft deep echo cancellation challenge evaluation datasets, where it outperforms some of the other hybrid and end-to-end methods.
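The abstract does not spell out the alignment mechanism, so the following is only a minimal sketch of the general idea of causal cross-attention between an unaligned microphone signal and a far-end reference. It is not the paper's architecture: it has a single head, no learned projections, and a fixed look-back window. The function name `streaming_cross_attention`, the `window` parameter, and the synthetic delay test are all assumptions made for illustration.

```python
import numpy as np

def streaming_cross_attention(mic, far, window):
    """Illustrative only: each mic frame attends to the last `window`
    far-end frames (no look-ahead), so it can run frame by frame.

    mic, far: arrays of shape (T, D) holding per-frame features.
    Returns (aligned, weights), where weights[t, k] is the attention
    placed on the far-end frame at lag k (frame t - k).
    """
    T, D = mic.shape
    aligned = np.zeros_like(mic)
    weights = np.zeros((T, window))
    for t in range(T):
        idx = t - np.arange(window)        # far-end frames t, t-1, ..., t-window+1
        valid = idx >= 0                   # drop lags that reach before frame 0
        keys = far[idx[valid]]             # (n_valid, D)
        scores = keys @ mic[t] / np.sqrt(D)
        scores -= scores.max()             # numerically stable softmax
        w = np.exp(scores)
        w /= w.sum()
        weights[t, valid] = w
        aligned[t] = w @ keys              # soft-aligned reference frame
    return aligned, weights

# Hypothetical check: the "echo" is the far-end signal delayed by 3 frames.
rng = np.random.default_rng(0)
far = rng.standard_normal((50, 32))
delay = 3
mic = np.zeros_like(far)
mic[delay:] = far[:-delay]

aligned, weights = streaming_cross_attention(mic, far, window=8)
est_lag = int(weights[delay:].mean(axis=0).argmax())
print("estimated lag (frames):", est_lag)
```

In this toy setting the attention weights play the role that an external delay estimator would play in a hybrid pipeline: the lag with the highest average weight is the estimated delay, and `aligned` is a soft-aligned reference that a downstream network could consume.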

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LMPAN: A Lightweight Multi-Path Alignment Network for Joint Full-Duplex Acoustic Echo Cancellation and Noise Suppression
eess.AS · 2026-07 · unverdicted · novelty 5.0

LMPAN is a 480K-parameter network using multi-path alignment, attention integration, and dynamic post-filtering that matches larger models on joint AEC and NS while supporting real-time inference.