Global Rotation Equivariant Phase Modeling for Speech Enhancement with Deep Magnitude-Phase Interaction

Andong Li; Chengzhong Wang; Dingding Yao; Junfeng Li

arxiv: 2602.08556 · v2 · pith:OGBE23A3new · submitted 2026-02-09 · 💻 cs.SD

Chengzhong Wang, Andong Li, Dingding Yao, Junfeng Li

Pith reviewed 2026-05-21 14:47 UTC · model grok-4.3

classification 💻 cs.SD

keywords speech enhancement · phase modeling · rotation equivariance · magnitude-phase interaction · deep learning · phase retrieval · audio signal processing

The pith

Enforcing global rotation equivariance lets a dual-stream network model phase's circular geometry for improved speech enhancement.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that conventional flat Euclidean networks fail to capture the circular topology of phase, and that enforcing global rotation equivariance in a dedicated phase stream solves this. The authors introduce a magnitude-phase dual-stream architecture whose key modules are built to preserve this equivariance during information exchange and feature fusion. If correct, the approach delivers concrete gains such as more than 20 percent lower phase distance in retrieval tasks and over 0.1 higher PESQ in cross-corpus denoising. Readers should care because accurate phase recovery remains a bottleneck in high-quality audio restoration, and most current deep models ignore the periodic structure of phase angles.

Core claim

The central claim is that a magnitude-phase dual-stream framework, using a Magnitude-Phase Interactive Convolutional Module for modulus-based exchange and a Hybrid-Attention Dual Feed-Forward Network for unified fusion, preserves Global Rotation Equivariance in the phase stream and thereby aligns features with the intrinsic circular geometry of phase, yielding superior results over multiple baselines on phase retrieval, denoising, dereverberation, and bandwidth extension.

What carries the argument

Global Rotation Equivariance (GRE), preserved by the Magnitude-Phase Interactive Convolutional Module (MPICM) and the Hybrid-Attention Dual Feed-Forward Network (HADF), which enable modulus-based interaction while maintaining the circular topology of the phase stream.

If this is right

Phase distance drops by over 20 percent in the phase retrieval task.
PESQ rises by more than 0.1 in zero-shot cross-corpus denoising evaluations.
Overall superiority holds across universal speech enhancement tasks that mix multiple distortions.
Learned phase features exhibit distinct periodic patterns that match the circular nature of phase.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same equivariant modules could be tested on other circular quantities such as inter-channel phase differences or direction-of-arrival angles.
Combining the dual-stream design with existing masking or generative vocoders might improve real-time enhancement pipelines.
Evaluating the periodic pattern consistency on larger, noisier, or multilingual corpora would test whether the circular alignment generalizes.

Load-bearing premise

Phase possesses an intrinsic circular geometry that standard flat networks cannot model effectively without explicit global rotation equivariance constraints.
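As a minimal illustration of this premise (not code from the paper), the sketch below compares a flat Euclidean difference with a wrapped circular distance for two phase angles that sit on either side of the ±π boundary. The helper names `wrap_to_pi` and `circular_distance` are hypothetical and introduced only for this example.

```python
import numpy as np

def wrap_to_pi(angle):
    """Map any angle in radians to the principal interval [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi

def circular_distance(a, b):
    """Shortest angular distance between two phases, in [0, pi]."""
    return np.abs(wrap_to_pi(a - b))

# Two phases that are almost identical on the circle but far apart on the line.
a, b = 3.1, -3.1
euclid = abs(a - b)             # 6.2
circ = circular_distance(a, b)  # 2*pi - 6.2, about 0.083

print(f"Euclidean: {euclid:.3f} rad, circular: {circ:.3f} rad")
```

The circular distance is also unchanged when both angles are shifted by the same offset, which is the symmetry that a rotation-equivariant phase stream is meant to respect.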

What would settle it

A standard convolutional network without the GRE-preserving modules achieves equal or lower phase distance and equal or higher PESQ on the same phase-retrieval and zero-shot denoising benchmarks.

Figures

Figures reproduced from arXiv: 2602.08556 by Andong Li, Chengzhong Wang, Dingding Yao, Junfeng Li.

**Figure 1.** Overview of the proposed network architecture. (a) The dual-stream encoder-decoder topology. The R-Conv and C-Conv denote real-valued and …

**Figure 2.** Detailed structure of the MPICM block, including the magnitude and …

**Figure 3.** Detailed architecture of the Hybrid-Attention Dual-FFN (HADF) module. (Top Left) The macroscopic residual block structure. (Bottom) The Hybrid …

**Figure 4.** Performance comparison across varying SNRs. Models were trained on the DNS-2020 corpus and evaluated on re-mixed versions of the …

**Figure 5.** Spectrogram visualization of enhanced speech under diverse distortion scenarios. The audio files are taken from WSJ0+WHAMR! test set.

**Figure 6.** Visualization of learned attention patterns for a voiced speech segment.

Original abstract

While deep learning has advanced speech enhancement (SE), effective phase modeling remains challenging, as conventional networks typically operate within a flat Euclidean feature space, which is not easy to model the underlying circular topology of the phase. To address this, we propose a magnitude-phase dual-stream framework that aligns the phase stream with its intrinsic circular geometry by enforcing Global Rotation Equivariance (GRE) characteristic. Specifically, we introduce a Magnitude-Phase Interactive Convolutional Module (MPICM) for modulus-based information exchange and a Hybrid-Attention Dual Feed-Forward Network (HADF) bottleneck for unified feature fusion, both of which are designed to preserve GRE in the phase stream. Comprehensive evaluations are conducted across phase retrieval, denoising, dereverberation, and bandwidth extension tasks to validate the superiority of the proposed method over multiple advanced baselines. Notably, the proposed architecture reduces Phase Distance by over 20% in the phase retrieval task and improves PESQ by more than 0.1 in zero-shot cross-corpus denoising evaluations. The overall superiority is also established in universal SE tasks involving mixed distortions. Qualitative analysis further reveals that the learned phase features exhibit distinct periodic patterns, which are consistent with the intrinsic circular nature of the phase. The source code is available at https://github.com/wangchengzhong/GRE-Net.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper adds global rotation equivariance to a magnitude-phase dual-stream model for speech enhancement and reports metric gains, but lacks direct verification that the modules actually enforce the claimed property.

The main thing to know is that this paper builds a dual-stream magnitude-phase network for speech enhancement and tries to align the phase path with its circular structure by enforcing global rotation equivariance through two new modules: MPICM for modulus-based exchange and HADF for hybrid-attention fusion. They evaluate on phase retrieval, denoising, dereverberation, and bandwidth extension, with claims of more than 20% lower phase distance on retrieval and over 0.1 PESQ gain in zero-shot cross-corpus denoising. The code release on GitHub is a clear positive for anyone who wants to check the details or extend the work.

Referee Report

2 major / 2 minor

Summary. The paper proposes a magnitude-phase dual-stream framework for speech enhancement that aligns the phase stream with its intrinsic circular geometry by enforcing Global Rotation Equivariance (GRE). It introduces the Magnitude-Phase Interactive Convolutional Module (MPICM) for modulus-based information exchange and the Hybrid-Attention Dual Feed-Forward Network (HADF) for unified feature fusion, both designed to preserve GRE in the phase stream. The method is evaluated on phase retrieval, denoising, dereverberation, and bandwidth extension tasks, with reported gains including over 20% reduction in Phase Distance for phase retrieval and more than 0.1 PESQ improvement in zero-shot cross-corpus denoising, plus overall superiority in universal SE tasks with mixed distortions. Qualitative analysis shows learned phase features with distinct periodic patterns, and source code is released.

Significance. If the modules indeed enforce GRE and performance gains are attributable to this property (rather than general capacity or fusion improvements), the work could meaningfully advance phase modeling in speech enhancement by providing an architectural solution to the circular topology of phase. The multi-task evaluation scope and public code release are strengths that support reproducibility and broader applicability.

major comments (2)

[Abstract and proposed method] The central claim that MPICM and HADF preserve Global Rotation Equivariance (and that this aligns the phase stream to circular topology unlike flat Euclidean networks) is load-bearing for attributing the reported gains (20% Phase Distance reduction, +0.1 PESQ) to topology alignment. However, no derivation is given showing that a global phase rotation by angle θ on the input produces the corresponding rotation on the output features, nor is there an empirical test (e.g., rotation-equivariance verification) or ablation isolating GRE from dual-stream or attention components.
[Experiments] The quantitative superiority claims (e.g., Phase Distance reduction by over 20% in phase retrieval, PESQ gain >0.1 in zero-shot denoising) are presented without reference to specific tables, statistical testing, baseline configurations, or ablations that separate the contribution of GRE enforcement from increased model capacity or other architectural elements. This leaves the attribution of improvements to the equivariance property unsubstantiated.
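The kind of rotation-equivariance verification requested in the first major comment could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' test: `modulus_gated` and `flat_real` are hypothetical toy layers that stand in for a GRE-preserving module and a flat real-valued layer, and neither reproduces MPICM or HADF.

```python
import numpy as np

rng = np.random.default_rng(0)

def rotate(x, theta):
    """Apply one global phase rotation e^{j*theta} to every feature."""
    return x * np.exp(1j * theta)

def equivariance_error(f, x, theta):
    """Relative gap between f(rotated x) and rotated f(x); near 0 if f is equivariant."""
    lhs = f(rotate(x, theta))
    rhs = rotate(f(x), theta)
    return np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs)

# Hypothetical toy parameters (assumptions for illustration only).
C = 8
W = rng.standard_normal((C, C)) + 1j * rng.standard_normal((C, C))
G = rng.standard_normal((C, C))
A = rng.standard_normal((2 * C, 2 * C))
b = rng.standard_normal(2 * C)

def modulus_gated(x):
    """Bias-free complex linear map, gated by a sigmoid of the modulus."""
    gate = 1.0 / (1.0 + np.exp(-(np.abs(x) @ G)))  # depends on |x| only
    return gate * (x @ W)

def flat_real(x):
    """Real-valued layer on stacked (Re, Im) features with bias and ReLU."""
    h = np.concatenate([x.real, x.imag], axis=-1) @ A + b
    h = np.maximum(h, 0.0)
    return h[..., :C] + 1j * h[..., C:]

x = rng.standard_normal((100, C)) + 1j * rng.standard_normal((100, C))
theta = 0.7
err_gated = equivariance_error(modulus_gated, x, theta)
err_flat = equivariance_error(flat_real, x, theta)
print(f"modulus-gated: {err_gated:.2e}, flat real: {err_flat:.2e}")
```

The modulus-gated layer should give an error at the level of floating-point round-off, while the flat real-valued layer should not. Running the same check on the released model for several values of `theta` would be one way to supply the missing empirical evidence.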

minor comments (2)

[Abstract] The abstract states that learned features exhibit 'distinct periodic patterns' consistent with circular phase; consider adding quantitative metrics or visualizations in the results section to strengthen this qualitative observation.
[Method] Notation for phase representation and the exact definition of Global Rotation Equivariance could be clarified early in the method section for readers unfamiliar with the circular topology.
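For orientation, one common way to state the property in question is given below. The symbols X (complex spectral features), f (the phase-stream mapping), and θ (the global rotation angle) are introduced here for illustration and may not match the paper's own notation.

```latex
% X: complex STFT features, f: phase-stream mapping, theta: global rotation angle
f\!\left(e^{j\theta} X\right) = e^{j\theta}\, f(X)
\qquad \text{for all } \theta \in [0, 2\pi)
```

Under this reading, rotating every input bin by the same angle rotates every output feature by that angle, so the network carries no preferred absolute phase reference.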

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our paper. We address the major comments point-by-point below and plan to incorporate revisions to strengthen the manuscript.

Referee: [Abstract and proposed method] The central claim that MPICM and HADF preserve Global Rotation Equivariance (and that this aligns the phase stream to circular topology unlike flat Euclidean networks) is load-bearing for attributing the reported gains (20% Phase Distance reduction, +0.1 PESQ) to topology alignment. However, no derivation is given showing that a global phase rotation by angle θ on the input produces the corresponding rotation on the output features, nor is there an empirical test (e.g., rotation-equivariance verification) or ablation isolating GRE from dual-stream or attention components.

Authors: We agree that an explicit derivation and empirical validation would better support our claims. The modules are built from operations that respect rotational symmetry by design: interactions use the modulus, which is rotation-invariant, and phase adjustments are equivariant. In the revised version we will add a formal derivation, in the method section or an appendix, proving the GRE property for the proposed modules. We will also include an empirical rotation-equivariance test and an ablation study that separates the GRE contribution from other components such as the dual-stream architecture and the attention mechanisms. revision: yes
Referee: [Experiments] The quantitative superiority claims (e.g., Phase Distance reduction by over 20% in phase retrieval, PESQ gain >0.1 in zero-shot denoising) are presented without reference to specific tables, statistical testing, baseline configurations, or ablations that separate the contribution of GRE enforcement from increased model capacity or other architectural elements. This leaves the attribution of improvements to the equivariance property unsubstantiated.

Authors: The performance gains are reported in the experimental results section with comparisons to state-of-the-art baselines across multiple tasks. To improve clarity, we will explicitly reference the relevant tables (e.g., Table 2 for phase retrieval and Table 3 for denoising) in the revised manuscript. We will also add statistical significance tests and additional ablation experiments that control for model capacity and other factors to better attribute the improvements to the GRE enforcement. This will substantiate the claims more rigorously. revision: yes
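The argument that modulus-based interaction is compatible with GRE can be spelled out in a few lines. The sketch below assumes a hypothetical layer h with a real-valued gate σ, a real matrix U acting on the modulus, and a bias-free complex matrix W; this form is chosen for illustration and is not taken from the paper.

```latex
\begin{aligned}
h(z) &= \sigma\!\left(U\,|z|\right) \odot (W z), \\
h\!\left(e^{j\theta} z\right)
  &= \sigma\!\left(U\,\left|e^{j\theta} z\right|\right) \odot \left(W e^{j\theta} z\right) \\
  &= \sigma\!\left(U\,|z|\right) \odot e^{j\theta} (W z)
     && \text{since } \left|e^{j\theta} z\right| = |z| \\
  &= e^{j\theta}\, h(z).
\end{aligned}
```

The same steps fail if a bias is added to W z, or if a nonlinearity such as ReLU is applied to the real and imaginary parts separately. A formal derivation for the actual modules would therefore need to check these points.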

Circularity Check

0 steps flagged

No circularity detected; claims rest on architectural design and empirical evaluation

The paper proposes a magnitude-phase dual-stream framework with MPICM and HADF modules explicitly designed to preserve Global Rotation Equivariance for aligning the phase stream with circular topology. This is framed as an architectural design choice rather than a mathematical derivation or prediction that reduces to fitted inputs or self-referential definitions. Empirical results on phase retrieval (20% Phase Distance reduction) and denoising (+0.1 PESQ) are presented as validation, supported by qualitative observations of periodic patterns in learned features. No equations, self-citations, or ansatzes are shown that would make the central claims equivalent to the inputs by construction; the derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no explicit free parameters, mathematical axioms, or newly postulated entities; the framework is described through named modules whose internal mechanics and any implicit assumptions remain unspecified.

pith-pipeline@v0.9.0 · 5771 in / 1163 out tokens · 58821 ms · 2026-05-21T14:47:56.366111+00:00 · methodology
