Global Rotation Equivariant Phase Modeling for Speech Enhancement with Deep Magnitude-Phase Interaction
Pith reviewed 2026-05-21 14:47 UTC · model grok-4.3
The pith
Enforcing global rotation equivariance lets a dual-stream network model phase's circular geometry for improved speech enhancement.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a magnitude-phase dual-stream framework can preserve Global Rotation Equivariance (GRE) in its phase stream and thereby align features with the intrinsic circular geometry of phase. It does so with a Magnitude-Phase Interactive Convolutional Module (MPICM) for modulus-based exchange and a Hybrid-Attention Dual Feed-Forward Network (HADF) for unified fusion, and it reports superior results over multiple baselines on phase retrieval, denoising, dereverberation, and bandwidth extension.
What carries the argument
Global Rotation Equivariance (GRE), preserved by the Magnitude-Phase Interactive Convolutional Module (MPICM) and the Hybrid-Attention Dual Feed-Forward Network (HADF), which enable modulus-based interaction while maintaining the circular topology of the phase stream.
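To make the property concrete, the following is a short worked derivation, written as a sketch under stated assumptions. The map g, the operator W, and the gate σ are hypothetical stand-ins for a generic modulus-gated operation; they are not the paper's definitions of MPICM or HADF, which this page does not reproduce.

```latex
% GRE for a map f on a complex spectrogram X: rotating every
% time-frequency bin by the same angle rotates the output by that angle.
\[
  f\!\left(e^{j\theta} X\right) = e^{j\theta} f(X)
  \qquad \text{for all } \theta \in [0, 2\pi).
\]

% Hypothetical modulus-gated map (an assumption for illustration):
% W is a bias-free complex linear operator, \sigma a real-valued gate.
\[
  g(Z) = \sigma\!\left(|Z|\right) \odot (W Z).
\]

% Because |e^{j\theta} Z| = |Z| and W is linear:
\begin{align*}
  g\!\left(e^{j\theta} Z\right)
    &= \sigma\!\left(|e^{j\theta} Z|\right) \odot \left(W e^{j\theta} Z\right) \\
    &= e^{j\theta}\, \sigma\!\left(|Z|\right) \odot (W Z) \\
    &= e^{j\theta}\, g(Z).
\end{align*}
```

The identity relies on the gate seeing only the modulus and on W having no bias term. Adding a complex bias, or applying a nonlinearity to the real and imaginary parts separately, would generally break it.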
If this is right
- Phase distance drops by over 20 percent in the phase retrieval task.
- PESQ rises by more than 0.1 in zero-shot cross-corpus denoising evaluations.
- Overall superiority holds across universal speech enhancement tasks that mix multiple distortions.
- Learned phase features exhibit distinct periodic patterns that match the circular nature of phase.
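The phase-related consequences above depend on measuring phase error on the circle rather than on the real line. The sketch below shows one common way to do that, by wrapping the difference to the principal interval before averaging. It is an illustrative assumption and not necessarily the exact Phase Distance metric used in the paper; the function names are hypothetical.

```python
import numpy as np


def wrap_to_pi(angle):
    """Map angles in radians onto the principal interval [-pi, pi)."""
    return (np.asarray(angle) + np.pi) % (2.0 * np.pi) - np.pi


def circular_phase_distance(phase_est, phase_ref):
    """Mean absolute wrapped difference between two phase arrays (radians)."""
    diff = wrap_to_pi(np.asarray(phase_est) - np.asarray(phase_ref))
    return float(np.mean(np.abs(diff)))


def naive_phase_distance(phase_est, phase_ref):
    """Mean absolute difference that ignores the 2*pi periodicity."""
    diff = np.asarray(phase_est) - np.asarray(phase_ref)
    return float(np.mean(np.abs(diff)))


if __name__ == "__main__":
    # Two phases on either side of the wrap point: close on the circle,
    # far apart on the real line.
    est = np.array([3.1])
    ref = np.array([-3.1])
    print("circular:", circular_phase_distance(est, ref))
    print("naive:   ", naive_phase_distance(est, ref))
```

The circular distance is unchanged when the same offset is added to both arrays, which is the kind of invariance a flat Euclidean comparison of raw phase values does not provide near the wrap point.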
Where Pith is reading between the lines
- The same equivariant modules could be tested on other circular quantities such as inter-channel phase differences or direction-of-arrival angles.
- Combining the dual-stream design with existing masking or generative vocoders might improve real-time enhancement pipelines.
- Evaluating the periodic pattern consistency on larger, noisier, or multilingual corpora would test whether the circular alignment generalizes.
Load-bearing premise
Phase possesses an intrinsic circular geometry that standard flat networks cannot model effectively without explicit global rotation equivariance constraints.
What would settle it
A standard convolutional network without the GRE-preserving modules achieves equal or lower phase distance and equal or higher PESQ on the same phase-retrieval and zero-shot denoising benchmarks.
Abstract
While deep learning has advanced speech enhancement (SE), effective phase modeling remains challenging, as conventional networks typically operate within a flat Euclidean feature space, which is poorly suited to modeling the underlying circular topology of the phase. To address this, we propose a magnitude-phase dual-stream framework that aligns the phase stream with its intrinsic circular geometry by enforcing the Global Rotation Equivariance (GRE) property. Specifically, we introduce a Magnitude-Phase Interactive Convolutional Module (MPICM) for modulus-based information exchange and a Hybrid-Attention Dual Feed-Forward Network (HADF) bottleneck for unified feature fusion, both of which are designed to preserve GRE in the phase stream. Comprehensive evaluations are conducted across phase retrieval, denoising, dereverberation, and bandwidth extension tasks to validate the superiority of the proposed method over multiple advanced baselines. Notably, the proposed architecture reduces Phase Distance by over 20% in the phase retrieval task and improves PESQ by more than 0.1 in zero-shot cross-corpus denoising evaluations. The overall superiority is also established in universal SE tasks involving mixed distortions. Qualitative analysis further reveals that the learned phase features exhibit distinct periodic patterns, which are consistent with the intrinsic circular nature of the phase. The source code is available at https://github.com/wangchengzhong/GRE-Net.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a magnitude-phase dual-stream framework for speech enhancement that aligns the phase stream with its intrinsic circular geometry by enforcing Global Rotation Equivariance (GRE). It introduces the Magnitude-Phase Interactive Convolutional Module (MPICM) for modulus-based information exchange and the Hybrid-Attention Dual Feed-Forward Network (HADF) for unified feature fusion, both designed to preserve GRE in the phase stream. The method is evaluated on phase retrieval, denoising, dereverberation, and bandwidth extension tasks, with reported gains including over 20% reduction in Phase Distance for phase retrieval and more than 0.1 PESQ improvement in zero-shot cross-corpus denoising, plus overall superiority in universal SE tasks with mixed distortions. Qualitative analysis shows learned phase features with distinct periodic patterns, and source code is released.
Significance. If the modules indeed enforce GRE and performance gains are attributable to this property (rather than general capacity or fusion improvements), the work could meaningfully advance phase modeling in speech enhancement by providing an architectural solution to the circular topology of phase. The multi-task evaluation scope and public code release are strengths that support reproducibility and broader applicability.
major comments (2)
- [Abstract and proposed method] The central claim that MPICM and HADF preserve Global Rotation Equivariance (and that this aligns the phase stream to circular topology unlike flat Euclidean networks) is load-bearing for attributing the reported gains (20% Phase Distance reduction, +0.1 PESQ) to topology alignment. However, no derivation is given showing that a global phase rotation by angle θ on the input produces the corresponding rotation on the output features, nor is there an empirical test (e.g., rotation-equivariance verification) or ablation isolating GRE from dual-stream or attention components.
- [Experiments] The quantitative superiority claims (e.g., Phase Distance reduction by over 20% in phase retrieval, PESQ gain >0.1 in zero-shot denoising) are presented without reference to specific tables, statistical testing, baseline configurations, or ablations that separate the contribution of GRE enforcement from increased model capacity or other architectural elements. This leaves the attribution of improvements to the equivariance property unsubstantiated.
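The rotation-equivariance verification requested in the first major comment could be sketched roughly as follows. Both layers are toy stand-ins chosen for illustration (a hypothetical modulus-gated layer and a hypothetical real/imaginary ReLU layer); neither is the paper's MPICM or HADF, and the shapes, seed, and angle are arbitrary assumptions.

```python
import numpy as np


def rotate(x, theta):
    """Apply the same phase rotation to every element of a complex array."""
    return np.exp(1j * theta) * x


def modulus_gated_layer(x, weight):
    """Toy layer: bias-free complex linear map gated by the input modulus."""
    gate = np.tanh(np.abs(x))  # depends only on |x|, so it is rotation-invariant
    return gate * (x @ weight)


def realimag_relu_layer(x, weight):
    """Toy 'flat' layer: ReLU applied separately to real and imaginary parts."""
    y = x @ weight
    return np.maximum(y.real, 0.0) + 1j * np.maximum(y.imag, 0.0)


def equivariance_error(layer, x, theta):
    """Relative gap between layer(rotate(x)) and rotate(layer(x))."""
    lhs = layer(rotate(x, theta))
    rhs = rotate(layer(x), theta)
    return float(np.linalg.norm(lhs - rhs) / (np.linalg.norm(rhs) + 1e-12))


# Synthetic complex features (frames x channels) and weights; not real data.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))
w = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
theta = 0.7

err_gated = equivariance_error(lambda z: modulus_gated_layer(z, w), x, theta)
err_relu = equivariance_error(lambda z: realimag_relu_layer(z, w), x, theta)

if __name__ == "__main__":
    print("modulus-gated layer error:", err_gated)
    print("real/imag ReLU layer error:", err_relu)
```

For a released model, the same `equivariance_error` helper could in principle wrap the network's phase stream and be evaluated over many random angles; an error near floating-point precision would support the GRE claim empirically, while a large error would indicate that some component breaks it.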
minor comments (2)
- [Abstract] The abstract states that learned features exhibit 'distinct periodic patterns' consistent with circular phase; consider adding quantitative metrics or visualizations in the results section to strengthen this qualitative observation.
- [Method] Notation for phase representation and the exact definition of Global Rotation Equivariance could be clarified early in the method section for readers unfamiliar with the circular topology.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our paper. We address the major comments point-by-point below and plan to incorporate revisions to strengthen the manuscript.
- Referee: [Abstract and proposed method] The central claim that MPICM and HADF preserve Global Rotation Equivariance (and that this aligns the phase stream to circular topology unlike flat Euclidean networks) is load-bearing for attributing the reported gains (20% Phase Distance reduction, +0.1 PESQ) to topology alignment. However, no derivation is given showing that a global phase rotation by angle θ on the input produces the corresponding rotation on the output features, nor is there an empirical test (e.g., rotation-equivariance verification) or ablation isolating GRE from dual-stream or attention components.
  Authors: We agree that an explicit derivation and empirical validation would better support our claims. The modules are designed around operations that respect rotational symmetry: interactions use the modulus, which is rotation-invariant, and phase adjustments are equivariant. In the revised version we will add a formal derivation, in the method section or an appendix, proving the GRE property for the proposed modules. We will also include an empirical test for rotation equivariance and an ablation study that separates the GRE contribution from other components such as the dual-stream architecture and the attention mechanisms.
  revision: yes
- Referee: [Experiments] The quantitative superiority claims (e.g., Phase Distance reduction by over 20% in phase retrieval, PESQ gain >0.1 in zero-shot denoising) are presented without reference to specific tables, statistical testing, baseline configurations, or ablations that separate the contribution of GRE enforcement from increased model capacity or other architectural elements. This leaves the attribution of improvements to the equivariance property unsubstantiated.
  Authors: The performance gains are reported in the experimental results section, with comparisons to state-of-the-art baselines across multiple tasks. To improve clarity, we will explicitly reference the relevant tables (e.g., Table 2 for phase retrieval and Table 3 for denoising) in the revised manuscript. We will also add statistical significance tests and additional ablation experiments that control for model capacity and other factors, so that the improvements can be attributed to GRE enforcement more rigorously.
  revision: yes
Circularity Check
No circularity detected; claims rest on architectural design and empirical evaluation
The paper proposes a magnitude-phase dual-stream framework with MPICM and HADF modules explicitly designed to preserve Global Rotation Equivariance for aligning the phase stream with circular topology. This is framed as an architectural design choice rather than a mathematical derivation or prediction that reduces to fitted inputs or self-referential definitions. Empirical results on phase retrieval (20% Phase Distance reduction) and denoising (+0.1 PESQ) are presented as validation, supported by qualitative observations of periodic patterns in learned features. No equations, self-citations, or ansatzes are shown that would make the central claims equivalent to the inputs by construction; the derivation chain remains self-contained against external benchmarks.