BioSEN: A Bio-acoustic Signal Enhancement Network for Animal Vocalizations

Hisako Nomura; Linh Thi Hoai Nguyen; Ngamta Thamwattana; Tianyu Song; Ton Viet Ta

arxiv: 2605.12534 · v2 · pith:3PIWSOG2new · submitted 2026-05-02 · 💻 cs.SD · cs.LG· q-bio.NC

BioSEN: A Bio-acoustic Signal Enhancement Network for Animal Vocalizations

Tianyu Song , Ton Viet Ta , Ngamta Thamwattana , Hisako Nomura , Linh Thi Hoai Nguyen This is my paper

Pith reviewed 2026-05-15 07:33 UTC · model grok-4.3

classification 💻 cs.SD cs.LGq-bio.NC

keywords bioacousticssignal enhancementanimal vocalizationsneural networkattention mechanismnoise reductionharmonic structurebiodiversity monitoring

0 comments

The pith

BioSEN adapts speech enhancement methods into a lighter network that cleans animal vocalization recordings as well as or better than existing models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Most audio enhancement work targets human speech, but animal sounds present distinct challenges such as harmonic patterns and variable energy in noisy field recordings. The paper introduces BioSEN to close this gap by adapting speech techniques with three specialized modules that extract multi-scale features, preserve harmonics, and gate frequencies to retain vocalizations. Tests across three bioacoustic datasets show the model matches or exceeds leading speech enhancement systems while using far less computation. This efficiency matters because it could support practical, scalable processing for wildlife monitoring without heavy hardware demands. The results indicate that targeted modifications allow speech methods to transfer effectively to non-speech audio domains.

Core claim

BioSEN consists of a multi-scale dual-axis attention unit for time-frequency feature extraction, a bio-harmonic multi-scale enhancement unit for capturing harmonic structures, and an energy-adaptive gating connection unit that applies frequency weights to prevent vocalizations from being removed as noise. When evaluated on three bioacoustic datasets, this architecture matches or exceeds state-of-the-art speech enhancement models while requiring substantially less computation.

What carries the argument

BioSEN's three-module architecture, especially the energy-adaptive gating connection unit that uses frequency weights to preserve animal vocalizations during enhancement.

If this is right

Bioacoustic datasets can be cleaned effectively for downstream analysis such as species identification without high computational cost.
Conservation monitoring systems gain the ability to process field recordings in real time on modest hardware.
Speech enhancement techniques transfer to animal sounds when modified for harmonics and energy patterns.
Reduced model complexity enables wider deployment in biodiversity projects with limited resources.
Noisy animal recordings become more usable for long-term ecological studies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The efficiency could support embedding the model in portable sensors for continuous on-site wildlife tracking.
Similar modular designs might extend to enhancement of other non-speech sounds such as insect calls or marine acoustics.
Broad generalization would reduce reliance on large labeled datasets for each new species.
Cross-domain audio processing in ecology becomes more feasible if the gating mechanism proves robust beyond the tested cases.

Load-bearing premise

Adaptations of speech enhancement methods with these modules will generalize across diverse animal species and recording conditions without requiring extensive species-specific retraining or validation.

What would settle it

Applying BioSEN to recordings from a previously untested animal species or under substantially different noise conditions and observing that it falls below speech model performance or needs major retraining to recover accuracy.

Figures

Figures reproduced from arXiv: 2605.12534 by Hisako Nomura, Linh Thi Hoai Nguyen, Ngamta Thamwattana, Tianyu Song, Ton Viet Ta.

**Figure 2.1.** Figure 2.1: Architecture of BioSEN [PITH_FULL_IMAGE:figures/full_fig_p003_2_1.png] view at source ↗

**Figure 2.2.** Figure 2.2: Structure of the MSDA module. Given complex input features X ∈ R B×F ×T ×C×2 , the process is as follows. First, dual multi-head attention layers capture temporal and frequency contextual information (Adual): Specifically, the input X is reshaped to (B×F,T,C) for time attention and (B×T,F,C) for frequency attention to capture respective dependencies. This axis-specific separation is designed to improve … view at source ↗

**Figure 2.3.** Figure 2.3: Structure of the BHME module. Unlike fixed Gammatone filters, this module naturally learns diverse harmonic spacing patterns via gradient optimization without explicit constraints. Using a parallel multibranch structure, it mimics the auditory perception of animal calls with varying fundamental frequencies and pitches. This allows the model to emphasize harmonic patterns of different densities, preserv… view at source ↗

**Figure 2.4.** Figure 2.4: Structure of the EAGC module. EAGC acts as an adaptive filter by combining frequency-aware gating with crossattention. For encoder features E and decoder features D, the process is as follows: Frequency-weighted gating: An initial learnable gate G identifies candidate encoder features Eo. These are then modulated by a frequency energy weight Wfreq, derived from the spectral energy distribution to preser… view at source ↗

read the original abstract

Most work in audio enhancement targets human speech, while bioacoustics is less studied due to noisy recordings and the distinct traits of animal sounds. To fill this gap, we adapt speech enhancement methods and build BioSEN, a model made for bioacoustic signals. BioSEN has three modules: a multi-scale dual-axis attention unit for time-frequency feature extraction, a bio-harmonic multi-scale enhancement unit for capturing harmonic structures, and an energy-adaptive gating connection unit that uses frequency weights to keep vocalizations from being removed as noise. Tests on three bioacoustic datasets show that BioSEN matches or exceeds state-of-the-art speech enhancement models while using far less computation. These results show BioSEN's strength for bioacoustic audio enhancement and its promise for biodiversity monitoring and conservation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

BioSEN adapts speech enhancement to animal vocalizations with three modules and claims efficiency gains on three datasets, but the abstract supplies too few details to judge whether the results hold up.

read the letter

The main takeaway is that BioSEN adapts speech enhancement methods to bioacoustics by adding three specialized modules for attention, harmonics, and energy gating, claiming better or comparable performance with lower compute on three datasets. It does well by focusing on an understudied area. Animal vocalizations differ from speech in their harmonic content and energy distribution, so standard models might not preserve them effectively. The bio-harmonic unit and gating connection seem like reasonable ways to handle that. The soft spots are mainly in the evaluation details. The abstract doesn't provide specific performance numbers, baseline comparisons, or information on how the datasets were chosen and processed. This makes it tough to verify if the gains are significant or if the less computation claim is measured consistently. The assumption that it generalizes without species-specific tuning could be a weak point if the tests are limited. This paper is for researchers in bioacoustics, ecology, and conservation technology. Someone working on processing field audio for monitoring would get practical value from the architecture ideas. I recommend putting it through peer review. The work is grounded in a clear problem and offers a testable solution, so referees can check the full results and see if it holds up.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces BioSEN, a neural network architecture for enhancing bio-acoustic signals from animal vocalizations. The model adapts speech enhancement methods and incorporates three specialized modules: a multi-scale dual-axis attention unit for extracting time-frequency features, a bio-harmonic multi-scale enhancement unit to capture harmonic structures in animal sounds, and an energy-adaptive gating connection unit that uses frequency weights to prevent removal of vocalizations as noise. The central claim is that evaluations on three bioacoustic datasets show BioSEN matching or exceeding state-of-the-art speech enhancement models while requiring substantially less computational resources.

Significance. Should the empirical results be confirmed with rigorous validation, this contribution would be significant for the field of bioacoustics. It addresses the gap in audio enhancement for non-speech signals by providing a computationally efficient model tailored to the characteristics of animal vocalizations. This has direct implications for biodiversity monitoring, conservation efforts, and automated analysis of field recordings, where noise is prevalent and computational resources may be limited.

major comments (2)

[§4 (Experimental Evaluation)] §4 (Experimental Evaluation): The abstract and available text report performance gains on three bioacoustic datasets but provide no details on the specific baselines implemented, the evaluation metrics used (e.g., whether SNR, STOI, or bioacoustic-specific measures), error bars or statistical significance, dataset characteristics (species, recording conditions, sizes), or training procedures. This information is load-bearing for assessing the claim that BioSEN matches or exceeds SOTA with less computation.
[§3 (Model Architecture)] §3 (Model Architecture): The description of the three modules (multi-scale dual-axis attention unit, bio-harmonic multi-scale enhancement unit, energy-adaptive gating connection unit) is high-level; without equations or diagrams showing how they differ from standard speech enhancement components, it is difficult to evaluate the novelty and the rationale for their design choices.

minor comments (1)

[Abstract] Abstract: The phrase 'far less computation' is imprecise; the manuscript should quantify this (e.g., number of parameters, FLOPs, or inference time) in comparison to the baselines.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and positive evaluation of the potential significance of BioSEN for bioacoustics applications. We address each major comment below and will revise the manuscript to strengthen the presentation of our contributions.

read point-by-point responses

Referee: [§4 (Experimental Evaluation)] §4 (Experimental Evaluation): The abstract and available text report performance gains on three bioacoustic datasets but provide no details on the specific baselines implemented, the evaluation metrics used (e.g., whether SNR, STOI, or bioacoustic-specific measures), error bars or statistical significance, dataset characteristics (species, recording conditions, sizes), or training procedures. This information is load-bearing for assessing the claim that BioSEN matches or exceeds SOTA with less computation.

Authors: We agree that §4 currently lacks sufficient detail to fully substantiate the empirical claims. In the revised manuscript we will expand the experimental evaluation section to explicitly list all baselines (including the specific speech enhancement models and their implementations), the full set of metrics (SNR, STOI, PESQ, and bioacoustic-specific measures such as vocalization detection F1-score), error bars computed over multiple random seeds with statistical significance tests (e.g., paired t-tests), complete dataset descriptions (species, recording environments, total duration, train/validation/test splits), and training procedures (optimizer, learning rate schedule, batch size, loss function, and hardware used). We will also add a table summarizing computational cost (FLOPs, parameters, inference time) for direct comparison. revision: yes
Referee: [§3 (Model Architecture)] §3 (Model Architecture): The description of the three modules (multi-scale dual-axis attention unit, bio-harmonic multi-scale enhancement unit, energy-adaptive gating connection unit) is high-level; without equations or diagrams showing how they differ from standard speech enhancement components, it is difficult to evaluate the novelty and the rationale for their design choices.

Authors: We acknowledge that the current description of the three modules in §3 is high-level. In the revision we will add the full mathematical formulations (equations) for each component, including the multi-scale dual-axis attention mechanism, the bio-harmonic multi-scale enhancement operations that explicitly model harmonic structures, and the energy-adaptive gating equations that incorporate frequency-dependent weights. We will also include a new figure with block diagrams that contrast each module against the corresponding standard blocks in speech enhancement networks (e.g., dual-path RNN or conformer layers) to clearly illustrate the bioacoustic-specific modifications and their motivation. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents BioSEN as an empirical adaptation of speech-enhancement architectures to bioacoustic signals, with performance claims resting entirely on standard dataset comparisons rather than any derivation chain. No equations, fitted parameters renamed as predictions, self-definitional modules, or load-bearing self-citations appear in the abstract or described structure; the three modules are introduced as design choices whose value is assessed externally via metrics on held-out bioacoustic recordings. This leaves the central claim self-contained and falsifiable against independent baselines.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 3 invented entities

Only abstract available; model likely rests on standard deep-learning assumptions plus domain-specific module designs whose parameters are learned from data.

free parameters (3)

multi-scale attention hyperparameters
Number of scales, attention heads, and channel dimensions fitted during training on bioacoustic data.
harmonic enhancement scales
Parameters controlling multi-scale harmonic feature extraction chosen or learned to match animal vocalization traits.
energy-adaptive gating weights
Frequency-specific weights learned to preserve vocalizations versus noise.

axioms (2)

domain assumption Neural networks can learn useful time-frequency representations from labeled or paired noisy-clean audio data.
Implicit in the use of supervised enhancement training.
domain assumption Animal vocalizations exhibit distinct harmonic structures separable from background noise via learned filters.
Basis for the bio-harmonic unit.

invented entities (3)

multi-scale dual-axis attention unit no independent evidence
purpose: Extract time-frequency features at multiple scales for bioacoustic signals.
New architectural component introduced for this task.
bio-harmonic multi-scale enhancement unit no independent evidence
purpose: Capture harmonic structures specific to animal vocalizations.
Domain-adapted module not present in standard speech models.
energy-adaptive gating connection unit no independent evidence
purpose: Use frequency weights to prevent removal of vocalizations as noise.
Gating mechanism tailored to bioacoustic energy patterns.

pith-pipeline@v0.9.0 · 5453 in / 1484 out tokens · 44828 ms · 2026-05-15T07:33:21.943417+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

[1]

B., Myers, C

Kohlberg, A. B., Myers, C. R., Figueroa, L. L. (2024). Fro m buzzes to bytes: A sys- tematic review of automated bioacoustics models used to det ect, classify and monitor insects. J. Appl. Ecol. , 61(6), 1199–1211

work page 2024
[2]

K., Camp, R

Navine, A. K., Camp, R. J., Weldy, M. J., Denton, T., Hart, P. J. (2024). Counting the chorus: A bioacoustic indicator of population density. Ecological Indicators, 169, 112930

work page 2024
[3]

H., Stowell, D., Briefer, E

Rasmussen, J. H., Stowell, D., Briefer, E. F. (2024). Sou nd evidence for biodiversity monitoring. Science, 385(6705), 138–140

work page 2024
[4]

Sharma, S., Sato, K., Gautam, B. P. (2023). A methodologi cal literature review of acoustic wildlife monitoring using artiﬁcial intelligenc e tools and techniques. Sustain- ability, 15(9), 7128

work page 2023
[5]

Gajecki, T., Nogueira, W. (2025). Adversarial learning for end-to-end cochlear speech denoising using lightweight deep learning models. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , 1–5

work page 2025
[6]

Dementyev, A., Reddy, C. K. A., Wisdom, S., Chatlani, N., Hershey, J. R., Lyon, R. F. (2025). Towards sub-millisecond latency real-time spee ch enhancement models on hearables. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , 1–5

work page 2025
[7]

Zhao, Y., Xie, Y., Ren, J., Wang, W., Xu, J. (2025). Dual-a xis spectrum attention network: A robust model for underwater acoustic signal deno ising. Applied Acoustics, 240, 110865

work page 2025
[8]

Tang, J., Chen, Z., Chen, M. (2025). A novel underwater ac oustic signal denoising model based on complex convolution dual-branch multi-scal e attention network. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , 1–5

work page 2025
[9]

Barnhill, A., N¨ oth, E., Maier, A., Bergler, C. (2024). A NIMAL-CLEAN: A deep denoising toolkit for animal-independent signal enhancem ent. Proc. Interspeech

work page 2024
[10]

Juodakis, J., Marsland, S. (2022). Wind-robust sound e vent detection and denoising for bioacoustics. Methods Ecol. Evol. , 13, 2005–2017

work page 2022
[11]

Song, T., Ta, T. V. (2025). Towards high-ﬁdelity and con trollable bioacoustic gener- ation via enhanced diﬀusion learning. arXiv preprint arXiv:2509.00318

work page arXiv 2025
[12]

, Pietquin, O., Eﬀenberger, F., Cusimano, M

Miron, M., Keen, S., Liu, J.-Y., Hoﬀman, B., Hagiwara, M. , Pietquin, O., Eﬀenberger, F., Cusimano, M. (2024). Biodenoising: animal vocalizatio n denoising without access to clean data. arXiv preprint arXiv:2410.03427

work page arXiv 2024
[13]

Sarkar, E., Magimai.-Doss, M. (2025). Comparing self- supervised learning models pre-trained on human speech and animal vocalizations for bi oacoustics processing. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , 1–5

work page 2025
[14]

Vellinga, W. (2025). Xeno-canto – bird sounds from arou nd the world. Xeno-canto Foundation for Nature Sounds, [Online]. Available: https://doi.org/10.15468/qv0ksn. 9

work page doi:10.15468/qv0ksn 2025
[15]

Earth species library: O pen datasets for bioacoustic research

Earth Species Project (2020). Earth species library: O pen datasets for bioacoustic research. [Online]. Available: https://github.com/earthspecies/library

work page 2020
[16]

Mumm, C. A. S., Kn¨ ornschild, M. (2014). The Vocal Repertoire of Adult and Neonate Giant Otters (Pteronura brasiliensis). PLoS ONE , 9(11), e112562

work page 2014
[17]

E., Theunissen, F

Elie, J. E., Theunissen, F. E. (2018). Zebra ﬁnches iden tify individuals using vocal signatures unique to each call type. Nature Communications, 9, 4026

work page 2018
[18]

Yin, S., McCowan, B. (2004). Acoustic similarity and aﬃ liation in rhesus macaques (Macaca mulatta). Animal Behaviour , 68, 343-355

work page 2004
[19]

Yang, L., Liu, W., Meng, R., Lee, G., Baek, S., Moon, H.-G . (2024). FSPEN: An ultra-lightweight network for real-time speech enhanceme nt. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , 10671–10675

work page 2024
[20]

Yan, H., Zhang, J., Fan, C., Zhou, Y., Liu, P. (2025). LiS enNet: Lightweight sub- band and dual-path modeling for real-time speech enhanceme nt. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , 1–5

work page 2025
[21]

Defossez, A., Synnaeve, G., Adi, Y. (2020). Real-time s peech enhancement in the waveform domain. arXiv preprint arXiv:2006.12847

work page arXiv 2020
[22]

Dccrn: Deep complex convolution recurrent network for phase-aware speech enhancement.arXiv preprint arXiv:2008.00264,

Hu, Y., Liu, Y., Lv, S., Xing, M., Zhang, S., Fu, Y., Wu, J. , Zhang, B., Xie, L. (2020). DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement. arXiv preprint arXiv:2008.00264

work page arXiv 2020
[23]

Chen, J., Wang, Z., Tuo, D., Wu, Z., Kang, S., Meng, H. (20 22). FullSubNet+: Channel attention FullSubNet with complex spectrograms fo r speech enhancement. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , 7857–7861. 10

work page

[1] [1]

B., Myers, C

Kohlberg, A. B., Myers, C. R., Figueroa, L. L. (2024). Fro m buzzes to bytes: A sys- tematic review of automated bioacoustics models used to det ect, classify and monitor insects. J. Appl. Ecol. , 61(6), 1199–1211

work page 2024

[2] [2]

K., Camp, R

Navine, A. K., Camp, R. J., Weldy, M. J., Denton, T., Hart, P. J. (2024). Counting the chorus: A bioacoustic indicator of population density. Ecological Indicators, 169, 112930

work page 2024

[3] [3]

H., Stowell, D., Briefer, E

Rasmussen, J. H., Stowell, D., Briefer, E. F. (2024). Sou nd evidence for biodiversity monitoring. Science, 385(6705), 138–140

work page 2024

[4] [4]

Sharma, S., Sato, K., Gautam, B. P. (2023). A methodologi cal literature review of acoustic wildlife monitoring using artiﬁcial intelligenc e tools and techniques. Sustain- ability, 15(9), 7128

work page 2023

[5] [5]

Gajecki, T., Nogueira, W. (2025). Adversarial learning for end-to-end cochlear speech denoising using lightweight deep learning models. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , 1–5

work page 2025

[6] [6]

Dementyev, A., Reddy, C. K. A., Wisdom, S., Chatlani, N., Hershey, J. R., Lyon, R. F. (2025). Towards sub-millisecond latency real-time spee ch enhancement models on hearables. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , 1–5

work page 2025

[7] [7]

Zhao, Y., Xie, Y., Ren, J., Wang, W., Xu, J. (2025). Dual-a xis spectrum attention network: A robust model for underwater acoustic signal deno ising. Applied Acoustics, 240, 110865

work page 2025

[8] [8]

Tang, J., Chen, Z., Chen, M. (2025). A novel underwater ac oustic signal denoising model based on complex convolution dual-branch multi-scal e attention network. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , 1–5

work page 2025

[9] [9]

Barnhill, A., N¨ oth, E., Maier, A., Bergler, C. (2024). A NIMAL-CLEAN: A deep denoising toolkit for animal-independent signal enhancem ent. Proc. Interspeech

work page 2024

[10] [10]

Juodakis, J., Marsland, S. (2022). Wind-robust sound e vent detection and denoising for bioacoustics. Methods Ecol. Evol. , 13, 2005–2017

work page 2022

[11] [11]

Song, T., Ta, T. V. (2025). Towards high-ﬁdelity and con trollable bioacoustic gener- ation via enhanced diﬀusion learning. arXiv preprint arXiv:2509.00318

work page arXiv 2025

[12] [12]

, Pietquin, O., Eﬀenberger, F., Cusimano, M

Miron, M., Keen, S., Liu, J.-Y., Hoﬀman, B., Hagiwara, M. , Pietquin, O., Eﬀenberger, F., Cusimano, M. (2024). Biodenoising: animal vocalizatio n denoising without access to clean data. arXiv preprint arXiv:2410.03427

work page arXiv 2024

[13] [13]

Sarkar, E., Magimai.-Doss, M. (2025). Comparing self- supervised learning models pre-trained on human speech and animal vocalizations for bi oacoustics processing. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , 1–5

work page 2025

[14] [14]

Vellinga, W. (2025). Xeno-canto – bird sounds from arou nd the world. Xeno-canto Foundation for Nature Sounds, [Online]. Available: https://doi.org/10.15468/qv0ksn. 9

work page doi:10.15468/qv0ksn 2025

[15] [15]

Earth species library: O pen datasets for bioacoustic research

Earth Species Project (2020). Earth species library: O pen datasets for bioacoustic research. [Online]. Available: https://github.com/earthspecies/library

work page 2020

[16] [16]

Mumm, C. A. S., Kn¨ ornschild, M. (2014). The Vocal Repertoire of Adult and Neonate Giant Otters (Pteronura brasiliensis). PLoS ONE , 9(11), e112562

work page 2014

[17] [17]

E., Theunissen, F

Elie, J. E., Theunissen, F. E. (2018). Zebra ﬁnches iden tify individuals using vocal signatures unique to each call type. Nature Communications, 9, 4026

work page 2018

[18] [18]

Yin, S., McCowan, B. (2004). Acoustic similarity and aﬃ liation in rhesus macaques (Macaca mulatta). Animal Behaviour , 68, 343-355

work page 2004

[19] [19]

Yang, L., Liu, W., Meng, R., Lee, G., Baek, S., Moon, H.-G . (2024). FSPEN: An ultra-lightweight network for real-time speech enhanceme nt. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , 10671–10675

work page 2024

[20] [20]

Yan, H., Zhang, J., Fan, C., Zhou, Y., Liu, P. (2025). LiS enNet: Lightweight sub- band and dual-path modeling for real-time speech enhanceme nt. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , 1–5

work page 2025

[21] [21]

Defossez, A., Synnaeve, G., Adi, Y. (2020). Real-time s peech enhancement in the waveform domain. arXiv preprint arXiv:2006.12847

work page arXiv 2020

[22] [22]

Dccrn: Deep complex convolution recurrent network for phase-aware speech enhancement.arXiv preprint arXiv:2008.00264,

Hu, Y., Liu, Y., Lv, S., Xing, M., Zhang, S., Fu, Y., Wu, J. , Zhang, B., Xie, L. (2020). DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement. arXiv preprint arXiv:2008.00264

work page arXiv 2020

[23] [23]

Chen, J., Wang, Z., Tuo, D., Wu, Z., Kang, S., Meng, H. (20 22). FullSubNet+: Channel attention FullSubNet with complex spectrograms fo r speech enhancement. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , 7857–7861. 10

work page