Leveraging Local and Global Knowledge Integration with Time-Frequency Calibrated Distillation for Speech Enhancement
Pith reviewed 2026-05-22 00:45 UTC · model grok-4.3
The pith
A new distillation framework for speech enhancement integrates local and global knowledge through intra-set and inter-set recursive fusion plus dual-stream time-frequency cross-calibration.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The I²SRF-TFCKD framework constructs intra-set and inter-set correlations for collaborative distillation, generates fused representative features via recursive fusion, and applies multi-layer interactive distillation with dual-stream time-frequency cross-calibration to exploit speech time-frequency differentials, yielding consistent gains for the low-complexity student model over other distillation schemes on both single-channel and multi-channel datasets.
What carries the argument
Dual-stream time-frequency cross-calibration that computes similarity weights in the time and frequency domains separately then cross-weights them to refine per-layer distillation contributions based on speech signal traits.
If this is right
- The low-complexity student model records higher objective scores on single-channel and multi-channel speech enhancement tasks than before distillation.
- The method surpasses other distillation schemes in direct comparisons on the same datasets.
- The framework can be applied to existing high-ranking networks such as DPDCRN to produce efficient yet capable enhancement systems.
- Local information focusing and global knowledge circulation occur simultaneously within one distillation pass.
Where Pith is reading between the lines
- Similar time-frequency cross-calibration could be tested in other audio tasks where spectral and temporal structure must be preserved during model compression.
- The recursive fusion step suggests a general pattern for circulating knowledge across multiple related training subsets in distillation pipelines.
- If the calibration weights prove stable across datasets, the approach may reduce the need for hand-tuned layer importance in future speech models.
Load-bearing premise
That speech signals contain distinct time-frequency differential information that pairwise multi-layer matching and dual-stream cross-calibration can reliably capture and exploit for measurable performance gains.
What would settle it
If objective metrics on the single-channel and multi-channel test sets show the student model gains no advantage or loses to standard distillation baselines, the claimed benefit of the time-frequency calibrated recursive fusion would not hold.
Figures
read the original abstract
In this paper, we propose an intra-set and inter-set recursive fusion framework with time-frequency calibrated knowledge distillation (I$^2$SRF-TFCKD) for SE. Different from previous distillation strategies for SE, the proposed framework fully exploits the time-frequency differential information of speech while facilitating both local information focusing and global knowledge circulation. Firstly, we construct a collaborative distillation paradigm for intra-set and inter-set correlations. Within a correlated set, multi-layer teacher-student features are pairwise matched for calibrated distillation. Subsequently, we generate representative features from each correlated set through recursive fusion to form the fused feature set that enables inter-set knowledge interaction. Secondly, we propose a multi-layer interactive distillation based on dual-stream time-frequency cross-calibration, which calculates the teacher-student similarity calibration weights in the time and frequency domains respectively and performs cross-weighting, thus enabling refined allocation of distillation contributions across different layers according to speech characteristics. The proposed distillation strategy is applied to the dual-path dilated convolutional recurrent network (DPDCRN) that ranked first in the SE track of the L3DAS23 challenge. To evaluate the effectiveness of I$^2$SRF-TFCKD, we conduct experiments on both single-channel and multi-channel SE datasets. Objective evaluations demonstrate that the proposed KD strategy consistently and effectively improves the performance of the low-complexity student model and outperforms other distillation schemes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes I²SRF-TFCKD, an intra-set and inter-set recursive fusion framework with time-frequency calibrated knowledge distillation for speech enhancement. It builds a collaborative distillation paradigm using multi-layer teacher-student feature matching within correlated sets, recursive fusion for inter-set interaction, and dual-stream time-frequency cross-calibration for refined layer-wise distillation weights based on speech time-frequency characteristics. The method is applied to the DPDCRN architecture (first place in L3DAS23 SE track) and evaluated on single- and multi-channel datasets, claiming consistent improvements to the low-complexity student model and outperformance versus other distillation schemes.
Significance. If the empirical gains are robustly verified, the work could contribute to more effective knowledge distillation for speech enhancement by explicitly leveraging time-frequency differential information for both local focusing and global circulation. The choice of a high-performing baseline (DPDCRN) and evaluation across single- and multi-channel scenarios are strengths that increase potential impact in practical low-complexity SE deployments.
major comments (2)
- [Method (multi-layer interactive distillation) and Experiments] The central claim that pairwise multi-layer matching plus dual-stream time-frequency cross-calibration produces measurable, consistent gains by exploiting time-frequency differential information (abstract and method description) is load-bearing for the outperformance assertion. No ablation is described that removes only the cross-calibration (or the recursive inter-set fusion) while holding the rest of the pipeline fixed; without such controls it remains possible that observed improvements arise from basic multi-layer matching or training schedule differences rather than the claimed TF exploitation.
- [Experiments / Results] The abstract states that objective evaluations demonstrate improvements and outperformance, yet the provided description supplies no quantitative metrics (e.g., PESQ, STOI, SI-SDR deltas), error bars, statistical tests, or explicit data-split details. These are required to substantiate the claim that the strategy 'consistently and effectively improves' the student model across datasets.
minor comments (2)
- [Abstract / Introduction] The acronym I²SRF-TFCKD is introduced without immediate expansion; spelling it out on first use would improve readability.
- [Introduction] Ensure all referenced prior distillation strategies for SE are accompanied by specific citations rather than general statements.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. We address each major comment below with clarifications drawn from the manuscript and indicate the revisions we will make.
read point-by-point responses
-
Referee: [Method (multi-layer interactive distillation) and Experiments] The central claim that pairwise multi-layer matching plus dual-stream time-frequency cross-calibration produces measurable, consistent gains by exploiting time-frequency differential information (abstract and method description) is load-bearing for the outperformance assertion. No ablation is described that removes only the cross-calibration (or the recursive inter-set fusion) while holding the rest of the pipeline fixed; without such controls it remains possible that observed improvements arise from basic multi-layer matching or training schedule differences rather than the claimed TF exploitation.
Authors: We agree that a more granular ablation isolating the dual-stream time-frequency cross-calibration (and separately the recursive inter-set fusion) would strengthen the evidence that gains arise specifically from TF differential exploitation rather than generic multi-layer matching. The current manuscript reports comparisons of the full I²SRF-TFCKD against prior distillation methods and includes component-level analysis within the collaborative paradigm, but does not present the exact controlled removal requested. In the revised manuscript we will add these targeted ablations while holding all other elements (including training schedule and base multi-layer matching) fixed. revision: yes
-
Referee: [Experiments / Results] The abstract states that objective evaluations demonstrate improvements and outperformance, yet the provided description supplies no quantitative metrics (e.g., PESQ, STOI, SI-SDR deltas), error bars, statistical tests, or explicit data-split details. These are required to substantiate the claim that the strategy 'consistently and effectively improves' the student model across datasets.
Authors: The full manuscript (Section 4 and associated tables) reports concrete quantitative results on both single- and multi-channel datasets, including PESQ, STOI and SI-SDR values with direct comparisons to the teacher, the student without distillation, and competing distillation schemes. Data splits follow the standard partitions of the respective benchmarks (e.g., L3DAS23 and other public SE corpora). To make these findings immediately visible, we will incorporate representative numerical deltas into the abstract and ensure error bars together with any statistical significance statements are explicitly stated or added in the results section. revision: partial
Circularity Check
No circularity: empirical gains from proposed TF-calibrated KD rest on dataset comparisons, not self-referential definitions or fits
full rationale
The paper describes an I²SRF-TFCKD framework that combines intra-set/inter-set recursive fusion with dual-stream time-frequency cross-calibration for distilling a student model from a teacher on speech enhancement tasks. The central claim of consistent improvement over other KD schemes is presented as the outcome of objective evaluations on single- and multi-channel datasets, with no equations, derivations, or parameter-fitting steps shown that would make the reported gains equivalent to the method's own inputs by construction. The described mechanisms (pairwise multi-layer matching, recursive fusion, and cross-weighting of similarity calibration weights) are architectural choices whose contribution is asserted via external benchmark comparisons rather than internal redefinition or self-citation chains. This is a standard empirical ML paper whose validity hinges on reproducibility of the experiments, not on any load-bearing step that collapses to tautology.
Axiom & Free-Parameter Ledger
free parameters (1)
- layer-wise matching weights and calibration hyperparameters
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
multi-layer interactive distillation based on dual-stream time-frequency cross-calibration, which calculates the teacher-student similarity calibration weights in the time and frequency domains respectively and performs cross-weighting
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanembed_add unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
recursive fusion to form the fused feature set that enables inter-set knowledge interaction
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
C. Zheng, H. Zhang, W. Liu, X. Luo, A. Li, X. Li, and B. C. Moore, “Sixty years of frequency-domain monaural speech enhancement: From traditional to deep learning methods,” Trends in Hearing , vol. 27, pp. 1–52, 2023
work page 2023
-
[2]
Multiple statis- tical models for soft decision in noisy speech enhancement,
J.-H. Chang, S. Gazor, N. S. Kim, and S. K. Mitra, “Multiple statis- tical models for soft decision in noisy speech enhancement,” Pattern Recognition, vol. 40, no. 3, pp. 1123–1134, 2007
work page 2007
-
[3]
A regression approach to speech enhancement based on deep neural networks,
Y . Xu, J. Du, L.-R. Dai, and C.-H. Lee, “A regression approach to speech enhancement based on deep neural networks,” IEEE/ACM transactions on audio, speech, and language processing , vol. 23, no. 1, pp. 7–19, 2014
work page 2014
-
[4]
C. K. Reddy, V . Gopal, R. Cutler, E. Beyrami, R. Cheng, H. Dubey, S. Matusevych, R. Aichner, A. Aazami, S. Braun, P. Rana, S. Srinivasan, and J. Gehrke, “The interspeech 2020 deep noise suppression challenge: Datasets, subjective testing framework, and challenge results,” in Inter- speech 2020 , 2020, pp. 2492–2496
work page 2020
-
[5]
Fullsubnet: A full-band and sub- band fusion model for real-time single-channel speech enhancement,
X. Hao, X. Su, R. Horaud, and X. Li, “Fullsubnet: A full-band and sub- band fusion model for real-time single-channel speech enhancement,” in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) . IEEE, 2021, pp. 6633–6637. JOURNAL OF LATEX CLASS FILES, VOL. XX, NO. X, XXXX XXXX 13
work page 2021
-
[6]
Frcrn: Boosting feature representation using frequency recurrence for monaural speech enhancement,
S. Zhao, B. Ma, K. N. Watcharasupat, and W.-S. Gan, “Frcrn: Boosting feature representation using frequency recurrence for monaural speech enhancement,” in ICASSP 2022-2022 IEEE international conference on acoustics, speech and signal processing (ICASSP) . IEEE, 2022, pp. 9281–9285
work page 2022
-
[7]
Dynamical channel pruning by conditional accuracy change for deep neural networks,
Z. Chen, T.-B. Xu, C. Du, C.-L. Liu, and H. He, “Dynamical channel pruning by conditional accuracy change for deep neural networks,”IEEE Transactions on Neural Networks and Learning Systems , vol. 32, no. 2, pp. 799–813, 2021
work page 2021
-
[8]
General bitwidth assignment for efficient deep convolutional neural network quantization,
W. Fei, W. Dai, C. Li, J. Zou, and H. Xiong, “General bitwidth assignment for efficient deep convolutional neural network quantization,” IEEE Transactions on Neural Networks and Learning Systems , vol. 33, no. 10, pp. 5253–5267, 2022
work page 2022
-
[9]
Col- laborative knowledge distillation via multiknowledge transfer,
J. Gou, L. Sun, B. Yu, L. Du, K. Ramamohanarao, and D. Tao, “Col- laborative knowledge distillation via multiknowledge transfer,” IEEE Transactions on Neural Networks and Learning Systems , vol. 35, no. 5, pp. 6718–6730, 2024
work page 2024
-
[10]
M. Stamenovic, N. L. Westhausen, L.-C. Yang, C. Jensen, and A. Pawlicki, “Weight, block or unit? exploring sparsity tradeoffs for speech enhancement on tiny neural accelerators,” arXiv preprint arXiv:2111.02351, 2021
-
[11]
Towards model compression for deep learning based speech enhancement,
K. Tan and D. Wang, “Towards model compression for deep learning based speech enhancement,” IEEE/ACM transactions on audio, speech, and language processing , vol. 29, pp. 1785–1794, 2021
work page 2021
-
[12]
S. Kim and M. Kim, “Test-time adaptation toward personalized speech enhancement: Zero-shot learning with knowledge distillation,” in 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 2021, pp. 176–180
work page 2021
-
[13]
M. Thakker, S. E. Eskimez, T. Yoshioka, and H. Wang, “Fast real-time personalized speech enhancement: End-to-end enhancement network (e3net) and knowledge distillation,” in Interspeech 2022, 2022, pp. 991– 995
work page 2022
-
[14]
Y . Wan, Y . Zhou, X. Peng, K.-W. Chang, and Y . Lu, “Abc-kd: Attention- based-compression knowledge distillation for deep learning-based noise suppression,” in Interspeech 2023 , 2023, pp. 2528–2532
work page 2023
-
[15]
Two-step knowledge distillation for tiny speech enhancement,
R. D. Nathoo, M. Kegler, and M. Stamenovic, “Two-step knowledge distillation for tiny speech enhancement,” in ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024, pp. 10 141–10 145
work page 2024
-
[16]
Residual fusion probabilistic knowledge distillation for speech en- hancement,
J. Cheng, R. Liang, L. Zhou, L. Zhao, C. Huang, and B. W. Schuller, “Residual fusion probabilistic knowledge distillation for speech en- hancement,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2680–2691, 2024
work page 2024
-
[17]
J. Cheng, C. Pang, R. Liang, J. Fan, and L. Zhao, “Dual-path dilated convolutional recurrent network with group attention for multi-channel speech enhancement,” in ICASSP 2023-2023 IEEE International Con- ference on Acoustics, Speech and Signal Processing (ICASSP) . IEEE, 2023, pp. 1–2
work page 2023
-
[18]
Distilling the Knowledge in a Neural Network
G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531 , 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[19]
X. Niu, J. Gu, G. Zhang, P. Wan, and Z. Wang, “Learning an inference- accelerated network from a pre-trained model with frequency-enhanced feature distillation,” in Proceedings of the 30th ACM International Conference on Multimedia , 2022, pp. 1847–1856
work page 2022
-
[20]
Cross- image relational knowledge distillation for semantic segmentation,
C. Yang, H. Zhou, Z. An, X. Jiang, Y . Xu, and Q. Zhang, “Cross- image relational knowledge distillation for semantic segmentation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 12 319–12 328
work page 2022
-
[21]
Distilling knowledge via knowledge review,
P. Chen, S. Liu, H. Zhao, and J. Jia, “Distilling knowledge via knowledge review,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition , 2021, pp. 5008–5017
work page 2021
-
[22]
D. Zhao, B. Yuan, and Z. Shi, “Inherit with distillation and evolve with contrast: Exploring class incremental semantic segmentation without exemplar memory,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 10, pp. 11 932–11 947, 2023
work page 2023
-
[23]
Knowledge distillation from transformers for low-complexity acoustic scene clas- sification
F. Schmid, S. Masoudian, K. Koutini, and G. Widmer, “Knowledge distillation from transformers for low-complexity acoustic scene clas- sification.” in DCASE, 2022
work page 2022
-
[24]
Audio-visual representa- tion learning via knowledge distillation from speech foundation models,
J.-X. Zhang, G. Wan, J. Gao, and Z.-H. Ling, “Audio-visual representa- tion learning via knowledge distillation from speech foundation models,” Pattern Recognition, p. 111432, 2025
work page 2025
-
[25]
Sub-band knowledge distillation framework for speech enhancement,
X. Hao, S. Wen, X. Su, Y . Liu, G. Gao, and X. Li, “Sub-band knowledge distillation framework for speech enhancement,” in Interspeech 2020 , 2020, pp. 2687–2691
work page 2020
-
[26]
Cross-layer similarity knowledge distillation for speech enhancement
J. Cheng, R. Liang, Y . Xie, L. Zhao, B. Schuller, J. Jia, and Y . Peng, “Cross-layer similarity knowledge distillation for speech enhancement.” in INTERSPEECH, 2022, pp. 926–930
work page 2022
-
[27]
Multi-view attention transfer for efficient speech enhancement,
W. Shin, H. J. Park, J. S. Kim, B. H. Lee, and S. W. Han, “Multi-view attention transfer for efficient speech enhancement,” inInterspeech 2022, 2022, pp. 1198–1202
work page 2022
-
[28]
Dynamic frequency-adaptive knowledge distillation for speech enhancement,
X. Yuan, S. Liu, H. Chen, L. Zhou, J. Li, and J. Hu, “Dynamic frequency-adaptive knowledge distillation for speech enhancement,” in ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , 2025, pp. 1–5
work page 2025
-
[29]
R. Han, W. Xu, Z. Zhang, M. Liu, and L. Xie, “Distil-dccrn: A small- footprint dccrn leveraging feature-based knowledge distillation in speech enhancement,” IEEE Signal Processing Letters , 2024
work page 2024
-
[30]
Semckd: Semantic calibration for cross-layer knowledge distillation,
C. Wang, D. Chen, J.-P. Mei, Y . Zhang, Y . Feng, and C. Chen, “Semckd: Semantic calibration for cross-layer knowledge distillation,” IEEE Transactions on Knowledge and Data Engineering , vol. 35, no. 6, pp. 6305–6319, 2022
work page 2022
-
[31]
Real time speech enhancement in the waveform domain,
A. D ´efossez, G. Synnaeve, and Y . Adi, “Real time speech enhancement in the waveform domain,” in Interspeech 2020 , 2020, pp. 3291–3295
work page 2020
-
[32]
Teacher-student learn- ing for low-latency online speech enhancement using wave-u-net,
S. Nakaoka, L. Li, S. Inoue, and S. Makino, “Teacher-student learn- ing for low-latency online speech enhancement using wave-u-net,” in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) . IEEE, 2021, pp. 661–665
work page 2021
-
[33]
FitNets: Hints for Thin Deep Nets
A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y . Ben- gio, “Fitnets: Hints for thin deep nets,” arXiv preprint arXiv:1412.6550 , 2014
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[34]
A hybrid dsp/deep learning approach to real-time full- band speech enhancement,
J.-M. Valin, “A hybrid dsp/deep learning approach to real-time full- band speech enhancement,” in 2018 IEEE 20th international workshop on multimedia signal processing (MMSP) . IEEE, 2018, pp. 1–5
work page 2018
-
[35]
Weighted speech distortion losses for neural-network-based real-time speech enhancement,
Y . Xia, S. Braun, C. K. Reddy, H. Dubey, R. Cutler, and I. Tashev, “Weighted speech distortion losses for neural-network-based real-time speech enhancement,” in ICASSP 2020-2020 IEEE International Con- ference on Acoustics, Speech and Signal Processing (ICASSP) . IEEE, 2020, pp. 871–875
work page 2020
-
[36]
Dual-signal transformation lstm network for real-time noise suppression,
N. L. Westhausen and B. T. Meyer, “Dual-signal transformation lstm network for real-time noise suppression,” in Interspeech 2020 , 2020, pp. 2477–2481
work page 2020
-
[37]
Dccrn: Deep complex convolution recurrent network for phase- aware speech enhancement,
Y . Hu, Y . Liu, S. Lv, M. Xing, S. Zhang, Y . Fu, J. Wu, B. Zhang, and L. Xie, “Dccrn: Deep complex convolution recurrent network for phase- aware speech enhancement,” in Interspeech 2020, 2020, pp. 2472–2476
work page 2020
-
[38]
Fullsubnet+: Channel attention fullsubnet with complex spectrograms for speech enhancement,
J. Chen, Z. Wang, D. Tuo, Z. Wu, S. Kang, and H. Meng, “Fullsubnet+: Channel attention fullsubnet with complex spectrograms for speech enhancement,” in ICASSP 2022-2022 IEEE international conference on acoustics, speech and signal processing (ICASSP) . IEEE, 2022, pp. 7857–7861
work page 2022
-
[39]
A. Li, W. Liu, C. Zheng, C. Fan, and X. Li, “Two heads are better than one: A two-stage complex spectral mapping approach for monaural speech enhancement,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1829–1843, 2021
work page 2021
-
[40]
Z. Lin, J. Wang, R. Li, F. Shen, and X. Xuan, “Primek-net: Multi-scale spectral learning via group prime-kernel convolutional neural networks for single channel speech enhancement,” in ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025, pp. 1–5
work page 2025
-
[41]
A neural beamforming network for b-format 3d speech enhance- ment and recognition,
X. Ren, L. Chen, X. Zheng, C. Xu, X. Zhang, C. Zhang, L. Guo, and B. Yu, “A neural beamforming network for b-format 3d speech enhance- ment and recognition,” in 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP) . IEEE, 2021, pp. 1–6
work page 2021
-
[42]
Embedding and beamforming: All-neural causal beamformer for multichannel speech enhancement,
A. Li, W. Liu, C. Zheng, and X. Li, “Embedding and beamforming: All-neural causal beamformer for multichannel speech enhancement,” in ICASSP 2022-2022 IEEE international conference on acoustics, speech and signal processing (ICASSP) . IEEE, 2022, pp. 6487–6491
work page 2022
-
[43]
Stream attention based u-net for l3das23 challenge,
H. Wang, Y . Fu, J. Li, M. Ge, L. Wang, and X. Qian, “Stream attention based u-net for l3das23 challenge,” inICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–2
work page 2023
-
[44]
Deft-an: Dense frequency-time attentive net- work for multichannel speech enhancement,
D. Lee and J.-W. Choi, “Deft-an: Dense frequency-time attentive net- work for multichannel speech enhancement,” IEEE Signal Processing Letters, vol. 30, pp. 155–159, 2023
work page 2023
-
[45]
C. Quan and X. Li, “Spatialnet: Extensively learning spatial information for multichannel joint speech separation, denoising and dereverberation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol. 32, pp. 1310–1323, 2024
work page 2024
-
[46]
A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, “Per- ceptual evaluation of speech quality (pesq)-a new method for speech quality assessment of telephone networks and codecs,” in 2001 IEEE international conference on acoustics, speech, and signal processing. Proceedings (Cat. No. 01CH37221) , vol. 2. IEEE, 2001, pp. 749–752
work page 2001
-
[47]
An algorithm for intelligibility prediction of time–frequency weighted noisy speech,
C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, “An algorithm for intelligibility prediction of time–frequency weighted noisy speech,” JOURNAL OF LATEX CLASS FILES, VOL. XX, NO. X, XXXX XXXX 14 IEEE Transactions on audio, speech, and language processing , vol. 19, no. 7, pp. 2125–2136, 2011
work page 2011
-
[48]
J. Le Roux, S. Wisdom, H. Erdogan, and J. R. Hershey, “Sdr–half-baked or well done?” in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) . IEEE, 2019, pp. 626–630
work page 2019
-
[49]
Evaluation of objective quality measures for speech enhancement,
Y . Hu and P. C. Loizou, “Evaluation of objective quality measures for speech enhancement,” IEEE Transactions on audio, speech, and language processing, vol. 16, no. 1, pp. 229–238, 2007
work page 2007
-
[50]
wav2vec 2.0: A framework for self-supervised learning of speech representations,
A. Baevski, Y . Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A framework for self-supervised learning of speech representations,” Advances in neural information processing systems , vol. 33, pp. 12 449– 12 460, 2020
work page 2020
-
[51]
Statistical methods for research workers,
R. A. Fisher, “Statistical methods for research workers,” in Break- throughs in statistics: Methodology and distribution . Springer, 1970, pp. 66–70. Jiaming Cheng received the PhD degree from Southeast University, Nanjing, China, in 2024. He is currently a Lecturer with the School of Com- puter Science, Nanjing Audit University, Nanjing, China. His resea...
work page 1970
-
[52]
His research interests include big data technology and artificial intelligence
He is currently a Professor with the School of Computer Science, Nanjing Audit University, Nanjing, China. His research interests include big data technology and artificial intelligence. Ye Ni received the M.S. degree from Nanjing University, Nanjing, China, in 2022. He is cur- rently working toward a PhD degree from Southeast University, Nanjing, China. ...
work page 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.