MOSNet: Deep Learning Based Objective Assessment for Voice Conversion

Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang · 2019 · arXiv 1904.08352

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Neural networks for Text-to-Speech evaluation

cs.CL · 2026-03-17 · conditional · novelty 6.0

NeuralSBS reaches 73.7% accuracy on side-by-side TTS comparisons and enhanced MOS models reach RMSE 0.40, beating the human inter-rater baseline of 0.62.

Voice Mapping of Text-to-Speech Systems: A Metric-Based Approach for Voice Quality Assessment

eess.AS · 2026-04-21 · unverdicted · novelty 3.0

Voice range indicates TTS model capability with VITS highest, Glow-TTS best at soft phonation, and CPPs of 7-8 dB marking natural quality while values over 10 dB sound robotic.

A Survey of Advancing Audio Super-Resolution and Bandwidth Extension from Discriminative to Generative Models

eess.AS · 2026-05-15 · unverdicted · novelty 2.0

A structured survey of audio bandwidth extension that organizes the transition from deterministic discriminative DNNs to generative approaches including GANs, diffusion models, and flow-based methods.

citing papers explorer

Showing 3 of 3 citing papers.

Neural networks for Text-to-Speech evaluation · cs.CL · 2026-03-17 · conditional · none · ref 4
Voice Mapping of Text-to-Speech Systems: A Metric-Based Approach for Voice Quality Assessment · eess.AS · 2026-04-21 · unverdicted · none · ref 12
A Survey of Advancing Audio Super-Resolution and Bandwidth Extension from Discriminative to Generative Models · eess.AS · 2026-05-15 · unverdicted · none · ref 41
