S2ST-Omni 2 uses typology-informed hierarchical encoding, gated Dual-CTC, and typology-aware prompting to improve multilingual S2ST over flat-label baselines on CVSS-C, with gains in low-data regimes.
Robust speech recognition via large-scale weak supervi- sion,
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
HQTN-SER combines a low-parameter quantum tensor network module with classical latent embeddings to reach 73-80% accuracy on three speech emotion datasets while using few qubits and showing stable training.
UniPASE extends the PASE framework with DeWavLM-Omni to convert degraded speech into high-fidelity, low-hallucination audio across sampling rates via phonetic enhancement, acoustic adaptation, and multi-rate vocoding.
citing papers explorer
-
From Flat Language Labels to Typological Priors: Structured Language Conditioning for Multilingual Speech-to-Speech Translation
S2ST-Omni 2 uses typology-informed hierarchical encoding, gated Dual-CTC, and typology-aware prompting to improve multilingual S2ST over flat-label baselines on CVSS-C, with gains in low-data regimes.
-
HQTN-SER: Speech Emotion Recognition with Hybrid Quantum Tensor Networks
HQTN-SER combines a low-parameter quantum tensor network module with classical latent embeddings to reach 73-80% accuracy on three speech emotion datasets while using few qubits and showing stable training.
-
UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations
UniPASE extends the PASE framework with DeWavLM-Omni to convert degraded speech into high-fidelity, low-hallucination audio across sampling rates via phonetic enhancement, acoustic adaptation, and multi-rate vocoding.