Title resolution pending

Xu Tan, Tao Qin, Frank K · 2021 · arXiv 2106.15561

7 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages

eess.AS · 2026-04-21 · unverdicted · novelty 7.0

Introduces the Indic-CodecFake dataset for Indic codec deepfakes and SATYAM, a novel hyperbolic ALM that outperforms baselines through dual-stage semantic-prosodic fusion using Bhattacharyya distance.

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

cs.CL · 2023-01-05 · unverdicted · novelty 7.0

VALL-E is a neural codec language model trained on 60K hours of speech that performs zero-shot TTS, synthesizing natural speech that matches an unseen speaker's voice, emotion, and environment from a 3-second prompt.

Asymmetric Phase Coding Audio Watermarking

cs.CR · 2026-05-08 · unverdicted · novelty 6.0

APC embeds compact Ed25519 signatures into audio phase data with error correction to achieve 97.5-98.3% cryptographic verification under eight attack types at mean PESQ 3.02.

Evaluating Generalization and Robustness in Russian Anti-Spoofing: The RuASD Initiative

cs.SD · 2026-03-31 · accept · novelty 6.0

RuASD is a comprehensive Russian speech anti-spoofing dataset featuring 37 synthesis systems and a robustness evaluation pipeline for real-world channel distortions.

One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

eess.AS · 2026-04-28 · unverdicted · novelty 4.0

A system based on OmniVoice with multi-model ensemble distillation for fine-tuning shows consistent gains in intelligibility metrics while keeping speaker similarity for cross-lingual scientific speech.

XR-CareerAssist: An Immersive Platform for Personalised Career Guidance Leveraging Extended Reality and Multimodal AI

cs.CE · 2026-04-08 · unverdicted · novelty 4.0 · 2 refs

XR-CareerAssist fuses XR and five AI modules into a Unity-based immersive platform for multilingual, personalized career guidance via 3D avatars and dynamic Sankey diagrams, reporting 78.3% user satisfaction in a 23-person pilot.

Voice Mapping of Text-to-Speech Systems: A Metric-Based Approach for Voice Quality Assessment

eess.AS · 2026-04-21 · unverdicted · novelty 3.0

Voice range indicates TTS model capability with VITS highest, Glow-TTS best at soft phonation, and CPPs of 7-8 dB marking natural quality while values over 10 dB sound robotic.

citing papers explorer

Showing 7 of 7 citing papers.

Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages
eess.AS · 2026-04-21 · unverdicted · none · ref 83

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
cs.CL · 2023-01-05 · unverdicted · none · ref 17

Asymmetric Phase Coding Audio Watermarking
cs.CR · 2026-05-08 · unverdicted · none · ref 2

Evaluating Generalization and Robustness in Russian Anti-Spoofing: The RuASD Initiative
cs.SD · 2026-03-31 · accept · none · ref 1

One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech
eess.AS · 2026-04-28 · unverdicted · none · ref 8

XR-CareerAssist: An Immersive Platform for Personalised Career Guidance Leveraging Extended Reality and Multimodal AI
cs.CE · 2026-04-08 · unverdicted · none · ref 18 · 2 links

Voice Mapping of Text-to-Speech Systems: A Metric-Based Approach for Voice Quality Assessment
eess.AS · 2026-04-21 · unverdicted · none · ref 44
