Title resolution pending

Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever · 2023

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

browse 6 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

cs.CL · 2025-12-18 · unverdicted · novelty 7.0

Cascaded systems remain the most reliable for speech translation overall, but recent SpeechLLMs match or outperform them in many conditions while standalone speech models lag.

VoiceBench: Benchmarking LLM-Based Voice Assistants

cs.CL · 2024-10-22 · unverdicted · novelty 7.0

VoiceBench is the first benchmark for multi-faceted evaluation of LLM voice assistants using real and synthetic spoken instructions with speaker, environmental, and content variations.

RTCFake: Speech Deepfake Detection in Real-Time Communication

cs.SD · 2026-04-26 · unverdicted · novelty 6.0

RTCFake is the first large-scale dataset of real-time communication speech deepfakes paired with offline versions, paired with a phoneme-guided consistency learning method that improves cross-platform and noise-robust detection.

Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps

cs.CL · 2026-04-21 · unverdicted · novelty 5.0

Four attention metrics enable logistic regression classifiers that detect hallucinations in SpeechLLMs with up to +0.23 PR-AUC gains over baselines on ASR and translation tasks.

ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing

cs.SD · 2026-04-13 · unverdicted · novelty 5.0

ActorMind is a four-agent chain-of-thought framework that emulates human actors to produce spontaneous, emotion-infused speech responses for role-playing scenarios.

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

eess.AS · 2024-10-09 · unverdicted · novelty 5.0

F5-TTS generates natural speech from text via flow matching on DiT with simple text padding, ConvNeXt refinement, and sway sampling, trained on 100K hours multilingual data.

citing papers explorer

Showing 6 of 6 citing papers.

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs cs.CL · 2025-12-18 · unverdicted · none · ref 74
Cascaded systems remain the most reliable for speech translation overall, but recent SpeechLLMs match or outperform them in many conditions while standalone speech models lag.
VoiceBench: Benchmarking LLM-Based Voice Assistants cs.CL · 2024-10-22 · unverdicted · none · ref 91
VoiceBench is the first benchmark for multi-faceted evaluation of LLM voice assistants using real and synthetic spoken instructions with speaker, environmental, and content variations.
RTCFake: Speech Deepfake Detection in Real-Time Communication cs.SD · 2026-04-26 · unverdicted · none · ref 20
RTCFake is the first large-scale dataset of real-time communication speech deepfakes paired with offline versions, paired with a phoneme-guided consistency learning method that improves cross-platform and noise-robust detection.
Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps cs.CL · 2026-04-21 · unverdicted · none · ref 21
Four attention metrics enable logistic regression classifiers that detect hallucinations in SpeechLLMs with up to +0.23 PR-AUC gains over baselines on ASR and translation tasks.
ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing cs.SD · 2026-04-13 · unverdicted · none · ref 36
ActorMind is a four-agent chain-of-thought framework that emulates human actors to produce spontaneous, emotion-infused speech responses for role-playing scenarios.
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching eess.AS · 2024-10-09 · unverdicted · none · ref 133
F5-TTS generates natural speech from text via flow matching on DiT with simple text padding, ConvNeXt refinement, and sway sampling, trained on 100K hours multilingual data.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer