Robust speech recognition via large-scale weak supervision, 2022

Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever · 2022

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

representative citing papers

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

MiniCPM-o 4.5 uses the Omni-Flow streaming framework to deliver real-time full-duplex omni-modal interaction with proactive behavior in a 9B model that approaches Gemini 2.5 Flash performance.

3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars

cs.CV · 2026-02-11 · unverdicted · novelty 6.0

3DXTalker unifies identity modeling, lip synchronization, emotional expression, and head-pose dynamics in audio-driven 3D avatars via 2D-to-3D curation, amplitude/emotion audio cues, and a flow-matching transformer with prompt control.

WhisperRT -- Turning Whisper into a Causal Streaming Model

cs.CL · 2025-08-17 · conditional · novelty 6.0

WhisperRT converts Whisper to a causal streaming ASR model via encoder causality, decoder synchronization on partial states, and fine-tuning, achieving better performance than non-fine-tuned streaming methods on sub-300ms chunks with lower complexity.

citing papers explorer

Showing 3 of 3 citing papers.

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction cs.CL · 2026-04-30 · unverdicted · none · ref 20
MiniCPM-o 4.5 uses the Omni-Flow streaming framework to deliver real-time full-duplex omni-modal interaction with proactive behavior in a 9B model that approaches Gemini 2.5 Flash performance.
3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars cs.CV · 2026-02-11 · unverdicted · none · ref 57
3DXTalker unifies identity modeling, lip synchronization, emotional expression, and head-pose dynamics in audio-driven 3D avatars via 2D-to-3D curation, amplitude/emotion audio cues, and a flow-matching transformer with prompt control.
WhisperRT -- Turning Whisper into a Causal Streaming Model cs.CL · 2025-08-17 · conditional · none · ref 29
WhisperRT converts Whisper to a causal streaming ASR model via encoder causality, decoder synchronization on partial states, and fine-tuning, achieving better performance than non-fine-tuned streaming methods on sub-300ms chunks with lower complexity.

Robust speech recognition via large-scale weak supervision, 2022

fields

years

verdicts

representative citing papers

citing papers explorer