pith. sign in

arXiv preprint arXiv:2501.16327 , year=

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

fields

cs.CL 3

years

2026 1 2025 2

verdicts

UNVERDICTED 3

representative citing papers

Step-Audio 2 Technical Report

cs.CL · 2025-07-22 · unverdicted · novelty 6.0

Step-Audio 2 integrates a latent audio encoder, reasoning-centric reinforcement learning, and discrete audio token generation into language modeling to deliver state-of-the-art performance on audio understanding and conversational benchmarks.

citing papers explorer

Showing 3 of 3 citing papers.

  • VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing cs.CL · 2026-05-07 · unverdicted · none · ref 99

    VITA-QinYu is the first expressive end-to-end spoken language model supporting role-playing and singing alongside conversation, trained on 15.8K hours of data and outperforming prior models on expressiveness and conversational benchmarks.

  • Step-Audio 2 Technical Report cs.CL · 2025-07-22 · unverdicted · none · ref 22

    Step-Audio 2 integrates a latent audio encoder, reasoning-centric reinforcement learning, and discrete audio token generation into language modeling to deliver state-of-the-art performance on audio understanding and conversational benchmarks.

  • Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction cs.CL · 2025-02-17 · unverdicted · none · ref 45

    Step-Audio introduces a 130B-parameter unified speech-text model with open-sourced components for understanding, generation, affordable voice cloning, and dynamic control, claiming SOTA human evaluation results on a new benchmark.