MiMo-Audio: Audio Language Models are Few-Shot Learners

Canonical reference. 75% of classified citations from Pith papers cite this work as background.

15 Pith papers citing it
Background 75% of classified citations

citation-role summary

background 6 · baseline 2

years

2026 15

verdicts

UNVERDICTED 15

representative citing papers

TiCo: Time-Controllable Spoken Dialogue Model

cs.CL · 2026-03-23 · unverdicted · novelty 7.0

TiCo enables spoken dialogue models to follow explicit time constraints in generated responses using Spoken Time Markers and reinforcement learning with verifiable rewards, reducing duration error by 2.7x relative to its backbone.

Qwen3.5-Omni Technical Report

cs.CL · 2026-04-17 · unverdicted · novelty 5.0

Qwen3.5-Omni scales an omnimodal model to hundreds of billions of parameters with a 256k context, introduces ARIA for stable speech synthesis, and reports state-of-the-art performance on 215 audio-visual benchmarks while adding multilingual and audio-visual coding capabilities.

Step-Audio-R1.5 Technical Report

eess.AS · 2026-04-28 · unverdicted · novelty 4.0

Step-Audio-R1.5 applies RLHF to audio reasoning models to maintain analytical performance while improving prosodic naturalness and immersion in extended spoken interactions.
