Son- icvisionlm: Playing sound with vision language models

Zhifeng Xie, Shengye Yu, Qile He, Mengtian Li · 2024

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

OmniSonic: Towards Universal and Holistic Audio Generation from Video and Text

cs.SD · 2026-04-06 · unverdicted · novelty 7.0

OmniSonic introduces a TriAttn-DiT architecture with MoE gating to jointly generate on-screen, off-screen, and speech audio from video and text, outperforming prior models on a new UniHAGen-Bench.

citing papers explorer

Showing 1 of 1 citing paper.

OmniSonic: Towards Universal and Holistic Audio Generation from Video and Text cs.SD · 2026-04-06 · unverdicted · none · ref 48
OmniSonic introduces a TriAttn-DiT architecture with MoE gating to jointly generate on-screen, off-screen, and speech audio from video and text, outperforming prior models on a new UniHAGen-Bench.

Son- icvisionlm: Playing sound with vision language models

fields

years

verdicts

representative citing papers

citing papers explorer