V2A-Mapper: A lightweight solution for vision-to-audio generation by connecting foundation models

Heng Wang, Jianbo Ma, Santiago Pascual, Richard Cartwright, Weidong Cai · 2024

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

OmniSonic: Towards Universal and Holistic Audio Generation from Video and Text

cs.SD · 2026-04-06 · unverdicted · novelty 7.0

OmniSonic introduces a TriAttn-DiT architecture with MoE gating to jointly generate on-screen, off-screen, and speech audio from video and text, outperforming prior models on a new UniHAGen-Bench.

citing papers explorer

OmniSonic: Towards Universal and Holistic Audio Generation from Video and Text

cs.SD · 2026-04-06 · unverdicted · none · ref 45
