OmniSonic introduces a TriAttn-DiT architecture with MoE gating to jointly generate on-screen, off-screen, and speech audio from video and text, outperforming prior models on a new UniHAGen-Bench.
2 Pith papers cite this work.
representative citing papers
FlowLPS perturbs flow-model estimates with Langevin steps then applies proximal refinement to balance fidelity and perceptual quality on linear inverse problems.
citing papers explorer
- OmniSonic: Towards Universal and Holistic Audio Generation from Video and Text
- FlowLPS: Langevin-Proximal Sampling for Flow-based Inverse Problem Solvers