Score-based generative modeling through stochastic differential equations
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- OmniSonic: Towards Universal and Holistic Audio Generation from Video and Text
  OmniSonic introduces a TriAttn-DiT architecture with MoE gating to jointly generate on-screen, off-screen, and speech audio from video and text, outperforming prior models on a new UniHAGen-Bench.
- Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models
  LocalDPO aligns text-to-video diffusion models with human preferences at the spatio-temporal region level by automatically generating localized preference pairs from corrupted real videos and applying a region-aware DPO loss.
- RelativeFlow: Taming Medical Image Denoising Learning with Noisy Reference
  RelativeFlow reformulates flow matching into relative noisier-to-noisy mappings via consistent transport and simulation-based velocity fields to outperform prior methods on CT and MR denoising with noisy references.