MMAudio: Taming multimodal joint training for high-quality video-to-audio synthesis

Ho Kei Cheng, Masato Ishii, Akio Hayakawa, Takashi Shibuya, Alexander G · 2025

1 Pith paper cites this work.



representative citing papers

StereoFoley: Object-Aware Stereo Audio Generation from Video

cs.SD · 2025-09-22 · conditional · novelty 7.0 · ref 13

StereoFoley is an end-to-end video-to-stereo-audio framework that uses a base generative model fine-tuned on synthetic object-tracked data with panning and distance controls to achieve object-aware spatial sound.
