MMAudio: Taming multimodal joint training for high-quality video-to-audio synthesis

1 Pith paper cites this work. Polarity classification is still being indexed.

1 Pith paper citing it

fields

cs.SD 1

years

2025 1

verdicts

CONDITIONAL 1

representative citing papers

StereoFoley: Object-Aware Stereo Audio Generation from Video

cs.SD · 2025-09-22 · conditional · novelty 7.0

StereoFoley is an end-to-end video-to-stereo-audio framework that uses a base generative model fine-tuned on synthetic object-tracked data with panning and distance controls to achieve object-aware spatial sound.

citing papers explorer

Showing 1 of 1 citing paper.

  • StereoFoley: Object-Aware Stereo Audio Generation from Video · cs.SD · 2025-09-22 · conditional · none · ref 13