FoleyDesigner generates spatio-temporally aligned stereo Foley audio for film clips via multi-agent analysis, diffusion models on video cues, and LLM mixing, supported by the new FilmStereo dataset.
Omniaudio: Generating spatial audio from 360-degree video
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
representative citing papers
StereoFoley is an end-to-end video-to-stereo-audio framework that uses a base generative model fine-tuned on synthetic object-tracked data with panning and distance controls to achieve object-aware spatial sound.
SwanSphere introduces a causal autoregressive diffusion transformer architecture with SVAC contrastive learning and ODPO optimization for streaming spatial audio generation from video and text.
citing papers explorer
-
Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer
SwanSphere introduces a causal autoregressive diffusion transformer architecture with SVAC contrastive learning and ODPO optimization for streaming spatial audio generation from video and text.