FoleyDesigner generates spatio-temporally aligned stereo Foley audio for film clips via multi-agent analysis, diffusion models on video cues, and LLM mixing, supported by the new FilmStereo dataset.
Frieren: Efficient video-to-audio generation network with rectified flow matching.Advances in Neural Information Processing Systems, 37:128118–128138, 2024
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
MMHNet enables video-to-audio models trained on short clips to generalize and generate audio for videos over 5 minutes long.
citing papers explorer
-
FoleyDesigner: Immersive Stereo Foley Generation with Precise Spatio-Temporal Alignment for Film Clips
FoleyDesigner generates spatio-temporally aligned stereo Foley audio for film clips via multi-agent analysis, diffusion models on video cues, and LLM mixing, supported by the new FilmStereo dataset.
-
Echoes Over Time: Unlocking Length Generalization in Video-to-Audio Generation Models
MMHNet enables video-to-audio models trained on short clips to generalize and generate audio for videos over 5 minutes long.