Omniaudio: Generating spatial audio from 360-degree video

Liu, H · 2025 · arXiv 2504.14906

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

FoleyDesigner: Immersive Stereo Foley Generation with Precise Spatio-Temporal Alignment for Film Clips

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

FoleyDesigner generates spatio-temporally aligned stereo Foley audio for film clips via multi-agent analysis, diffusion models on video cues, and LLM mixing, supported by the new FilmStereo dataset.

StereoFoley: Object-Aware Stereo Audio Generation from Video

cs.SD · 2025-09-22 · conditional · novelty 7.0

StereoFoley is an end-to-end video-to-stereo-audio framework that uses a base generative model fine-tuned on synthetic object-tracked data with panning and distance controls to achieve object-aware spatial sound.

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

eess.AS · 2026-05-29 · unverdicted · novelty 4.0

SwanSphere introduces a causal autoregressive diffusion transformer architecture with SVAC contrastive learning and ODPO optimization for streaming spatial audio generation from video and text.

citing papers explorer

FoleyDesigner: Immersive Stereo Foley Generation with Precise Spatio-Temporal Alignment for Film Clips

cs.CV · 2026-04-07 · unverdicted · none · ref 24

StereoFoley: Object-Aware Stereo Audio Generation from Video

cs.SD · 2025-09-22 · conditional · none · ref 18

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

eess.AS · 2026-05-29 · unverdicted · none · ref 30
