Uniflow-audio: Unified flow matching for audio generation from omni-modalities. arXiv, abs/2509.24391

Uniflow-audio: Unified flow matching for audio generation from omni-modalities · 2025 · arXiv 2509.24391

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

VidAudio-Bench: Benchmarking V2A and VT2A Generation across Four Audio Categories

cs.SD · 2026-04-12 · unverdicted · novelty 7.0

VidAudio-Bench benchmarks V2A and VT2A models across four audio categories, revealing poor speech/singing performance and a tension between visual alignment and text instruction following.

Omni2Sound: Towards Unified Video-Text-to-Audio Generation

cs.SD · 2026-01-06 · unverdicted · novelty 7.0

A single DiT-based diffusion model unifies video-to-audio, text-to-audio, and joint video-text-to-audio generation, supported by a new 470k-pair dataset and three-stage progressive training that resolves task competition.

Foley-Omni: A Unified Multimodal Generation Model from Task-Level Audio Synthesis to Complete Video Soundtrack Generation

cs.SD · 2026-06-02 · unverdicted · novelty 6.0

Foley-Omni extends isolated audio synthesis to joint generation of full video soundtracks across speech, effects, and music, with a new V2ST-Bench for evaluation showing competitive single-task results and gains in mixed-track consistency.

Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text

cs.SD · 2026-05-27 · unverdicted · novelty 6.0

Dasheng AudioGen uses multi-view captions and a unified semantic-acoustic representation to enable end-to-end generation of mixed audio scenes from text descriptions.

citing papers explorer

VidAudio-Bench: Benchmarking V2A and VT2A Generation across Four Audio Categories · cs.SD · 2026-04-12 · unverdicted · none · ref 64
Omni2Sound: Towards Unified Video-Text-to-Audio Generation · cs.SD · 2026-01-06 · unverdicted · none · ref 29
Foley-Omni: A Unified Multimodal Generation Model from Task-Level Audio Synthesis to Complete Video Soundtrack Generation · cs.SD · 2026-06-02 · unverdicted · none · ref 81
Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text · cs.SD · 2026-05-27 · unverdicted · none · ref 8
