Temporally aligned audio for video with autoregression

Ilpo Viertola, Vladimir Iashin, Esa Rahtu · 2025

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

WavFlow: Audio Generation in Waveform Space

cs.SD · 2026-05-18 · conditional · novelty 6.0

WavFlow performs direct waveform audio generation via flow matching on 2D token grids from raw patches plus amplitude lifting, matching latent-based methods on VGGSound and AudioCaps without intermediate compression.

Echoes Over Time: Unlocking Length Generalization in Video-to-Audio Generation Models

cs.CV · 2026-02-24 · unverdicted · novelty 6.0

MMHNet enables video-to-audio models trained on short clips to generalize and generate audio for videos over 5 minutes long.

citing papers explorer

Showing 2 of 2 citing papers.

WavFlow: Audio Generation in Waveform Space · cs.SD · 2026-05-18 · conditional · none · ref 22
Echoes Over Time: Unlocking Length Generalization in Video-to-Audio Generation Models · cs.CV · 2026-02-24 · unverdicted · none · ref 44
