Titan-guide: Taming inference-time alignment for guided text-to-video diffusion models

Christian Simon, Masato Ishii, Akio Hayakawa, Zhi Zhong, Shusuke Takahashi, Takashi Shibuya, Yuki Mitsufuji · 2025

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

cs.CV · 2026-02-24 · unverdicted · novelty 6.0

MMHNet enables video-to-audio models trained on short clips to generalize and generate audio for videos over 5 minutes long.

Showing 1 of 1 citing paper.

Echoes Over Time: Unlocking Length Generalization in Video-to-Audio Generation Models cs.CV · 2026-02-24 · unverdicted · none · ref 38
MMHNet enables video-to-audio models trained on short clips to generalize and generate audio for videos over 5 minutes long.