Fine-grained open domain image animation with motion guidance

Zuozhuo Dai, Zhenghao Zhang, Yao Yao, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang · 2023 · arXiv 2311.12886

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 · baseline 1

citation-polarity summary

background 1 · baseline 1

representative citing papers

Substantial, Decomposable, and Invisible: Visual Context Misalignment in Instructional Videos for Physical Tasks

cs.HC · 2026-05-16 · conditional · novelty 6.0

Fully aligned instructional videos for physical tasks yield 11.1% better completion quality and 15.5% faster times, with four decomposable visual attributes whose isolated misalignments degrade performance without users noticing.

Show-o2: Improved Native Unified Multimodal Models

cs.CV · 2025-06-18 · unverdicted · novelty 4.0

Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.

Evolution of Video Generative Foundations

cs.CV · 2026-04-07 · unverdicted · novelty 2.0

This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.

citing papers explorer

Showing 3 of 3 citing papers.

Substantial, Decomposable, and Invisible: Visual Context Misalignment in Instructional Videos for Physical Tasks

cs.HC · 2026-05-16 · conditional · none · ref 7

Fully aligned instructional videos for physical tasks yield 11.1% better completion quality and 15.5% faster times, with four decomposable visual attributes whose isolated misalignments degrade performance without users noticing.

Show-o2: Improved Native Unified Multimodal Models

cs.CV · 2025-06-18 · unverdicted · none · ref 31

Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.

Evolution of Video Generative Foundations

cs.CV · 2026-04-07 · unverdicted · none · ref 226

This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.
