VHOI densifies sparse trajectories into color-encoded HOI mask sequences and conditions a fine-tuned video diffusion model on them to produce controllable human-object interaction videos, including full navigation sequences.
Mask2iv: Interaction-centric video generation via mask trajectories
3 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
A comprehensive survey that organizes the literature on world models in robot learning, their roles in policy learning, planning, simulation, and video-based generation, with connections to navigation, driving, datasets, and benchmarks.