ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation

Hongjie Li, Hong-Xing Yu, Jiaman Li, Jiajun Wu · 2024 · arXiv 2412.18600

4 Pith papers cite this work.

representative citing papers

InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

InfBaGel generates consistent human-object-scene interactions via dynamic perception during iterative refinement in a consistency model, bump-aware guidance to avoid collisions, and hybrid training that mixes synthesized pseudo-samples with real HSI data.

GenHSI: Controllable Generation of Human-Scene Interaction Videos

cs.CV · 2025-06-24 · unverdicted · novelty 7.0

GenHSI is a training-free three-stage pipeline that turns a scene image, character image, and complex HSI prompt into long videos with plausible chained interactions by generating atomic actions, 3D keyframes via 2D inpainting plus optimization, and then feeding them to pre-trained video diffusion.

VHOI: Controllable Video Generation of Human-Object Interactions from Sparse Trajectories via Motion Densification

cs.CV · 2025-12-10 · unverdicted · novelty 6.0

VHOI densifies sparse trajectories into color-encoded HOI mask sequences and conditions a fine-tuned video diffusion model on them to produce controllable human-object interaction videos, including full navigation sequences.

Prompt-to-Gesture: Measuring the Capabilities of Image-to-Video Deictic Gesture Generation

cs.CV · 2026-04-16 · unverdicted · novelty 5.0

Prompt-driven image-to-video generation produces deictic gestures that match real data visually, add useful variety, and improve downstream recognition models when mixed with human recordings.

citing papers explorer

InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement cs.CV · 2026-04-06 · unverdicted · none · ref 7
GenHSI: Controllable Generation of Human-Scene Interaction Videos cs.CV · 2025-06-24 · unverdicted · none · ref 48
VHOI: Controllable Video Generation of Human-Object Interactions from Sparse Trajectories via Motion Densification cs.CV · 2025-12-10 · unverdicted · none · ref 40
Prompt-to-Gesture: Measuring the Capabilities of Image-to-Video Deictic Gesture Generation cs.CV · 2026-04-16 · unverdicted · none · ref 26