HO-Flow: Generalizable Hand-Object Interaction Generation with Latent Flow Matching

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Generating realistic 3D hand-object interactions (HOI) is a fundamental challenge in computer vision and robotics, requiring both temporal coherence and high-fidelity physical plausibility. Existing methods remain limited in their ability to learn expressive motion representations for generation and to perform temporal reasoning. In this paper, we present HO-Flow, a framework for synthesizing realistic hand-object motion sequences from text and canonical 3D objects. HO-Flow first employs an interaction-aware variational autoencoder to encode sequences of hand and object motions into a unified latent manifold by incorporating hand and object kinematics, enabling the representation to capture rich interaction dynamics. It then leverages a masked flow matching model that combines auto-regressive temporal reasoning with continuous latent generation, improving temporal coherence. To further enhance generalization, HO-Flow predicts object motions relative to the initial frame, enabling effective pre-training on large-scale synthetic data. Experiments on the GRAB, OakInk, and DexYCB benchmarks demonstrate that HO-Flow achieves state-of-the-art performance in both physical plausibility and motion diversity for interaction motion synthesis.
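
The abstract states that object motions are predicted relative to the initial frame. The following is a minimal sketch of what such a re-expression could look like, not the authors' implementation. It assumes each object pose is given as a 3x3 rotation matrix and a 3D translation in world coordinates; the function name `relative_to_first` and the pose layout are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Mat3 = List[List[float]]
Vec3 = List[float]


def transpose(m: Mat3) -> Mat3:
    # For a rotation matrix, the transpose is also its inverse.
    return [[m[j][i] for j in range(3)] for i in range(3)]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m: Mat3, v: Vec3) -> Vec3:
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def relative_to_first(poses: List[Tuple[Mat3, Vec3]]) -> List[Tuple[Mat3, Vec3]]:
    """Express every pose (R_t, p_t) in the coordinate frame of the first pose.

    Assumed convention (hypothetical): R_rel = R_0^T R_t and
    p_rel = R_0^T (p_t - p_0), so the first pose maps to identity and zero.
    """
    r0, p0 = poses[0]
    r0_inv = transpose(r0)
    out = []
    for r_t, p_t in poses:
        r_rel = matmul(r0_inv, r_t)
        p_rel = matvec(r0_inv, [p_t[i] - p0[i] for i in range(3)])
        out.append((r_rel, p_rel))
    return out
```

Under this convention the first frame always becomes the identity pose, so sequences recorded at different absolute positions and orientations share a common starting point. Whether HO-Flow uses this exact parameterization is not stated in the abstract.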

citing papers explorer

Showing 1 of 1 citing paper.

  • Dex2HOI: Dexterous Bimanual Two-Object Interaction Generation cs.CV · 2026-05-28 · unverdicted · none · ref 7 · internal anchor

    Dex2HOI is a dual-stream diffusion model with bidirectional cross-attention and motion fusion that generates long bimanual single- and two-object HOI sequences from text at real-time speeds.