pith. machine review for the scientific record.

arxiv: 1906.03853 · v2 · submitted 2019-06-10 · 💻 cs.RO · cs.AI · cs.CV · cs.LG

Recognition: unknown

DensePhysNet: Learning Dense Physical Object Representations via Multi-step Dynamic Interactions

Authors on Pith: no claims yet
classification 💻 cs.RO · cs.AI · cs.CV · cs.LG
keywords: object, physical, representations, dense, densephysnet, manipulation, properties, dynamic
read the original abstract

We study the problem of learning physical object representations for robot manipulation. Understanding object physics is critical for successful object manipulation, but also challenging because physical object properties can rarely be inferred from the object's static appearance. In this paper, we propose DensePhysNet, a system that actively executes a sequence of dynamic interactions (e.g., sliding and colliding), and uses a deep predictive model over its visual observations to learn dense, pixel-wise representations that reflect the physical properties of observed objects. Our experiments in both simulation and real settings demonstrate that the learned representations carry rich physical information, and can directly be used to decode physical object properties such as friction and mass. The use of dense representation enables DensePhysNet to generalize well to novel scenes with more objects than in training. With knowledge of object physics, the learned representation also leads to more accurate and efficient manipulation in downstream tasks than the state-of-the-art.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation

    cs.RO 2024-09 conditional novelty 7.0

    ReKep encodes robotic tasks as optimizable Python functions over 3D keypoints that are generated automatically from language and RGB-D input, enabling real-time hierarchical planning on single- and dual-arm platforms ...

  2. VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

    cs.RO 2023-07 unverdicted novelty 7.0

    VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.

  3. Causal World Modeling for Robot Control

    cs.CV 2026-01 unverdicted novelty 5.0

    LingBot-VA combines video world modeling with policy learning via Mixture-of-Transformers, closed-loop rollouts, and asynchronous inference to improve robot manipulation in simulation and real settings.