Frame Mining: a Free Lunch for Learning Robotic Manipulation from 3D Point Clouds

Hao Su; Minghua Liu; Xuanlin Li; Yangyan Li; Zhan Ling

arxiv: 2210.07442 · v1 · pith:I7W4ARHInew · submitted 2022-10-14 · 💻 cs.RO · cs.CV

keywords: frame, point, learning, clouds, frames, manipulation, tasks, across

We study how choices of input point cloud coordinate frames impact learning of manipulation skills from 3D point clouds. There exist a variety of coordinate frame choices to normalize captured robot-object-interaction point clouds. We find that different frames have a profound effect on agent learning performance, and the trend is similar across 3D backbone networks. In particular, the end-effector frame and the target-part frame achieve higher training efficiency than the commonly used world frame and robot-base frame in many tasks, intuitively because they provide helpful alignments among point clouds across time steps and thus can simplify visual module learning. Moreover, the well-performing frames vary across tasks, and some tasks may benefit from multiple frame candidates. We thus propose FrameMiners to adaptively select candidate frames and fuse their merits in a task-agnostic manner. Experimentally, FrameMiners achieves on-par or significantly higher performance than the best single-frame version on five fully physical manipulation tasks adapted from ManiSkill and OCRTOC. Without changing existing camera placements or adding extra cameras, point cloud frame mining can serve as a free lunch to improve 3D manipulation learning.
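
To make the notion of a "coordinate frame choice" concrete, the following is a minimal sketch of re-expressing a world-frame point cloud in another frame, such as the end-effector frame. It is not the authors' implementation: the function name `world_to_frame`, the 4x4 homogeneous pose convention, and the example pose are all assumptions made for illustration.

```python
import numpy as np

def world_to_frame(points_world, frame_pose_world):
    """Express world-frame points in a target frame (hypothetical helper).

    points_world:     (N, 3) array of points in world coordinates.
    frame_pose_world: (4, 4) homogeneous pose of the target frame
                      (e.g. the end-effector) given in world coordinates.
    """
    rotation = frame_pose_world[:3, :3]
    translation = frame_pose_world[:3, 3]
    # p_frame = R^T (p_world - t); for row vectors this is (p - t) @ R.
    return (points_world - translation) @ rotation

# Assumed example pose: end-effector at (1, 0, 0), rotated 90 degrees about z.
ee_pose = np.array([
    [0.0, -1.0, 0.0, 1.0],
    [1.0,  0.0, 0.0, 0.0],
    [0.0,  0.0, 1.0, 0.0],
    [0.0,  0.0, 0.0, 1.0],
])

cloud_world = np.array([
    [1.0, 0.0, 0.0],  # the end-effector origin itself
    [1.0, 1.0, 0.0],  # one unit along the end-effector's x-axis
])
cloud_ee = world_to_frame(cloud_world, ee_pose)
```

In this sketch the end-effector origin maps to the zero vector regardless of where the gripper is in the world, which illustrates the kind of alignment across time steps that the abstract credits for easier visual learning.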

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SSI-Policy: Learning Structured Scene Interfaces for Vision-Language Robotic Manipulation
cs.RO 2026-06 unverdicted novelty 6.0

SSI-Policy learns a robot-agnostic RGB-only scene interface from video to improve vision-language manipulation policies by 15% on LIBERO with only 10 demos per task.