PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes
Estimating the 6D pose of known objects is important for robots to interact with the real world. The problem is challenging due to the variety of objects as well as the complexity of a scene caused by clutter and occlusions between objects. In this work, we introduce PoseCNN, a new Convolutional Neural Network for 6D object pose estimation. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. We also introduce a novel loss function that enables PoseCNN to handle symmetric objects. In addition, we contribute a large-scale video dataset for 6D object pose estimation named the YCB-Video dataset. Our dataset provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. We conduct extensive experiments on our YCB-Video dataset and the OccludedLINEMOD dataset to show that PoseCNN is highly robust to occlusions, can handle symmetric objects, and provides accurate pose estimation using only color images as input. When using depth data to further refine the poses, our approach achieves state-of-the-art results on the challenging OccludedLINEMOD dataset. Our code and dataset are available at https://rse-lab.cs.washington.edu/projects/posecnn/.
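The two ideas in the abstract (3D translation recovered from a predicted image-space center plus a camera-frame distance, and a symmetry-tolerant loss on rotations) can be sketched in a few lines of numpy. This is a minimal illustration assuming a standard pinhole camera model with intrinsics (fx, fy, px, py); the function names are illustrative and not taken from the released PoseCNN code, and the loss below is only a ShapeMatch-style closest-point variant of the paper's symmetric loss.

```python
import numpy as np

def recover_translation(cx, cy, tz, fx, fy, px, py):
    """Back-project a predicted object center (cx, cy) in pixels and a
    predicted camera-frame distance tz into a 3D translation (Tx, Ty, Tz),
    inverting the pinhole projection: cx = fx * Tx / Tz + px, etc."""
    tx = (cx - px) * tz / fx
    ty = (cy - py) * tz / fy
    return np.array([tx, ty, tz])

def shapematch_loss(r_pred, r_gt, points):
    """Symmetry-tolerant rotation loss (ShapeMatch-style sketch):
    transform the model points by both rotations, then penalize each
    predicted point only by its distance to the *closest* ground-truth
    point. Two rotations related by an object symmetry map the model
    onto the same point set, so they incur (near-)zero loss."""
    p_pred = points @ r_pred.T                      # (N, 3)
    p_gt = points @ r_gt.T                          # (N, 3)
    # pairwise distances between predicted and ground-truth point sets
    d = np.linalg.norm(p_pred[:, None, :] - p_gt[None, :, :], axis=-1)
    return d.min(axis=1).mean() / 2.0
```

For example, a square of points in the xy-plane is symmetric under a 180° rotation about z, so predicting the identity rotation against that ground truth yields zero loss, whereas a plain pointwise L2 loss would not.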
Forward citations
Cited by 10 Pith papers
- Point2Pose: Occlusion-Recovering 6D Pose Tracking and 3D Reconstruction for Multiple Unknown Objects Via 2D Point Trackers
  A model-free system uses 2D point trackers to achieve causal 6D pose tracking and incremental 3D reconstruction for multiple unseen rigid objects from RGB-D video, with recovery from complete occlusions.
- Physically Grounded 3D Generative Reconstruction under Hand Occlusion using Proprioception and Multi-Contact Touch
  A conditional diffusion model using proprioception and multi-contact touch produces metric-scale, physically consistent 3D object reconstructions under hand occlusion.
- FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios
  The FORGE benchmark shows that domain-specific knowledge, not visual grounding, is the main bottleneck for MLLMs in manufacturing, with SFT on a 3B model delivering up to 90.8% relative accuracy improvement on held-out scenarios.
- TORA: Topological Representation Alignment for 3D Shape Assembly
  TORA distills topological structure from pretrained 3D encoders into flow-matching backbones via cosine matching and CKA loss, delivering up to 6.9x faster convergence and better accuracy on 3D shape assembly benchmarks.
- Event6D: Event-based Novel Object 6D Pose Tracking
  EventTrack6D tracks 6D poses of unseen objects from event cameras by reconstructing dense intensity and depth cues between frames, generalizing from synthetic training to real data at high speed.
- Focusable Monocular Depth Estimation
  FocusDepth is a prompt-conditioned framework that fuses SAM3 features into Depth Anything models via Multi-Scale Spatial-Aligned Fusion to improve target-region depth accuracy on the new FDE-Bench.
- Doppler Prompting for Stable mmWave-based Human Pose Estimation
  PULSE stabilizes mmWave human pose estimation by screening Doppler motion prompts before injecting them into spatial magnitude reasoning.
- Temporally Consistent Object 6D Pose Estimation for Robot Control
  A factor graph that fuses motion models with uncertainty-aware pose measurements improves temporal consistency and benchmark scores for vision-based robot control.
- TSM-Pose: Topology-Aware Learning with Semantic Mamba for Category-Level Object Pose Estimation
  TSM-Pose adds topology extraction and semantic Mamba blocks to point-cloud features, outperforming prior methods on REAL275, CAMERA25, and HouseCat6D for category-level pose estimation.
- MAPRPose: Mask-Aware Proposal and Amodal Refinement for Multi-Object 6D Pose Estimation
  MAPRPose reports 76.5% Average Recall on the BOP benchmark for multi-object 6D pose estimation, beating FoundationPose by 3.1% while running 43 times faster through mask-aware proposals and amodal refinement.