PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes

Xiang, Y · 2017 · cs.CV · arXiv 1711.00199

18 Pith papers cite this work. Polarity classification is still indexing.

abstract

Estimating the 6D pose of known objects is important for robots to interact with the real world. The problem is challenging due to the variety of objects as well as the complexity of a scene caused by clutter and occlusions between objects. In this work, we introduce PoseCNN, a new Convolutional Neural Network for 6D object pose estimation. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. We also introduce a novel loss function that enables PoseCNN to handle symmetric objects. In addition, we contribute a large scale video dataset for 6D object pose estimation named the YCB-Video dataset. Our dataset provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. We conduct extensive experiments on our YCB-Video dataset and the OccludedLINEMOD dataset to show that PoseCNN is highly robust to occlusions, can handle symmetric objects, and provide accurate pose estimation using only color images as input. When using depth data to further refine the poses, our approach achieves state-of-the-art results on the challenging OccludedLINEMOD dataset. Our code and dataset are available at https://rse-lab.cs.washington.edu/projects/posecnn/.
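The abstract describes recovering translation from a localized 2D object center plus a predicted camera distance, and rotation from a regressed quaternion. A minimal sketch of how those two outputs could be turned into a 6D pose is shown below. It assumes a standard pinhole camera model and a `(w, x, y, z)` quaternion ordering; the function names and the intrinsics parameters (`fx`, `fy`, `px`, `py`) are hypothetical and are not taken from the PoseCNN code release.

```python
import math


def backproject_center(cx, cy, tz, fx, fy, px, py):
    """Recover a 3D translation from a 2D object center and its depth.

    Assumes a pinhole camera: (cx, cy) is the predicted center in pixels,
    tz is the predicted distance along the optical axis, (fx, fy) are focal
    lengths in pixels and (px, py) is the principal point.
    """
    tx = (cx - px) * tz / fx
    ty = (cy - py) * tz / fy
    return (tx, ty, tz)


def quaternion_to_rotation(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    The input is normalized first, since a regressed quaternion is not
    guaranteed to have unit length.
    """
    w, x, y, z = q
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero-length quaternion has no rotation")
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


# Illustrative values only: a center 100 px right of the principal point,
# 2 m from the camera, with an unnormalized identity quaternion.
translation = backproject_center(420.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0)
rotation = quaternion_to_rotation((2.0, 0.0, 0.0, 0.0))
```

Splitting translation into an image-space center and a depth keeps both regression targets tied to quantities that are directly observable in the image, which is the decoupling the abstract points to.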

citation-role summary

background: 2 · baseline: 1 · dataset: 1

citation-polarity summary

background: 2 · baseline: 1 · use dataset: 1

representative citing papers

BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

cs.RO · 2024-03-14 · accept · novelty 8.0

BEHAVIOR-1K introduces a benchmark of 1,000 human everyday activities in realistic simulated scenes together with the OMNIGIBSON physics simulator to evaluate embodied AI.

Event6D: Event-based Novel Object 6D Pose Tracking

cs.CV · 2026-03-30 · conditional · novelty 7.0

EventTrack6D tracks 6D poses of unseen objects from event cameras by reconstructing dense intensity and depth cues between frames, generalizing from synthetic training to real data at high speed.

Simulation-Ready Cluttered Scene Estimation via Physics-aware Joint Shape and Pose Optimization

cs.RO · 2026-02-23 · unverdicted · novelty 7.0

SPARCS uses a differentiable contact model and sparse Hessian solver to jointly optimize shapes and poses of up to five interacting objects, producing physically valid simulation-ready reconstructions.

Point2Pose: Occlusion-Recovering 6D Pose Tracking and 3D Reconstruction for Multiple Unknown Objects Via 2D Point Trackers

cs.CV · 2026-04-12 · unverdicted · novelty 7.0

A model-free system uses 2D point trackers to achieve causal 6D pose tracking and incremental 3D reconstruction for multiple unseen rigid objects from RGB-D video, with recovery from complete occlusions.

Physically Grounded 3D Generative Reconstruction under Hand Occlusion using Proprioception and Multi-Contact Touch

cs.CV · 2026-04-10 · unverdicted · novelty 7.0

A conditional diffusion model using proprioception and multi-contact touch produces metric-scale, physically consistent 3D object reconstructions under hand occlusion.

FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios

cs.CV · 2026-04-08 · conditional · novelty 7.0

FORGE benchmark shows domain-specific knowledge, not visual grounding, is the main bottleneck for MLLMs in manufacturing, with SFT on a 3B model delivering up to 90.8% relative accuracy improvement on held-out scenarios.

TORA: Topological Representation Alignment for 3D Shape Assembly

cs.CV · 2026-04-05 · unverdicted · novelty 7.0

TORA distills topological structure from pretrained 3D encoders into flow-matching backbones via cosine matching and CKA loss, delivering up to 6.9x faster convergence and better accuracy on 3D shape assembly benchmarks with zero inference overhead.

Imaging Hidden Objects with Consumer LiDAR via Motion Induced Sampling

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

Demonstrates non-line-of-sight 3D reconstruction, tracking, and camera localization on smartphone-grade LiDAR by fusing frames via a motion-induced aperture sampling model.

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

cs.RO · 2025-08-19 · conditional · novelty 6.0

Embodied-R1 uses a pointing-centric representation and reinforced fine-tuning on a 200K dataset to achieve state-of-the-art results on embodied benchmarks plus 56.2% success in SIMPLEREnv and 87.5% on real XArm tasks without task-specific training.

Focusable Monocular Depth Estimation

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

FocusDepth is a prompt-conditioned framework that fuses SAM3 features into Depth Anything models via Multi-Scale Spatial-Aligned Fusion to improve target-region depth accuracy on the new FDE-Bench.

Generalizable and Actionable Parts Pose Estimation with Symmetry Annotation-Free Learning Strategy

cs.RO · 2026-05-16 · unverdicted · novelty 5.0

SAFAG introduces a symmetry annotation-free two-stage learning strategy for generalizable actionable parts pose estimation in robotics.

Doppler Prompting for Stable mmWave-based Human Pose Estimation

cs.HC · 2026-05-13 · unverdicted · novelty 5.0

PULSE stabilizes mmWave human pose estimation by screening Doppler motion prompts before injecting them into spatial magnitude reasoning.

Temporally Consistent Object 6D Pose Estimation for Robot Control

cs.RO · 2026-05-04 · unverdicted · novelty 5.0

A factor graph that fuses motion models with uncertainty-aware pose measurements improves temporal consistency and benchmark scores for vision-based robot control.

TSM-Pose: Topology-Aware Learning with Semantic Mamba for Category-Level Object Pose Estimation

cs.CV · 2026-04-18 · unverdicted · novelty 5.0

TSM-Pose adds topology extraction and semantic Mamba blocks to point-cloud features, outperforming prior methods on REAL275, CAMERA25, and HouseCat6D for category-level pose estimation.

RDGen: Demonstration Generation for High-Quality Robot Learning via Reinforcement Learning

cs.RO · 2026-05-29 · unverdicted · novelty 4.0

RDGen uses sim-to-real RL policies to generate smoother robot demonstrations that improve downstream VLA performance over human-collected data on pick-and-place tasks.

GNC-Pose: Geometry-Aware GNC-PnP for Accurate 6D Pose Estimation

cs.CV · 2025-12-06 · unverdicted · novelty 4.0

GNC-Pose achieves competitive 6D pose accuracy on the YCB dataset for textured objects using only geometric priors, rendering initialization, and robust GNC optimization without any learned features or training data.

Deep Visual Servoing of an Aerial Robot Using Keypoint Feature Extraction

cs.RO · 2025-03-29 · unverdicted · novelty 3.0

CNN keypoint detection enables marker-free image-based visual servoing for aerial robots with robustness to occlusion and lighting changes, validated in Gazebo simulations.

MAPRPose: Mask-Aware Proposal and Amodal Refinement for Multi-Object 6D Pose Estimation

cs.CV · 2026-04-22

citing papers explorer

Showing 3 of 3 citing papers after filters.

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

cs.RO · 2025-08-19 · conditional · none · ref 36 · internal anchor

Embodied-R1 uses a pointing-centric representation and reinforced fine-tuning on a 200K dataset to achieve state-of-the-art results on embodied benchmarks plus 56.2% success in SIMPLEREnv and 87.5% on real XArm tasks without task-specific training.

GNC-Pose: Geometry-Aware GNC-PnP for Accurate 6D Pose Estimation

cs.CV · 2025-12-06 · unverdicted · none · ref 31 · internal anchor

GNC-Pose achieves competitive 6D pose accuracy on the YCB dataset for textured objects using only geometric priors, rendering initialization, and robust GNC optimization without any learned features or training data.

Deep Visual Servoing of an Aerial Robot Using Keypoint Feature Extraction

cs.RO · 2025-03-29 · unverdicted · none · ref 9 · internal anchor

CNN keypoint detection enables marker-free image-based visual servoing for aerial robots with robustness to occlusion and lighting changes, validated in Gazebo simulations.
