ComPose: When to Trust Hands for Object Pose Tracking

Dohyeon Lee; Hae-Gon Jeon; Hokyun Im; Inhwan Bae; Jisu Shin; JunGyu Lee; Junoh Lee; Youngwoon Lee

arxiv: 2605.23523 · v1 · pith:X7ZK2KEJnew · submitted 2026-05-22 · 💻 cs.CV

ComPose: When to Trust Hands for Object Pose Tracking

Jisu Shin , Junoh Lee , JunGyu Lee , Inhwan Bae , Dohyeon Lee , Hokyun Im , Youngwoon Lee , Hae-Gon Jeon This is my paper

Pith reviewed 2026-05-25 05:11 UTC · model grok-4.3

classification 💻 cs.CV

keywords 6DoF object trackinghand-aware pose estimationRGB videofoundation modelstemporal consistencyocclusion handlingrobot manipulation

0 comments

The pith

ComPose tracks 6DoF object poses from RGB video by treating hand motions as complementary cues rather than occluders.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that object pose tracking in videos can be made robust to hand occlusions by using hand motions as helpful signals instead of obstacles. It introduces a framework that pulls cues from foundation models for both objects and hands, selects useful hand joints on the fly, fuses them for motion estimates, refines with geometry and a learned fix, and keeps rotation and translation consistent over time. This produces steady 3D object paths without needing extra smoothing. A reader would care because reliable object motion from ordinary video supports embodied AI and lets robots learn from watching people.

Core claim

ComPose recovers a variety of object motions over time by combining object and hand cues from foundation models within a unified tracking pipeline, adaptively selects informative hand joints, refines motion using visible geometric evidence and a learned correction, and enforces temporal consistency over rotation and translation, yielding stable 3D object trajectories without external smoothing and remaining robust under severe hand occlusion.

What carries the argument

The unified tracking pipeline that adaptively selects informative hand joints and fuses object- and hand-derived cues for motion estimation.

If this is right

Yields stable 3D object trajectories without any external smoothing.
Remains robust under severe hand occlusion and geometric ambiguity.
The resulting trajectories transfer effectively to downstream robot manipulation by enabling robots to reconstruct human actions from online videos.
The method is accurate and efficient.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The cue-fusion strategy could extend to other moving occluders if similar motion correlations hold.
Performance would likely scale with improvements in the underlying foundation models for hands and objects.
Online versions could support real-time robot imitation of observed manipulations.

Load-bearing premise

That hand motions supply reliable complementary motion cues for the object even under heavy occlusion and that the foundation models produce sufficiently accurate hand and object cues to support the adaptive selection and fusion steps.

What would settle it

Video sequences with severe hand occlusions where the tracking accuracy of ComPose does not exceed that of object-only tracking methods.

Figures

Figures reproduced from arXiv: 2605.23523 by Dohyeon Lee, Hae-Gon Jeon, Hokyun Im, Inhwan Bae, Jisu Shin, JunGyu Lee, Junoh Lee, Youngwoon Lee.

**Figure 1.** Figure 1: Robot manipulation guided by the internet video. (a) Given an instructional painting video "Bob Ross Painting" from Youtube, our method (b) predicts Bob’s brush trajectory and (c) the estimated smooth trajectory is directly transferred to a real-world robot for painting. ∗Corresponding Author Preprint. arXiv:2605.23523v1 [cs.CV] 22 May 2026 [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗

**Figure 2.** Figure 2: An illustration of the proposed object tracking pipeline and its application to robot [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Qualitative results from the state-of-the-art models on three datasets: DexYCB, HO3D-v2, [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Analysis on the predicted α and w along with the input frames. surface is visible, but degrades substantially under severe hand occlusion or limited geometric support. Hand-only tracking is more robust to object occlusion, but becomes less reliable when the articulated hand motion or re-grasping violates the rigid-motion assumption. These results confirm that object geometry and hand motion provide complem… view at source ↗

**Figure 5.** Figure 5: Robot manipulation from internet videos. We overlay the predicted hand joints and mesh [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Visualization of mask tracking using SAM3. [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗

**Figure 7.** Figure 7: Visualization of mesh reconstruction and pose initialization using SAM3D. [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗

**Figure 8.** Figure 8: Visualization of point map using Flow3R. [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗

**Figure 9.** Figure 9: Visualization of 2D hand joints using WiLoR. [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗

**Figure 10.** Figure 10: Robot manipulation from Internet video. We overlay the predicted hand joints and mesh [PITH_FULL_IMAGE:figures/full_fig_p022_10.png] view at source ↗

read the original abstract

Reconstructing the motion of objects from videos is a key component for embodied AI and robot manipulation. While diverse approaches to object pose tracking have been studied, they rely heavily on strong external priors, such as depth data or 3D templates, and remain highly vulnerable to severe occlusions by hand grasps despite the use of explicit masks. In this work, we present ComPose, a 6DoF object tracking framework designed for hand-aware object pose estimation from RGB video. Rather than treating the hand purely as an occluder, our method harmonizes hand motions as a \textit{complementary cue} for object tracking. In detail, we recover a variety of object motions over time by combining object and hand cues from foundation models within a unified tracking pipeline. Here, ComPose adaptively selects informative hand joints, combines object- and hand-derived cues for motion estimation, and refines the resulting object motion using visible geometric evidence and a learned correction. We further enforce the temporal consistency over both rotation and translation, yielding stable 3D object trajectories over time without any external smoothing. Extensive experiments show that our method is accurate, efficient, and robust under severe hand occlusion and geometric ambiguity. In addition, the resulting trajectories can also effectively transfer to downstream robot manipulation by enabling robots to reconstruct human actions from online videos.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ComPose fuses hand and object cues from foundation models for tracking but rests on an untested claim that those hand cues stay reliable under heavy occlusion.

read the letter

ComPose builds a 6DoF tracker that pulls cues from both object and hand foundation models, adaptively picks informative hand joints, fuses the signals for motion, refines with visible geometry plus a learned correction, and adds a temporal consistency term on rotation and translation. The result is meant to give stable trajectories without external smoothing and to work when hands heavily occlude the object. The core move—treating the hand as a motion source instead of pure noise—is a direct response to a common failure mode in manipulation videos and could matter for robot learning from human demonstrations.

Referee Report

1 major / 0 minor

Summary. The paper presents ComPose, a 6DoF object pose tracking framework from RGB video for hand-object interactions. It treats hands as a complementary cue rather than pure occluder by combining object and hand cues from foundation models in a unified pipeline: adaptively selecting informative hand joints, fusing object- and hand-derived motion estimates, refining via visible geometric evidence plus a learned correction, and enforcing temporal consistency on both rotation and translation to yield stable trajectories without external smoothing. The work claims accuracy, efficiency, and robustness under severe hand occlusion, with downstream utility for robot manipulation from online videos.

Significance. If the central claims hold, the approach could meaningfully advance hand-aware object tracking for embodied AI and robotics by reducing dependence on depth sensors or 3D templates and by converting hand occlusion into usable motion signal. The explicit avoidance of post-hoc smoothing while maintaining temporal consistency and the integration of foundation-model cues with a learned correction step are concrete strengths worth highlighting. The absence of any reported cue-degradation curves or occlusion-specific ablations in the visible text, however, leaves the practical significance conditional on verification of the hand-cue reliability assumption.

major comments (1)

[Abstract] Abstract: The claim of being 'robust under severe hand occlusion' is load-bearing for the adaptive joint selection and cue-fusion steps, yet the text supplies no quantitative characterization of hand-pose error versus occlusion fraction nor an ablation that isolates hand-cue contribution once object visibility falls below a stated threshold. Without such evidence the fusion and selection modules rest on an unverified precondition about foundation-model cue quality.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. The point raised about supporting the robustness claim with quantitative evidence is well-taken, and we will incorporate the requested analyses to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: The claim of being 'robust under severe hand occlusion' is load-bearing for the adaptive joint selection and cue-fusion steps, yet the text supplies no quantitative characterization of hand-pose error versus occlusion fraction nor an ablation that isolates hand-cue contribution once object visibility falls below a stated threshold. Without such evidence the fusion and selection modules rest on an unverified precondition about foundation-model cue quality.

Authors: We agree that the abstract claim would be better supported by explicit quantitative evidence. In the revised manuscript we will add (1) a plot or table reporting hand-pose estimation error as a function of occlusion fraction on the evaluation sequences and (2) an ablation that isolates the contribution of the hand-derived motion cues when object visibility drops below a defined threshold (e.g., <30 % visible surface). These additions will directly verify the reliability of the foundation-model cues used by the adaptive selection and fusion modules. revision: yes

Circularity Check

0 steps flagged

No circularity: algorithmic pipeline with no derivation chain

full rationale

The paper describes an engineering pipeline that fuses foundation-model cues via adaptive joint selection, cue combination, geometric refinement, learned correction, and temporal consistency enforcement. No equations, parameter-fitting steps, or first-principles derivations appear in the abstract or described method; the central claims rest on empirical validation rather than any reduction of outputs to inputs by construction. External benchmarks (experiments under occlusion) are invoked, so the work is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no visible free parameters, axioms, or invented entities; ledger left empty.

pith-pipeline@v0.9.0 · 5792 in / 1100 out tokens · 30804 ms · 2026-05-25T05:11:29.279959+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

ComPose adaptively selects informative hand joints, combines object- and hand-derived cues for motion estimation, and refines the resulting object motion using visible geometric evidence and a learned correction... We further enforce the temporal consistency over both rotation and translation
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We leverage the hand as a complementary cue under occlusion and propose an adaptive module that selects informative joints and fuses hand and object cues.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

97 extracted references · 97 canonical work pages · 4 internal anchors

[1]

In: Proc

Besl, P.J., McKay, N.D.: Method for registration of 3-d shapes. In: Proc. SPIE (1992)

work page 1992
[2]

Carion, N., Gustafson, L., Hu, Y .T., Debnath, S., Hu, R., Suris, D., Ryali, C., Alwala, K.V ., Khedr, H., Huang, A., Lei, J., Ma, T., Guo, B., Kalla, A., Marks, M., Greer, J., Wang, M., Sun, P., Rädle, R., Afouras, T., Mavroudi, E., Xu, K., Wu, T.H., Zhou, Y ., Momeni, L., Hazra, R., Ding, S., Vaze, S., Porcher, F., Li, F., Li, S., Kamath, A., Cheng, H.K...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[3]

In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (W ACV) (2023)

Castro, P., Kim, T.K.: Crt-6d: Fast 6d object pose estimation with cascaded refinement transformers. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (W ACV) (2023)

work page 2023
[4]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

Chao, Y .W., Yang, W., Xiang, Y ., Molchanov, P., Handa, A., Tremblay, J., Narang, Y .S., Van Wyk, K., Iqbal, U., Birchfield, S., et al.: Dexycb: A benchmark for capturing hand grasping of objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

work page 2021
[5]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Chen, H., Wang, P., Wang, F., Tian, W., Xiong, L., Li, H.: Epro-pnp: Generalized end-to-end probabilistic perspective-n-points for monocular object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022
[6]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

Chen, K., Dou, Q.: Sgpa: Structure-guided prior adaptation for category-level 6d object pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

work page 2021
[7]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

Chen, W., Jia, X., Chang, H.J., Duan, J., Leonardis, A.: G2l-net: Global to local network for real-time 6d pose estimation with embedding vector features. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

work page 2020
[8]

SAM 3D: 3Dfy Anything in Images

Chen, X., Chu, F.J., Gleize, P., Liang, K.J., Sax, A., Tang, H., Wang, W., Guo, M., Hardin, T., Li, X., et al.: Sam 3d: 3dfy anything in images. arXiv preprint arXiv:2511.16624 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[9]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)

Chen, X., Dong, Z., Song, J., Geiger, A., Hilliges, O.: Category level object pose estimation via neural analysis-by-synthesis. In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)

work page 2020
[10]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

Choy, C., Dong, W., Koltun, V .: Deep global registration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

work page 2020
[11]

arXiv preprint (2026)

Cong, Z., Zhao, Q., Jeon, M., Tulsiani, S.: Flow3r: Factored flow prediction for scalable visual geometry learning. arXiv preprint (2026)

work page 2026
[12]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Deitke, M., Schwenk, D., Salvador, J., Weihs, L., Michel, O., VanderBilt, E., Schmidt, L., Ehsani, K., Kembhavi, A., Farhadi, A.: Objaverse: A universe of annotated 3d objects. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 13142–13153 (2023)

work page 2023
[13]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

Di, Y ., Manhardt, F., Wang, G., Ji, X., Navab, N., Tombari, F.: So-pose: Exploiting self-occlusion for direct 6d pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

work page 2021
[14]

arXiv preprint (2018)

Do, T.T., Cai, M., Pham, T., Reid, I.: Deep-6dpose: Recovering 6d object pose from a single rgb image. arXiv preprint (2018)

work page 2018
[15]

In: 2022 International Conference on Robotics and Automation (ICRA)

Downs, L., Francis, A., Koenig, N., Kinman, B., Hickman, R., Reymann, K., McHugh, T.B., Vanhoucke, V .: Google scanned objects: A high-quality dataset of 3d scanned household items. In: 2022 International Conference on Robotics and Automation (ICRA). pp. 2553–2560. Ieee (2022)

work page 2022
[16]

arXiv preprint (2026)

Elflein, S., Li, R., Agostinho, S., Gojcic, Z., Leal-Taixé, L., Zhou, Q., Osep, A.: VGG-T 3: Offline feed-forward 3d reconstruction at scale. arXiv preprint (2026)

work page 2026
[17]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

Fan, Z., Pan, P., Wang, P., Jiang, Y ., Xu, D., Wang, Z.: Pope: 6-dof promptable pose estimation of any object in any scene with one reference. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

work page 2024
[18]

Psychometrika (1975)

Gower, J.C.: Generalized procrustes analysis. Psychometrika (1975)

work page 1975
[19]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2023)

Hai, Y ., Song, R., Li, J., Ferstl, D., Hu, Y .: Pseudo flow consistency for self-supervised 6d object pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2023)

work page 2023
[20]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023) 10

Hai, Y ., Song, R., Li, J., Salzmann, M., Hu, Y .: Rigidity-aware detection for 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023) 10

work page 2023
[21]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

Hampali, S., Rad, M., Oberweger, M., Lepetit, V .: Honnotate: A method for 3d annotation of hand and object poses. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

work page 2020
[22]

Advances in Neural Information Processing Systems (2022)

He, X., Sun, J., Wang, Y ., Huang, D., Bao, H., Zhou, X.: Onepose++: Keypoint-free one-shot object pose estimation without cad models. Advances in Neural Information Processing Systems (2022)

work page 2022
[23]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

He, Y ., Huang, H., Fan, H., Chen, Q., Sun, J.: Ffb6d: A full flow bidirectional fusion network for 6d pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

work page 2021
[24]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

He, Y ., Sun, W., Huang, H., Liu, J., Fan, H., Sun, J.: Pvn3d: A deep point-wise 3d keypoints voting network for 6dof pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

work page 2020
[25]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

Hodan, T., Barath, D., Matas, J.: Epos: Estimating 6d pose of objects with symmetries. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

work page 2020
[26]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

Hu, Y ., Hugonot, J., Fua, P., Salzmann, M.: Segmentation-driven 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

work page 2019
[27]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Huang, J., Lin, H., Wang, T., Fu, Y ., Xue, X., Zhu, Y .: Cap-net: A unified network for 6d pose and size estimation of categorical articulated parts from a single rgb-d image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

work page 2025
[28]

In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2022)

Irshad, M.Z., Kollar, T., Laskey, M., Stone, K., Kira, Z.: Centersnap: Single-shot multi-object 3d shape reconstruction and categorical 6d pose and size estimation. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2022)

work page 2022
[29]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

Iwase, S., Liu, X., Khirodkar, R., Yokota, R., Kitani, K.M.: Repose: Fast 6d object pose refinement via deep texture rendering. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

work page 2021
[30]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Jiang, X., Li, D., Chen, H., Zheng, Y ., Zhao, R., Wu, L.: Uni6d: A unified cnn framework without projection breakdown for 6d pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022
[31]

In: European conference on computer vision

Karaev, N., Rocco, I., Graham, B., Neverova, N., Vedaldi, A., Rupprecht, C.: Cotracker: It is better to track together. In: European conference on computer vision. pp. 18–35. Springer (2024)

work page 2024
[32]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2017)

Kehl, W., Manhardt, F.E., Tombari, F., Ilic, S., Navab, N.: Ssd-6d: Making rgb-based 3d detection and 6d pose estimation great again. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2017)

work page 2017
[33]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2015)

Kendall, A., Grimes, M., Cipolla, R.: Posenet: A convolutional network for real-time 6-dof camera relocalization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2015)

work page 2015
[34]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)

Labbé, Y ., Carpentier, J., Aubry, M., Sivic, J.: Cosypose: Consistent multi-view multi-object 6d pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)

work page 2020
[35]

In: Proceedings of the Conference on Robot Learning (CoRL) (2022)

Labbé, Y ., Manuelli, L., Mousavian, A., Tyree, S., Birchfield, S., Tremblay, J., Carpentier, J., Aubry, M., Fox, D., Sivic, J.: Megapose: 6d pose estimation of novel objects via render & compare. In: Proceedings of the Conference on Robot Learning (CoRL) (2022)

work page 2022
[36]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Lee, T., Wen, B., Kang, M., Kang, G., Kweon, I.S., Yoon, K.J.: Any6d: Model-free 6d pose estimation of novel objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

work page 2025
[37]

International journal of computer vision81(2), 155–166 (2009)

Lepetit, V ., Moreno-Noguer, F., Fua, P.: Ep n p: An accurate o (n) solution to the p n p problem. International journal of computer vision81(2), 155–166 (2009)

work page 2009
[38]

In: Proceedings of the Conference on Robot Learning (CoRL) (2023)

Li, G., Li, Y ., Ye, Z., Zhang, Q., Kong, T., Cui, Z., Zhang, G.: Generative category-level shape and pose estimation with semantic primitives. In: Proceedings of the Conference on Robot Learning (CoRL) (2023)

work page 2023
[39]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2018) 11

Li, Y ., Wang, G., Ji, X., Xiang, Y ., Fox, D.: Deepim: Deep iterative matching for 6d pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018) 11

work page 2018
[40]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)

Li, Z., Wang, G., Ji, X.: Cdpn: Coordinates-based disentangled pose network for real-time rgb-based 6-dof object pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)

work page 2019
[41]

In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2023)

Li, Z., Stamos, I.: Depth-based 6dof object pose estimation using swin transformer. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2023)

work page 2023
[42]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

Lin, J., Liu, L., Lu, D., Jia, K.: Sam-6d: Segment anything model meets zero-shot 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

work page 2024
[43]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)

Lin, J., Wei, Z., Ding, C., Jia, K.: Category-level 6d object pose and size estimation using self-supervised deep prior deformation networks. In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)

work page 2022
[44]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

Lin, J., Wei, Z., Li, Z., Xu, S., Jia, K., Li, Y .: Dualposenet: Category-level 6d object pose and size estimation using dual pose network with refined learning of pose consistency. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

work page 2021
[45]

In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2022)

Lin, Y ., Tremblay, J., Tyree, S., Vela, P.A., Birchfield, S.: Single-stage keypoint-based category-level object pose estimation from an rgb image. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2022)

work page 2022
[46]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Lipson, L., Teed, Z., Goyal, A., Deng, J.: Coupled iterative refinement for 6d multi-object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022
[47]

Psychometrika (1976)

Lissitz, R.W., Schönemann, P.H., Lingoes, J.C.: A solution to the weighted procrustes problem in which the transformation is in agreement with the loss function. Psychometrika (1976)

work page 1976
[48]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Liu, M., Li, S., Chhatkuli, A., Truong, P., Van Gool, L., Tombari, F.: One2any: One-reference 6d pose estimation for any object. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

work page 2025
[49]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)

Liu, X., Wang, G., Li, Y ., Ji, X.: Catre: Iterative point clouds alignment for category-level object pose refinement. In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)

work page 2022
[50]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)

Liu, Y ., Wen, Y ., Peng, S., Lin, C., Long, X.F., Komura, T., Wang, W.: Gen6d: Generalizable model-free 6-dof object pose estimation from rgb images. In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)

work page 2022
[51]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Liu, Y ., Liu, Y ., Jiang, C., Lyu, K., Wan, W., Shen, H., Liang, B., Fu, Z., Wang, H., Yi, L.: Hoi4d: A 4d egocentric dataset for category-level human-object interaction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022
[52]

Decoupled Weight Decay Regularization

Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[53]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Moon, S., Son, H., Hur, D., Kim, S.: Co-op: Correspondence-based novel object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

work page 2025
[54]

arXiv preprint arXiv:2512.15508 (2025)

Moreau, A., Shaw, R., Nazarczuk, M., Shin, J., Tanay, T., Zhang, Z., Xu, S., Pérez-Pellitero, E.: Off the grid: Detection of primitives for feed-forward 3d gaussian splatting. arXiv preprint arXiv:2512.15508 (2025)

work page arXiv 2025
[55]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)

Mousavian, A., Eppner, C., Fox, D.: 6-dof graspnet: Variational grasp generation for object manipulation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)

work page 2019
[56]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

Nguyen, V .N., Groueix, T., Salzmann, M., Lepetit, V .: Gigapose: Fast and robust novel object pose estimation via one correspondence. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

work page 2024
[57]

Oquab, M., Darcet, T., Moutakanni, T., et al.: Dinov2: Learning robust visual features without supervision (2024)

work page 2024
[58]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2024)

Örnek, E.P., Labbé, Y ., Tekin, B., Ma, L., Keskin, C., Forster, C., Hodan, T.: Foundpose: Unseen object pose estimation with foundation features. In: Proceedings of the European Conference on Computer Vision (ECCV) (2024)

work page 2024
[59]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020) 12

Park, K., Mousavian, A., Xiang, Y ., Fox, D.: Latentfusion: End-to-end differentiable reconstruction and rendering for unseen object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020) 12

work page 2020
[60]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)

Park, K., Patten, T., Vincze, M.: Pix2pose: Pixel-wise coordinate regression of objects for 6d pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)

work page 2019
[61]

In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2017)

Pavlakos, G., Zhou, X., Chan, A., Derpanis, K.G., Daniilidis, K.: 6-dof object pose from semantic keypoints. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2017)

work page 2017
[62]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

Peng, S., Liu, Y ., Huang, Q., Zhou, X., Bao, H.: Pvnet: Pixel-wise voting network for 6dof pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

work page 2019
[63]

arXiv preprint (2025)

Ponimatkin, G., Cífka, M., Souˇcek, T., Fourmy, M., Labbé, Y ., Petrik, V ., Sivic, J.: 6d object pose tracking in internet videos for robotic manipulation. arXiv preprint (2025)

work page 2025
[64]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Potamias, R.A., Zhang, J., Deng, J., Zafeiriou, S.: Wilor: End-to-end 3d hand localization and recon- struction in-the-wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

work page 2025
[65]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2017)

Rad, M., Lepetit, V .: Bb8: A scalable, accurate, robust to partial occlusion method for predicting the 3d poses of challenging objects without using depth. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2017)

work page 2017
[66]

arXiv preprint (2022)

Romero, J., Tzionas, D., Black, M.J.: Embodied hands: Modeling and capturing hands and bodies together. arXiv preprint (2022)

work page 2022
[67]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Shugurov, I., Li, F., Busam, B., Ilic, S.: Osop: A multi-stage one shot object pose estimation framework. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022
[68]

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2021)

Shugurov, I., Zakharov, S., Ilic, S.: Dpodv2: Dense correspondence-based 6 dof pose estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2021)

work page 2021
[69]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

Song, C., Song, J., Huang, Q.: Hybridpose: 6d object pose estimation under hybrid representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

work page 2020
[70]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Song, X., Jin, L., Zhang, Z., Li, J., Zhong, F., Zhang, G., Qin, X.: Prior-free 3d object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

work page 2025
[71]

In: 2012 IEEE/RSJ international conference on intelligent robots and systems

Sturm, J., Engelhard, N., Endres, F., Burgard, W., Cremers, D.: A benchmark for the evaluation of rgb-d slam systems. In: 2012 IEEE/RSJ international conference on intelligent robots and systems. pp. 573–580. IEEE (2012)

work page 2012
[72]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Su, Y ., Saleh, M., Fetzer, T., Rambach, J., Navab, N., Busam, B., Stricker, D., Tombari, F.: Zebrapose: Coarse to fine surface encoding for 6dof object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022
[73]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Sun, J., Wang, Z., Zhang, S., He, X., Zhao, H., Zhang, G., Zhou, X.: Onepose: One-shot object pose estimation without cad models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022
[74]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)

Tian, M., Ang Jr, M.H., Lee, G.H.: Shape prior deformation for categorical 6d object pose and size estimation. In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)

work page 2020
[75]

IEEE Transactions on pattern analysis and machine intelligence13(4), 376–380 (2002)

Umeyama, S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on pattern analysis and machine intelligence13(4), 376–380 (2002)

work page 2002
[76]

In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2020)

Wang, C., Martín-Martín, R., Xu, D., Lv, J., Lu, C., Fei-Fei, L., Savarese, S., Zhu, Y .: 6-pack: Category- level 6d pose tracker with anchor-based keypoints. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2020)

work page 2020
[77]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

Wang, C., Xu, D., Zhu, Y ., Martín-Martín, R., Lu, C., Fei-Fei, L., Savarese, S.: Densefusion: 6d object pose estimation by iterative dense fusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

work page 2019
[78]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

Wang, G., Manhardt, F., Tombari, F., Ji, X.: Gdr-net: Geometry-guided direct regression network for monocular 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

work page 2021
[79]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019) 13

Wang, H., Sridhar, S., Huang, J., Valentin, J., Song, S., Guibas, L.J.: Normalized object coordinate space for category-level 6d object pose and size estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019) 13

work page 2019
[80]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: Vggt: Visual geometry grounded transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

work page 2025

Showing first 80 references.

[1] [1]

In: Proc

Besl, P.J., McKay, N.D.: Method for registration of 3-d shapes. In: Proc. SPIE (1992)

work page 1992

[2] [2]

Carion, N., Gustafson, L., Hu, Y .T., Debnath, S., Hu, R., Suris, D., Ryali, C., Alwala, K.V ., Khedr, H., Huang, A., Lei, J., Ma, T., Guo, B., Kalla, A., Marks, M., Greer, J., Wang, M., Sun, P., Rädle, R., Afouras, T., Mavroudi, E., Xu, K., Wu, T.H., Zhou, Y ., Momeni, L., Hazra, R., Ding, S., Vaze, S., Porcher, F., Li, F., Li, S., Kamath, A., Cheng, H.K...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[3] [3]

In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (W ACV) (2023)

Castro, P., Kim, T.K.: Crt-6d: Fast 6d object pose estimation with cascaded refinement transformers. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (W ACV) (2023)

work page 2023

[4] [4]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

Chao, Y .W., Yang, W., Xiang, Y ., Molchanov, P., Handa, A., Tremblay, J., Narang, Y .S., Van Wyk, K., Iqbal, U., Birchfield, S., et al.: Dexycb: A benchmark for capturing hand grasping of objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

work page 2021

[5] [5]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Chen, H., Wang, P., Wang, F., Tian, W., Xiong, L., Li, H.: Epro-pnp: Generalized end-to-end probabilistic perspective-n-points for monocular object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022

[6] [6]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

Chen, K., Dou, Q.: Sgpa: Structure-guided prior adaptation for category-level 6d object pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

work page 2021

[7] [7]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

Chen, W., Jia, X., Chang, H.J., Duan, J., Leonardis, A.: G2l-net: Global to local network for real-time 6d pose estimation with embedding vector features. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

work page 2020

[8] [8]

SAM 3D: 3Dfy Anything in Images

Chen, X., Chu, F.J., Gleize, P., Liang, K.J., Sax, A., Tang, H., Wang, W., Guo, M., Hardin, T., Li, X., et al.: Sam 3d: 3dfy anything in images. arXiv preprint arXiv:2511.16624 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[9] [9]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)

Chen, X., Dong, Z., Song, J., Geiger, A., Hilliges, O.: Category level object pose estimation via neural analysis-by-synthesis. In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)

work page 2020

[10] [10]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

Choy, C., Dong, W., Koltun, V .: Deep global registration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

work page 2020

[11] [11]

arXiv preprint (2026)

Cong, Z., Zhao, Q., Jeon, M., Tulsiani, S.: Flow3r: Factored flow prediction for scalable visual geometry learning. arXiv preprint (2026)

work page 2026

[12] [12]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Deitke, M., Schwenk, D., Salvador, J., Weihs, L., Michel, O., VanderBilt, E., Schmidt, L., Ehsani, K., Kembhavi, A., Farhadi, A.: Objaverse: A universe of annotated 3d objects. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 13142–13153 (2023)

work page 2023

[13] [13]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

Di, Y ., Manhardt, F., Wang, G., Ji, X., Navab, N., Tombari, F.: So-pose: Exploiting self-occlusion for direct 6d pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

work page 2021

[14] [14]

arXiv preprint (2018)

Do, T.T., Cai, M., Pham, T., Reid, I.: Deep-6dpose: Recovering 6d object pose from a single rgb image. arXiv preprint (2018)

work page 2018

[15] [15]

In: 2022 International Conference on Robotics and Automation (ICRA)

Downs, L., Francis, A., Koenig, N., Kinman, B., Hickman, R., Reymann, K., McHugh, T.B., Vanhoucke, V .: Google scanned objects: A high-quality dataset of 3d scanned household items. In: 2022 International Conference on Robotics and Automation (ICRA). pp. 2553–2560. Ieee (2022)

work page 2022

[16] [16]

arXiv preprint (2026)

Elflein, S., Li, R., Agostinho, S., Gojcic, Z., Leal-Taixé, L., Zhou, Q., Osep, A.: VGG-T 3: Offline feed-forward 3d reconstruction at scale. arXiv preprint (2026)

work page 2026

[17] [17]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

Fan, Z., Pan, P., Wang, P., Jiang, Y ., Xu, D., Wang, Z.: Pope: 6-dof promptable pose estimation of any object in any scene with one reference. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

work page 2024

[18] [18]

Psychometrika (1975)

Gower, J.C.: Generalized procrustes analysis. Psychometrika (1975)

work page 1975

[19] [19]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2023)

Hai, Y ., Song, R., Li, J., Ferstl, D., Hu, Y .: Pseudo flow consistency for self-supervised 6d object pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2023)

work page 2023

[20] [20]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023) 10

Hai, Y ., Song, R., Li, J., Salzmann, M., Hu, Y .: Rigidity-aware detection for 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023) 10

work page 2023

[21] [21]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

Hampali, S., Rad, M., Oberweger, M., Lepetit, V .: Honnotate: A method for 3d annotation of hand and object poses. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

work page 2020

[22] [22]

Advances in Neural Information Processing Systems (2022)

He, X., Sun, J., Wang, Y ., Huang, D., Bao, H., Zhou, X.: Onepose++: Keypoint-free one-shot object pose estimation without cad models. Advances in Neural Information Processing Systems (2022)

work page 2022

[23] [23]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

He, Y ., Huang, H., Fan, H., Chen, Q., Sun, J.: Ffb6d: A full flow bidirectional fusion network for 6d pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

work page 2021

[24] [24]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

He, Y ., Sun, W., Huang, H., Liu, J., Fan, H., Sun, J.: Pvn3d: A deep point-wise 3d keypoints voting network for 6dof pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

work page 2020

[25] [25]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

Hodan, T., Barath, D., Matas, J.: Epos: Estimating 6d pose of objects with symmetries. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

work page 2020

[26] [26]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

Hu, Y ., Hugonot, J., Fua, P., Salzmann, M.: Segmentation-driven 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

work page 2019

[27] [27]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Huang, J., Lin, H., Wang, T., Fu, Y ., Xue, X., Zhu, Y .: Cap-net: A unified network for 6d pose and size estimation of categorical articulated parts from a single rgb-d image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

work page 2025

[28] [28]

In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2022)

Irshad, M.Z., Kollar, T., Laskey, M., Stone, K., Kira, Z.: Centersnap: Single-shot multi-object 3d shape reconstruction and categorical 6d pose and size estimation. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2022)

work page 2022

[29] [29]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

Iwase, S., Liu, X., Khirodkar, R., Yokota, R., Kitani, K.M.: Repose: Fast 6d object pose refinement via deep texture rendering. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

work page 2021

[30] [30]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Jiang, X., Li, D., Chen, H., Zheng, Y ., Zhao, R., Wu, L.: Uni6d: A unified cnn framework without projection breakdown for 6d pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022

[31] [31]

In: European conference on computer vision

Karaev, N., Rocco, I., Graham, B., Neverova, N., Vedaldi, A., Rupprecht, C.: Cotracker: It is better to track together. In: European conference on computer vision. pp. 18–35. Springer (2024)

work page 2024

[32] [32]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2017)

Kehl, W., Manhardt, F.E., Tombari, F., Ilic, S., Navab, N.: Ssd-6d: Making rgb-based 3d detection and 6d pose estimation great again. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2017)

work page 2017

[33] [33]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2015)

Kendall, A., Grimes, M., Cipolla, R.: Posenet: A convolutional network for real-time 6-dof camera relocalization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2015)

work page 2015

[34] [34]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)

Labbé, Y ., Carpentier, J., Aubry, M., Sivic, J.: Cosypose: Consistent multi-view multi-object 6d pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)

work page 2020

[35] [35]

In: Proceedings of the Conference on Robot Learning (CoRL) (2022)

Labbé, Y ., Manuelli, L., Mousavian, A., Tyree, S., Birchfield, S., Tremblay, J., Carpentier, J., Aubry, M., Fox, D., Sivic, J.: Megapose: 6d pose estimation of novel objects via render & compare. In: Proceedings of the Conference on Robot Learning (CoRL) (2022)

work page 2022

[36] [36]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Lee, T., Wen, B., Kang, M., Kang, G., Kweon, I.S., Yoon, K.J.: Any6d: Model-free 6d pose estimation of novel objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

work page 2025

[37] [37]

International journal of computer vision81(2), 155–166 (2009)

Lepetit, V ., Moreno-Noguer, F., Fua, P.: Ep n p: An accurate o (n) solution to the p n p problem. International journal of computer vision81(2), 155–166 (2009)

work page 2009

[38] [38]

In: Proceedings of the Conference on Robot Learning (CoRL) (2023)

Li, G., Li, Y ., Ye, Z., Zhang, Q., Kong, T., Cui, Z., Zhang, G.: Generative category-level shape and pose estimation with semantic primitives. In: Proceedings of the Conference on Robot Learning (CoRL) (2023)

work page 2023

[39] [39]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2018) 11

Li, Y ., Wang, G., Ji, X., Xiang, Y ., Fox, D.: Deepim: Deep iterative matching for 6d pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018) 11

work page 2018

[40] [40]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)

Li, Z., Wang, G., Ji, X.: Cdpn: Coordinates-based disentangled pose network for real-time rgb-based 6-dof object pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)

work page 2019

[41] [41]

In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2023)

Li, Z., Stamos, I.: Depth-based 6dof object pose estimation using swin transformer. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2023)

work page 2023

[42] [42]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

Lin, J., Liu, L., Lu, D., Jia, K.: Sam-6d: Segment anything model meets zero-shot 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

work page 2024

[43] [43]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)

Lin, J., Wei, Z., Ding, C., Jia, K.: Category-level 6d object pose and size estimation using self-supervised deep prior deformation networks. In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)

work page 2022

[44] [44]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

Lin, J., Wei, Z., Li, Z., Xu, S., Jia, K., Li, Y .: Dualposenet: Category-level 6d object pose and size estimation using dual pose network with refined learning of pose consistency. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

work page 2021

[45] [45]

In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2022)

Lin, Y ., Tremblay, J., Tyree, S., Vela, P.A., Birchfield, S.: Single-stage keypoint-based category-level object pose estimation from an rgb image. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2022)

work page 2022

[46] [46]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Lipson, L., Teed, Z., Goyal, A., Deng, J.: Coupled iterative refinement for 6d multi-object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022

[47] [47]

Psychometrika (1976)

Lissitz, R.W., Schönemann, P.H., Lingoes, J.C.: A solution to the weighted procrustes problem in which the transformation is in agreement with the loss function. Psychometrika (1976)

work page 1976

[48] [48]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Liu, M., Li, S., Chhatkuli, A., Truong, P., Van Gool, L., Tombari, F.: One2any: One-reference 6d pose estimation for any object. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

work page 2025

[49] [49]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)

Liu, X., Wang, G., Li, Y ., Ji, X.: Catre: Iterative point clouds alignment for category-level object pose refinement. In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)

work page 2022

[50] [50]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)

Liu, Y ., Wen, Y ., Peng, S., Lin, C., Long, X.F., Komura, T., Wang, W.: Gen6d: Generalizable model-free 6-dof object pose estimation from rgb images. In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)

work page 2022

[51] [51]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Liu, Y ., Liu, Y ., Jiang, C., Lyu, K., Wan, W., Shen, H., Liang, B., Fu, Z., Wang, H., Yi, L.: Hoi4d: A 4d egocentric dataset for category-level human-object interaction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022

[52] [52]

Decoupled Weight Decay Regularization

Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[53] [53]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Moon, S., Son, H., Hur, D., Kim, S.: Co-op: Correspondence-based novel object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

work page 2025

[54] [54]

arXiv preprint arXiv:2512.15508 (2025)

Moreau, A., Shaw, R., Nazarczuk, M., Shin, J., Tanay, T., Zhang, Z., Xu, S., Pérez-Pellitero, E.: Off the grid: Detection of primitives for feed-forward 3d gaussian splatting. arXiv preprint arXiv:2512.15508 (2025)

work page arXiv 2025

[55] [55]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)

Mousavian, A., Eppner, C., Fox, D.: 6-dof graspnet: Variational grasp generation for object manipulation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)

work page 2019

[56] [56]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

Nguyen, V .N., Groueix, T., Salzmann, M., Lepetit, V .: Gigapose: Fast and robust novel object pose estimation via one correspondence. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

work page 2024

[57] [57]

Oquab, M., Darcet, T., Moutakanni, T., et al.: Dinov2: Learning robust visual features without supervision (2024)

work page 2024

[58] [58]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2024)

Örnek, E.P., Labbé, Y ., Tekin, B., Ma, L., Keskin, C., Forster, C., Hodan, T.: Foundpose: Unseen object pose estimation with foundation features. In: Proceedings of the European Conference on Computer Vision (ECCV) (2024)

work page 2024

[59] [59]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020) 12

Park, K., Mousavian, A., Xiang, Y ., Fox, D.: Latentfusion: End-to-end differentiable reconstruction and rendering for unseen object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020) 12

work page 2020

[60] [60]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)

Park, K., Patten, T., Vincze, M.: Pix2pose: Pixel-wise coordinate regression of objects for 6d pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)

work page 2019

[61] [61]

In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2017)

Pavlakos, G., Zhou, X., Chan, A., Derpanis, K.G., Daniilidis, K.: 6-dof object pose from semantic keypoints. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2017)

work page 2017

[62] [62]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

Peng, S., Liu, Y ., Huang, Q., Zhou, X., Bao, H.: Pvnet: Pixel-wise voting network for 6dof pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

work page 2019

[63] [63]

arXiv preprint (2025)

Ponimatkin, G., Cífka, M., Souˇcek, T., Fourmy, M., Labbé, Y ., Petrik, V ., Sivic, J.: 6d object pose tracking in internet videos for robotic manipulation. arXiv preprint (2025)

work page 2025

[64] [64]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Potamias, R.A., Zhang, J., Deng, J., Zafeiriou, S.: Wilor: End-to-end 3d hand localization and recon- struction in-the-wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

work page 2025

[65] [65]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2017)

Rad, M., Lepetit, V .: Bb8: A scalable, accurate, robust to partial occlusion method for predicting the 3d poses of challenging objects without using depth. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2017)

work page 2017

[66] [66]

arXiv preprint (2022)

Romero, J., Tzionas, D., Black, M.J.: Embodied hands: Modeling and capturing hands and bodies together. arXiv preprint (2022)

work page 2022

[67] [67]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Shugurov, I., Li, F., Busam, B., Ilic, S.: Osop: A multi-stage one shot object pose estimation framework. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022

[68] [68]

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2021)

Shugurov, I., Zakharov, S., Ilic, S.: Dpodv2: Dense correspondence-based 6 dof pose estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2021)

work page 2021

[69] [69]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

Song, C., Song, J., Huang, Q.: Hybridpose: 6d object pose estimation under hybrid representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

work page 2020

[70] [70]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Song, X., Jin, L., Zhang, Z., Li, J., Zhong, F., Zhang, G., Qin, X.: Prior-free 3d object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

work page 2025

[71] [71]

In: 2012 IEEE/RSJ international conference on intelligent robots and systems

Sturm, J., Engelhard, N., Endres, F., Burgard, W., Cremers, D.: A benchmark for the evaluation of rgb-d slam systems. In: 2012 IEEE/RSJ international conference on intelligent robots and systems. pp. 573–580. IEEE (2012)

work page 2012

[72] [72]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Su, Y ., Saleh, M., Fetzer, T., Rambach, J., Navab, N., Busam, B., Stricker, D., Tombari, F.: Zebrapose: Coarse to fine surface encoding for 6dof object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022

[73] [73]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Sun, J., Wang, Z., Zhang, S., He, X., Zhao, H., Zhang, G., Zhou, X.: Onepose: One-shot object pose estimation without cad models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022

[74] [74]

In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)

Tian, M., Ang Jr, M.H., Lee, G.H.: Shape prior deformation for categorical 6d object pose and size estimation. In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)

work page 2020

[75] [75]

IEEE Transactions on pattern analysis and machine intelligence13(4), 376–380 (2002)

Umeyama, S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on pattern analysis and machine intelligence13(4), 376–380 (2002)

work page 2002

[76] [76]

In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2020)

Wang, C., Martín-Martín, R., Xu, D., Lv, J., Lu, C., Fei-Fei, L., Savarese, S., Zhu, Y .: 6-pack: Category- level 6d pose tracker with anchor-based keypoints. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2020)

work page 2020

[77] [77]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

Wang, C., Xu, D., Zhu, Y ., Martín-Martín, R., Lu, C., Fei-Fei, L., Savarese, S.: Densefusion: 6d object pose estimation by iterative dense fusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

work page 2019

[78] [78]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

Wang, G., Manhardt, F., Tombari, F., Ji, X.: Gdr-net: Geometry-guided direct regression network for monocular 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

work page 2021

[79] [79]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019) 13

Wang, H., Sridhar, S., Huang, J., Valentin, J., Song, S., Guibas, L.J.: Normalized object coordinate space for category-level 6d object pose and size estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019) 13

work page 2019

[80] [80]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: Vggt: Visual geometry grounded transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

work page 2025