ComPose: When to Trust Hands for Object Pose Tracking
Pith reviewed 2026-05-25 05:11 UTC · model grok-4.3
The pith
ComPose tracks 6DoF object poses from RGB video by treating hand motions as complementary cues rather than occluders.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ComPose recovers a variety of object motions over time by combining object and hand cues from foundation models within a unified tracking pipeline, adaptively selects informative hand joints, refines motion using visible geometric evidence and a learned correction, and enforces temporal consistency over rotation and translation, yielding stable 3D object trajectories without external smoothing and remaining robust under severe hand occlusion.
What carries the argument
The unified tracking pipeline that adaptively selects informative hand joints and fuses object- and hand-derived cues for motion estimation.
If this is right
- Yields stable 3D object trajectories without any external smoothing.
- Remains robust under severe hand occlusion and geometric ambiguity.
- The resulting trajectories transfer effectively to downstream robot manipulation by enabling robots to reconstruct human actions from online videos.
- The method is accurate and efficient.
Where Pith is reading between the lines
- The cue-fusion strategy could extend to other moving occluders if similar motion correlations hold.
- Performance would likely scale with improvements in the underlying foundation models for hands and objects.
- Online versions could support real-time robot imitation of observed manipulations.
Load-bearing premise
That hand motions supply reliable complementary motion cues for the object even under heavy occlusion and that the foundation models produce sufficiently accurate hand and object cues to support the adaptive selection and fusion steps.
What would settle it
Video sequences with severe hand occlusions where the tracking accuracy of ComPose does not exceed that of object-only tracking methods.
Figures
read the original abstract
Reconstructing the motion of objects from videos is a key component for embodied AI and robot manipulation. While diverse approaches to object pose tracking have been studied, they rely heavily on strong external priors, such as depth data or 3D templates, and remain highly vulnerable to severe occlusions by hand grasps despite the use of explicit masks. In this work, we present ComPose, a 6DoF object tracking framework designed for hand-aware object pose estimation from RGB video. Rather than treating the hand purely as an occluder, our method harmonizes hand motions as a \textit{complementary cue} for object tracking. In detail, we recover a variety of object motions over time by combining object and hand cues from foundation models within a unified tracking pipeline. Here, ComPose adaptively selects informative hand joints, combines object- and hand-derived cues for motion estimation, and refines the resulting object motion using visible geometric evidence and a learned correction. We further enforce the temporal consistency over both rotation and translation, yielding stable 3D object trajectories over time without any external smoothing. Extensive experiments show that our method is accurate, efficient, and robust under severe hand occlusion and geometric ambiguity. In addition, the resulting trajectories can also effectively transfer to downstream robot manipulation by enabling robots to reconstruct human actions from online videos.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents ComPose, a 6DoF object pose tracking framework from RGB video for hand-object interactions. It treats hands as a complementary cue rather than pure occluder by combining object and hand cues from foundation models in a unified pipeline: adaptively selecting informative hand joints, fusing object- and hand-derived motion estimates, refining via visible geometric evidence plus a learned correction, and enforcing temporal consistency on both rotation and translation to yield stable trajectories without external smoothing. The work claims accuracy, efficiency, and robustness under severe hand occlusion, with downstream utility for robot manipulation from online videos.
Significance. If the central claims hold, the approach could meaningfully advance hand-aware object tracking for embodied AI and robotics by reducing dependence on depth sensors or 3D templates and by converting hand occlusion into usable motion signal. The explicit avoidance of post-hoc smoothing while maintaining temporal consistency and the integration of foundation-model cues with a learned correction step are concrete strengths worth highlighting. The absence of any reported cue-degradation curves or occlusion-specific ablations in the visible text, however, leaves the practical significance conditional on verification of the hand-cue reliability assumption.
major comments (1)
- [Abstract] Abstract: The claim of being 'robust under severe hand occlusion' is load-bearing for the adaptive joint selection and cue-fusion steps, yet the text supplies no quantitative characterization of hand-pose error versus occlusion fraction nor an ablation that isolates hand-cue contribution once object visibility falls below a stated threshold. Without such evidence the fusion and selection modules rest on an unverified precondition about foundation-model cue quality.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our work. The point raised about supporting the robustness claim with quantitative evidence is well-taken, and we will incorporate the requested analyses to strengthen the manuscript.
read point-by-point responses
-
Referee: [Abstract] Abstract: The claim of being 'robust under severe hand occlusion' is load-bearing for the adaptive joint selection and cue-fusion steps, yet the text supplies no quantitative characterization of hand-pose error versus occlusion fraction nor an ablation that isolates hand-cue contribution once object visibility falls below a stated threshold. Without such evidence the fusion and selection modules rest on an unverified precondition about foundation-model cue quality.
Authors: We agree that the abstract claim would be better supported by explicit quantitative evidence. In the revised manuscript we will add (1) a plot or table reporting hand-pose estimation error as a function of occlusion fraction on the evaluation sequences and (2) an ablation that isolates the contribution of the hand-derived motion cues when object visibility drops below a defined threshold (e.g., <30 % visible surface). These additions will directly verify the reliability of the foundation-model cues used by the adaptive selection and fusion modules. revision: yes
Circularity Check
No circularity: algorithmic pipeline with no derivation chain
full rationale
The paper describes an engineering pipeline that fuses foundation-model cues via adaptive joint selection, cue combination, geometric refinement, learned correction, and temporal consistency enforcement. No equations, parameter-fitting steps, or first-principles derivations appear in the abstract or described method; the central claims rest on empirical validation rather than any reduction of outputs to inputs by construction. External benchmarks (experiments under occlusion) are invoked, so the work is self-contained.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
ComPose adaptively selects informative hand joints, combines object- and hand-derived cues for motion estimation, and refines the resulting object motion using visible geometric evidence and a learned correction... We further enforce the temporal consistency over both rotation and translation
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We leverage the hand as a complementary cue under occlusion and propose an adaptive module that selects informative joints and fuses hand and object cues.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1]
-
[2]
Carion, N., Gustafson, L., Hu, Y .T., Debnath, S., Hu, R., Suris, D., Ryali, C., Alwala, K.V ., Khedr, H., Huang, A., Lei, J., Ma, T., Guo, B., Kalla, A., Marks, M., Greer, J., Wang, M., Sun, P., Rädle, R., Afouras, T., Mavroudi, E., Xu, K., Wu, T.H., Zhou, Y ., Momeni, L., Hazra, R., Ding, S., Vaze, S., Porcher, F., Li, F., Li, S., Kamath, A., Cheng, H.K...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[3]
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (W ACV) (2023)
Castro, P., Kim, T.K.: Crt-6d: Fast 6d object pose estimation with cascaded refinement transformers. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (W ACV) (2023)
work page 2023
-
[4]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
Chao, Y .W., Yang, W., Xiang, Y ., Molchanov, P., Handa, A., Tremblay, J., Narang, Y .S., Van Wyk, K., Iqbal, U., Birchfield, S., et al.: Dexycb: A benchmark for capturing hand grasping of objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
work page 2021
-
[5]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Chen, H., Wang, P., Wang, F., Tian, W., Xiong, L., Li, H.: Epro-pnp: Generalized end-to-end probabilistic perspective-n-points for monocular object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
work page 2022
-
[6]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)
Chen, K., Dou, Q.: Sgpa: Structure-guided prior adaptation for category-level 6d object pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)
work page 2021
-
[7]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Chen, W., Jia, X., Chang, H.J., Duan, J., Leonardis, A.: G2l-net: Global to local network for real-time 6d pose estimation with embedding vector features. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
work page 2020
-
[8]
SAM 3D: 3Dfy Anything in Images
Chen, X., Chu, F.J., Gleize, P., Liang, K.J., Sax, A., Tang, H., Wang, W., Guo, M., Hardin, T., Li, X., et al.: Sam 3d: 3dfy anything in images. arXiv preprint arXiv:2511.16624 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[9]
In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)
Chen, X., Dong, Z., Song, J., Geiger, A., Hilliges, O.: Category level object pose estimation via neural analysis-by-synthesis. In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)
work page 2020
-
[10]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Choy, C., Dong, W., Koltun, V .: Deep global registration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
work page 2020
-
[11]
Cong, Z., Zhao, Q., Jeon, M., Tulsiani, S.: Flow3r: Factored flow prediction for scalable visual geometry learning. arXiv preprint (2026)
work page 2026
-
[12]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Deitke, M., Schwenk, D., Salvador, J., Weihs, L., Michel, O., VanderBilt, E., Schmidt, L., Ehsani, K., Kembhavi, A., Farhadi, A.: Objaverse: A universe of annotated 3d objects. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 13142–13153 (2023)
work page 2023
-
[13]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)
Di, Y ., Manhardt, F., Wang, G., Ji, X., Navab, N., Tombari, F.: So-pose: Exploiting self-occlusion for direct 6d pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)
work page 2021
-
[14]
Do, T.T., Cai, M., Pham, T., Reid, I.: Deep-6dpose: Recovering 6d object pose from a single rgb image. arXiv preprint (2018)
work page 2018
-
[15]
In: 2022 International Conference on Robotics and Automation (ICRA)
Downs, L., Francis, A., Koenig, N., Kinman, B., Hickman, R., Reymann, K., McHugh, T.B., Vanhoucke, V .: Google scanned objects: A high-quality dataset of 3d scanned household items. In: 2022 International Conference on Robotics and Automation (ICRA). pp. 2553–2560. Ieee (2022)
work page 2022
-
[16]
Elflein, S., Li, R., Agostinho, S., Gojcic, Z., Leal-Taixé, L., Zhou, Q., Osep, A.: VGG-T 3: Offline feed-forward 3d reconstruction at scale. arXiv preprint (2026)
work page 2026
-
[17]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
Fan, Z., Pan, P., Wang, P., Jiang, Y ., Xu, D., Wang, Z.: Pope: 6-dof promptable pose estimation of any object in any scene with one reference. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
work page 2024
-
[18]
Gower, J.C.: Generalized procrustes analysis. Psychometrika (1975)
work page 1975
-
[19]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2023)
Hai, Y ., Song, R., Li, J., Ferstl, D., Hu, Y .: Pseudo flow consistency for self-supervised 6d object pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2023)
work page 2023
-
[20]
Hai, Y ., Song, R., Li, J., Salzmann, M., Hu, Y .: Rigidity-aware detection for 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023) 10
work page 2023
-
[21]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Hampali, S., Rad, M., Oberweger, M., Lepetit, V .: Honnotate: A method for 3d annotation of hand and object poses. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
work page 2020
-
[22]
Advances in Neural Information Processing Systems (2022)
He, X., Sun, J., Wang, Y ., Huang, D., Bao, H., Zhou, X.: Onepose++: Keypoint-free one-shot object pose estimation without cad models. Advances in Neural Information Processing Systems (2022)
work page 2022
-
[23]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
He, Y ., Huang, H., Fan, H., Chen, Q., Sun, J.: Ffb6d: A full flow bidirectional fusion network for 6d pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
work page 2021
-
[24]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
He, Y ., Sun, W., Huang, H., Liu, J., Fan, H., Sun, J.: Pvn3d: A deep point-wise 3d keypoints voting network for 6dof pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
work page 2020
-
[25]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Hodan, T., Barath, D., Matas, J.: Epos: Estimating 6d pose of objects with symmetries. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
work page 2020
-
[26]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Hu, Y ., Hugonot, J., Fua, P., Salzmann, M.: Segmentation-driven 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
work page 2019
-
[27]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
Huang, J., Lin, H., Wang, T., Fu, Y ., Xue, X., Zhu, Y .: Cap-net: A unified network for 6d pose and size estimation of categorical articulated parts from a single rgb-d image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
work page 2025
-
[28]
In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2022)
Irshad, M.Z., Kollar, T., Laskey, M., Stone, K., Kira, Z.: Centersnap: Single-shot multi-object 3d shape reconstruction and categorical 6d pose and size estimation. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2022)
work page 2022
-
[29]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)
Iwase, S., Liu, X., Khirodkar, R., Yokota, R., Kitani, K.M.: Repose: Fast 6d object pose refinement via deep texture rendering. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)
work page 2021
-
[30]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Jiang, X., Li, D., Chen, H., Zheng, Y ., Zhao, R., Wu, L.: Uni6d: A unified cnn framework without projection breakdown for 6d pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
work page 2022
-
[31]
In: European conference on computer vision
Karaev, N., Rocco, I., Graham, B., Neverova, N., Vedaldi, A., Rupprecht, C.: Cotracker: It is better to track together. In: European conference on computer vision. pp. 18–35. Springer (2024)
work page 2024
-
[32]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2017)
Kehl, W., Manhardt, F.E., Tombari, F., Ilic, S., Navab, N.: Ssd-6d: Making rgb-based 3d detection and 6d pose estimation great again. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2017)
work page 2017
-
[33]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2015)
Kendall, A., Grimes, M., Cipolla, R.: Posenet: A convolutional network for real-time 6-dof camera relocalization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2015)
work page 2015
-
[34]
In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)
Labbé, Y ., Carpentier, J., Aubry, M., Sivic, J.: Cosypose: Consistent multi-view multi-object 6d pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)
work page 2020
-
[35]
In: Proceedings of the Conference on Robot Learning (CoRL) (2022)
Labbé, Y ., Manuelli, L., Mousavian, A., Tyree, S., Birchfield, S., Tremblay, J., Carpentier, J., Aubry, M., Fox, D., Sivic, J.: Megapose: 6d pose estimation of novel objects via render & compare. In: Proceedings of the Conference on Robot Learning (CoRL) (2022)
work page 2022
-
[36]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
Lee, T., Wen, B., Kang, M., Kang, G., Kweon, I.S., Yoon, K.J.: Any6d: Model-free 6d pose estimation of novel objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
work page 2025
-
[37]
International journal of computer vision81(2), 155–166 (2009)
Lepetit, V ., Moreno-Noguer, F., Fua, P.: Ep n p: An accurate o (n) solution to the p n p problem. International journal of computer vision81(2), 155–166 (2009)
work page 2009
-
[38]
In: Proceedings of the Conference on Robot Learning (CoRL) (2023)
Li, G., Li, Y ., Ye, Z., Zhang, Q., Kong, T., Cui, Z., Zhang, G.: Generative category-level shape and pose estimation with semantic primitives. In: Proceedings of the Conference on Robot Learning (CoRL) (2023)
work page 2023
-
[39]
In: Proceedings of the European Conference on Computer Vision (ECCV) (2018) 11
Li, Y ., Wang, G., Ji, X., Xiang, Y ., Fox, D.: Deepim: Deep iterative matching for 6d pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018) 11
work page 2018
-
[40]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
Li, Z., Wang, G., Ji, X.: Cdpn: Coordinates-based disentangled pose network for real-time rgb-based 6-dof object pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
work page 2019
-
[41]
Li, Z., Stamos, I.: Depth-based 6dof object pose estimation using swin transformer. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2023)
work page 2023
-
[42]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
Lin, J., Liu, L., Lu, D., Jia, K.: Sam-6d: Segment anything model meets zero-shot 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
work page 2024
-
[43]
In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)
Lin, J., Wei, Z., Ding, C., Jia, K.: Category-level 6d object pose and size estimation using self-supervised deep prior deformation networks. In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)
work page 2022
-
[44]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)
Lin, J., Wei, Z., Li, Z., Xu, S., Jia, K., Li, Y .: Dualposenet: Category-level 6d object pose and size estimation using dual pose network with refined learning of pose consistency. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)
work page 2021
-
[45]
In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2022)
Lin, Y ., Tremblay, J., Tyree, S., Vela, P.A., Birchfield, S.: Single-stage keypoint-based category-level object pose estimation from an rgb image. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2022)
work page 2022
-
[46]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Lipson, L., Teed, Z., Goyal, A., Deng, J.: Coupled iterative refinement for 6d multi-object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
work page 2022
-
[47]
Lissitz, R.W., Schönemann, P.H., Lingoes, J.C.: A solution to the weighted procrustes problem in which the transformation is in agreement with the loss function. Psychometrika (1976)
work page 1976
-
[48]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
Liu, M., Li, S., Chhatkuli, A., Truong, P., Van Gool, L., Tombari, F.: One2any: One-reference 6d pose estimation for any object. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
work page 2025
-
[49]
In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)
Liu, X., Wang, G., Li, Y ., Ji, X.: Catre: Iterative point clouds alignment for category-level object pose refinement. In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)
work page 2022
-
[50]
In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)
Liu, Y ., Wen, Y ., Peng, S., Lin, C., Long, X.F., Komura, T., Wang, W.: Gen6d: Generalizable model-free 6-dof object pose estimation from rgb images. In: Proceedings of the European Conference on Computer Vision (ECCV) (2022)
work page 2022
-
[51]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Liu, Y ., Liu, Y ., Jiang, C., Lyu, K., Wan, W., Shen, H., Liang, B., Fu, Z., Wang, H., Yi, L.: Hoi4d: A 4d egocentric dataset for category-level human-object interaction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
work page 2022
-
[52]
Decoupled Weight Decay Regularization
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[53]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
Moon, S., Son, H., Hur, D., Kim, S.: Co-op: Correspondence-based novel object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
work page 2025
-
[54]
arXiv preprint arXiv:2512.15508 (2025)
Moreau, A., Shaw, R., Nazarczuk, M., Shin, J., Tanay, T., Zhang, Z., Xu, S., Pérez-Pellitero, E.: Off the grid: Detection of primitives for feed-forward 3d gaussian splatting. arXiv preprint arXiv:2512.15508 (2025)
-
[55]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
Mousavian, A., Eppner, C., Fox, D.: 6-dof graspnet: Variational grasp generation for object manipulation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
work page 2019
-
[56]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
Nguyen, V .N., Groueix, T., Salzmann, M., Lepetit, V .: Gigapose: Fast and robust novel object pose estimation via one correspondence. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
work page 2024
-
[57]
Oquab, M., Darcet, T., Moutakanni, T., et al.: Dinov2: Learning robust visual features without supervision (2024)
work page 2024
-
[58]
In: Proceedings of the European Conference on Computer Vision (ECCV) (2024)
Örnek, E.P., Labbé, Y ., Tekin, B., Ma, L., Keskin, C., Forster, C., Hodan, T.: Foundpose: Unseen object pose estimation with foundation features. In: Proceedings of the European Conference on Computer Vision (ECCV) (2024)
work page 2024
-
[59]
Park, K., Mousavian, A., Xiang, Y ., Fox, D.: Latentfusion: End-to-end differentiable reconstruction and rendering for unseen object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020) 12
work page 2020
-
[60]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
Park, K., Patten, T., Vincze, M.: Pix2pose: Pixel-wise coordinate regression of objects for 6d pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
work page 2019
-
[61]
In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2017)
Pavlakos, G., Zhou, X., Chan, A., Derpanis, K.G., Daniilidis, K.: 6-dof object pose from semantic keypoints. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2017)
work page 2017
-
[62]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Peng, S., Liu, Y ., Huang, Q., Zhou, X., Bao, H.: Pvnet: Pixel-wise voting network for 6dof pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
work page 2019
-
[63]
Ponimatkin, G., Cífka, M., Souˇcek, T., Fourmy, M., Labbé, Y ., Petrik, V ., Sivic, J.: 6d object pose tracking in internet videos for robotic manipulation. arXiv preprint (2025)
work page 2025
-
[64]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
Potamias, R.A., Zhang, J., Deng, J., Zafeiriou, S.: Wilor: End-to-end 3d hand localization and recon- struction in-the-wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
work page 2025
-
[65]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2017)
Rad, M., Lepetit, V .: Bb8: A scalable, accurate, robust to partial occlusion method for predicting the 3d poses of challenging objects without using depth. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2017)
work page 2017
-
[66]
Romero, J., Tzionas, D., Black, M.J.: Embodied hands: Modeling and capturing hands and bodies together. arXiv preprint (2022)
work page 2022
-
[67]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Shugurov, I., Li, F., Busam, B., Ilic, S.: Osop: A multi-stage one shot object pose estimation framework. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
work page 2022
-
[68]
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2021)
Shugurov, I., Zakharov, S., Ilic, S.: Dpodv2: Dense correspondence-based 6 dof pose estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2021)
work page 2021
-
[69]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Song, C., Song, J., Huang, Q.: Hybridpose: 6d object pose estimation under hybrid representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
work page 2020
-
[70]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
Song, X., Jin, L., Zhang, Z., Li, J., Zhong, F., Zhang, G., Qin, X.: Prior-free 3d object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
work page 2025
-
[71]
In: 2012 IEEE/RSJ international conference on intelligent robots and systems
Sturm, J., Engelhard, N., Endres, F., Burgard, W., Cremers, D.: A benchmark for the evaluation of rgb-d slam systems. In: 2012 IEEE/RSJ international conference on intelligent robots and systems. pp. 573–580. IEEE (2012)
work page 2012
-
[72]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Su, Y ., Saleh, M., Fetzer, T., Rambach, J., Navab, N., Busam, B., Stricker, D., Tombari, F.: Zebrapose: Coarse to fine surface encoding for 6dof object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
work page 2022
-
[73]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Sun, J., Wang, Z., Zhang, S., He, X., Zhao, H., Zhang, G., Zhou, X.: Onepose: One-shot object pose estimation without cad models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
work page 2022
-
[74]
In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)
Tian, M., Ang Jr, M.H., Lee, G.H.: Shape prior deformation for categorical 6d object pose and size estimation. In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)
work page 2020
-
[75]
IEEE Transactions on pattern analysis and machine intelligence13(4), 376–380 (2002)
Umeyama, S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on pattern analysis and machine intelligence13(4), 376–380 (2002)
work page 2002
-
[76]
In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2020)
Wang, C., Martín-Martín, R., Xu, D., Lv, J., Lu, C., Fei-Fei, L., Savarese, S., Zhu, Y .: 6-pack: Category- level 6d pose tracker with anchor-based keypoints. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2020)
work page 2020
-
[77]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Wang, C., Xu, D., Zhu, Y ., Martín-Martín, R., Lu, C., Fei-Fei, L., Savarese, S.: Densefusion: 6d object pose estimation by iterative dense fusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
work page 2019
-
[78]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
Wang, G., Manhardt, F., Tombari, F., Ji, X.: Gdr-net: Geometry-guided direct regression network for monocular 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
work page 2021
-
[79]
Wang, H., Sridhar, S., Huang, J., Valentin, J., Song, S., Guibas, L.J.: Normalized object coordinate space for category-level 6d object pose and size estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019) 13
work page 2019
-
[80]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: Vggt: Visual geometry grounded transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.