AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment
Pith reviewed 2026-05-22 12:28 UTC · model grok-4.3
The pith
AlignPose estimates a single consistent 6D object pose from multiple extrinsically calibrated RGB views by minimizing feature discrepancies between rendered and observed images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AlignPose aggregates information from multiple extrinsically calibrated RGB views and produces a single consistent world-frame object pose without object-specific training or symmetry annotation. Its key component is a multi-view feature-metric refinement that optimizes the pose by minimizing the feature discrepancy between on-the-fly rendered object features and the observed image features across all views at the same time.
What carries the argument
Multi-view feature-metric refinement: a procedure that optimizes one shared world-frame object pose by minimizing feature discrepancy between rendered object features and observed image features from every calibrated view simultaneously.
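To make the shape of this objective concrete, here is a minimal NumPy sketch of a multi-view feature-metric cost for one shared world-frame pose. It is an illustration under stated assumptions, not the paper's implementation: the names `multiview_cost`, `project`, `sample_nearest`, and `huber` are hypothetical, the Huber function stands in for whatever robust cost the method uses, features are looked up by nearest pixel rather than interpolated, and fixed per-point object features replace the on-the-fly rendered features described above.

```python
import numpy as np

def huber(r, delta=1.0):
    """Huber robust cost applied elementwise to residual norms r (assumed choice)."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def project(K, T_CW, T_WO, pts_obj):
    """Project (N, 3) object-frame points into one camera; returns (N, 2) pixels."""
    pts_h = np.c_[pts_obj, np.ones(len(pts_obj))]   # homogeneous coords, (N, 4)
    pts_cam = (T_CW @ T_WO @ pts_h.T).T[:, :3]      # object -> world -> camera
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3]

def sample_nearest(feat_map, uv):
    """Nearest-pixel lookup in an (H, W, D) feature map, clamped to the image."""
    h, w, _ = feat_map.shape
    u = np.clip(np.rint(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.rint(uv[:, 1]).astype(int), 0, h - 1)
    return feat_map[v, u]

def multiview_cost(T_WO, pts_obj, pt_feats, views):
    """Sum robust feature residuals over all views for a single shared pose.

    views is a list of (K, T_CW, feat_map) tuples: intrinsics, world-to-camera
    extrinsics, and the observed feature map for each calibrated view.
    """
    total = 0.0
    for K, T_CW, feat_map in views:
        uv = project(K, T_CW, T_WO, pts_obj)
        res = pt_feats - sample_nearest(feat_map, uv)        # (N, D) residuals
        total += huber(np.linalg.norm(res, axis=1)).sum()
    return total
```

Because every view contributes to the same scalar, a pose that explains one image but contradicts another is penalized. An optimizer would minimize this cost over the six degrees of freedom of `T_WO`, with the extrinsics `T_CW` held fixed.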
If this is right
- The approach generalizes to objects never seen during training because no object-specific training or symmetry annotation is required.
- Performance improves most on industrial datasets where multiple calibrated views are already present in the capture setup.
- It reduces the impact of single-view failures such as depth ambiguity and heavy occlusions by enforcing consistency across views.
- The same refinement procedure can be applied on top of any initial pose estimates obtained from single-view networks.
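Seeding a shared world-frame refinement from a single-view estimate requires re-expressing that camera-frame pose in the world frame using the known extrinsics. The sketch below shows this step under assumed conventions: `T_CO` maps object coordinates to camera coordinates, `T_CW` maps world coordinates to camera coordinates, and all poses are 4x4 homogeneous matrices. The function names are hypothetical and do not come from the paper.

```python
import numpy as np

def invert_se3(T):
    """Invert a 4x4 rigid transform using the rotation transpose."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

def to_world_frame(T_CO, T_CW):
    """Convert a camera-frame object pose to the world frame: T_WO = inv(T_CW) @ T_CO."""
    return invert_se3(T_CW) @ T_CO
```

With exact extrinsics, accurate estimates from different cameras map to nearly the same `T_WO`, so any one of them can serve as the starting point for a joint refinement.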
Where Pith is reading between the lines
- Jointly estimating camera poses along with the object pose could remove the need for pre-calibration in less controlled settings.
- The feature-metric objective may transfer to other multi-view tasks such as scene reconstruction or dynamic object tracking.
- Replacing the current feature extractor with stronger self-supervised backbones could further improve accuracy on textureless industrial parts.
Load-bearing premise
The input views must be extrinsically calibrated with known relative camera poses.
What would settle it
Running the method on a multi-view dataset where relative camera poses are deliberately perturbed or withheld would show whether the reported accuracy gains require precise extrinsic calibration.
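One way such a perturbation test could be set up is sketched below: each world-to-camera extrinsic is composed with a small random rigid transform before the method is run. This is an assumed protocol rather than one taken from the paper. The function names, the Gaussian axis-angle noise model, and the parameters `rot_sigma_deg` and `trans_sigma` (the latter in the same length unit as the extrinsics) are all hypothetical.

```python
import numpy as np

def rodrigues(rotvec):
    """Rotation matrix from an axis-angle vector via Rodrigues' formula."""
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rotvec / theta
    Kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def perturb_extrinsics(T_CW, rot_sigma_deg, trans_sigma, rng):
    """Left-multiply T_CW by a random rigid transform with Gaussian noise."""
    noise = np.eye(4)
    noise[:3, :3] = rodrigues(rng.normal(0.0, np.deg2rad(rot_sigma_deg), 3))
    noise[:3, 3] = rng.normal(0.0, trans_sigma, 3)
    return noise @ T_CW
```

Sweeping the two noise levels and re-scoring the resulting poses would show how quickly accuracy degrades as calibration error grows.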
Original abstract
Single-view RGB model-based object pose estimation methods achieve strong generalization but are fundamentally limited by depth ambiguity, clutter, and occlusions. Multi-view pose estimation methods have the potential to solve these issues, but existing works rely on precise single-view pose estimates or lack generalization to unseen objects. We address these challenges via the following three contributions. First, we introduce AlignPose, a 6D object pose estimation method that aggregates information from multiple extrinsically calibrated RGB views and does not require any object-specific training or symmetry annotation. Second, the key component of this approach is a new multi-view feature-metric refinement specifically designed for object pose. It optimizes a single, consistent world-frame object pose by minimizing the feature discrepancy between on-the-fly rendered object features and observed image features across all views simultaneously. Third, we report extensive experiments on six datasets (YCB-V, T-LESS, HouseCat6D, ITODD-MV, IPD, XYZ-IBD) using the BOP benchmark evaluation and show that AlignPose outperforms other published methods, especially on challenging industrial datasets where multiple views are readily available in practice.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AlignPose, a generalizable 6D object pose estimation approach that aggregates information from multiple extrinsically calibrated RGB views without requiring object-specific training or symmetry annotations. Its core contribution is a multi-view feature-metric refinement procedure that optimizes a single consistent world-frame pose by simultaneously minimizing feature discrepancies between on-the-fly rendered object features and observed image features across all views. The authors report extensive BOP-benchmark experiments on six datasets (YCB-V, T-LESS, HouseCat6D, ITODD-MV, IPD, XYZ-IBD) showing outperformance over prior published methods, with the largest gains on challenging industrial datasets.
Significance. If the central claims hold, the work provides a practical route to multi-view pose estimation that improves robustness to depth ambiguity, clutter, and occlusions while remaining generalizable to unseen objects. The scale of the evaluation—six datasets under the standard BOP protocol—strengthens the empirical case for the multi-view refinement strategy in settings where calibrated views are available.
major comments (1)
- [Abstract] Abstract: The reported outperformance on industrial datasets (ITODD-MV, IPD, XYZ-IBD) rests on the multi-view feature-metric refinement that projects and compares features in a shared world frame. This construction requires perfectly known relative camera poses, yet the manuscript provides no ablation or sensitivity analysis under realistic calibration noise. Even modest errors in the supplied extrinsics could shift the joint discrepancy minimum to an incorrect pose, directly affecting the BOP scores that constitute the central empirical claim.
minor comments (2)
- [Results] Results section: The abstract and experimental summary omit error bars, exact baseline implementations, and the contribution of the refinement step versus the initial single-view estimates; adding these details would improve reproducibility without altering the core claims.
- [Method] Method description: Clarify the precise feature extractor and rendering pipeline used for on-the-fly feature generation, including any hyper-parameters that control the discrepancy minimization.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comment below and will revise the paper accordingly to strengthen the empirical evaluation.
Point-by-point responses
Referee: The reported outperformance on industrial datasets (ITODD-MV, IPD, XYZ-IBD) rests on the multi-view feature-metric refinement that projects and compares features in a shared world frame. This construction requires perfectly known relative camera poses, yet the manuscript provides no ablation or sensitivity analysis under realistic calibration noise. Even modest errors in the supplied extrinsics could shift the joint discrepancy minimum to an incorrect pose, directly affecting the BOP scores that constitute the central empirical claim.
Authors: We agree that the multi-view feature-metric alignment in AlignPose operates under the assumption of known extrinsics and that the reported gains on the industrial datasets rely on this. The BOP evaluations use the provided ground-truth calibrations as per the benchmark protocol. We acknowledge that the current manuscript lacks an explicit sensitivity analysis to calibration noise, which is a valid point. In the revised manuscript we will add a dedicated ablation study that perturbs the relative camera poses with realistic levels of Gaussian noise and reports the resulting changes in BOP scores on ITODD-MV, IPD, and XYZ-IBD. This will quantify the method's sensitivity and clarify the practical requirements on calibration accuracy.
Revision: yes
Circularity Check
No circularity: empirical optimization method with independent benchmark evaluation
The paper introduces AlignPose as an algorithmic procedure for multi-view pose refinement that optimizes a single world-frame pose by minimizing feature discrepancy between rendered and observed features across extrinsically calibrated views. This is a standard iterative optimization construction rather than a derivation that reduces to its own inputs by construction. Performance results are reported as empirical comparisons on six external BOP benchmark datasets, not as predictions forced by fitted parameters or self-citations. No load-bearing step in the abstract or described contributions matches the enumerated circularity patterns; the method remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Input views are extrinsically calibrated with known relative camera poses.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking (tag: unclear)
Paper passage: "optimizes a single, consistent world-frame object pose by minimizing the feature discrepancy between on-the-fly rendered object features and observed image features across all views simultaneously"
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: echoes)
echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Paper passage: $L^{C}_{FE}(T_{CO}) = \sum_i \rho\left(p_i - F_q\left(\pi_C(T_{CO}\, x_i)\right)\right)$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.