Pose Anything Anywhere:Model-free Object Poses from Arbitrary References
Pith reviewed 2026-06-26 09:15 UTC · model grok-4.3
The pith
A multi-view transformer allows model-free 6D pose estimation from arbitrary sparse reference views.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that moving beyond pairwise matching with a multi-view transformer geometry backbone that learns stable view-consistent geometry and cross-view alignment cues enables accurate model-free 6D pose estimation for novel objects from arbitrary single or sparse references, with further improvement from pose-graph canonical registration of assist views.
What carries the argument
multi-view transformer geometry backbone learning view-consistent geometry and cross-view alignment cues stable under wide baselines and limited overlap
If this is right
- Outperforms existing model-free methods with +12% pose accuracy on YCB-V and over +20% on LM-O.
- Consistently performs well in both single-reference and sparse-reference settings.
- Handles both RGB and RGB-D inputs seamlessly.
- Aggregates unposed assist views to increase geometric coverage when available.
- Generalizes to novel objects in real-world environments with robustness to occlusion and viewpoint changes.
Where Pith is reading between the lines
- Such view-consistent learning could be applied to other multi-view tasks like dense reconstruction.
- A natural extension would be testing on dynamic scenes or with moving objects.
- The method may lower barriers for deploying pose estimation in unstructured environments without pre-scanned models.
Load-bearing premise
The multi-view transformer geometry backbone can learn view-consistent geometry and cross-view alignment cues that remain stable under wide baselines and limited overlap.
What would settle it
Running PANY on a new benchmark consisting of object instances with query-reference image pairs that have very low overlap and wide baselines, and observing no improvement over pairwise model-free baselines, would falsify the key mechanism.
Figures
read the original abstract
Estimating the 6D pose of unseen objects is a fundamental yet challenging problem for open-world robotics and embodied perception. Model-based methods are accurate but depend on CAD assets or heavy onboarding, while most model-free approaches are still limited to pairwise single-anchor matching and thus fail under occlusion and large viewpoint changes with low query-reference overlap. Therefore, we present PANY, a unified model-free framework that seamlessly supports both RGB and RGB-D inputs, operates on one or sparse pose-free reference views, and generalizes effectively to novel objects. Built on a multi-view transformer geometry backbone, PANY moves beyond pairwise matching by learning view-consistent geometry and cross-view alignment cues that remain stable under wide baselines and limited overlap. When additional unposed assist views are available, PANY aggregates them via pose-graph canonical registration to increase geometric coverage and reinforce the final pose. Extensive experiments show that PANY achieves state-of-the-art performance across multiple benchmarks, substantially outperforming existing model-free methods, improving pose accuracy by +12% on YCB-V and over +20% on LM-O. Furthermore, PANY consistently performs well under both single-reference and sparse-reference settings, demonstrating strong robustness in real-world environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents PANY, a unified model-free framework for 6D pose estimation of unseen objects. It employs a multi-view transformer geometry backbone to learn view-consistent geometry and cross-view alignment cues that remain stable under wide baselines and limited overlap, supporting RGB/RGB-D inputs, single or sparse pose-free reference views, and optional pose-graph canonical registration for additional assist views. Extensive experiments are reported to demonstrate state-of-the-art performance, with claimed improvements of +12% on YCB-V and over +20% on LM-O relative to prior model-free methods, along with robustness in single- and sparse-reference settings.
Significance. If the quantitative results hold under rigorous evaluation, the work would advance model-free pose estimation by addressing key limitations of pairwise matching approaches in handling occlusion, large viewpoint changes, and low overlap. This has clear relevance for open-world robotics and embodied perception. The multi-view transformer design for geometry learning is a promising direction that could generalize beyond the evaluated benchmarks.
major comments (2)
- [Abstract] Abstract: the central SOTA claim rests on specific quantitative gains (+12% on YCB-V, +20% on LM-O) but supplies no information on the precise metrics (e.g., ADD-S, AUC), baseline methods, dataset splits, number of runs, or error bars. This information is load-bearing for assessing whether the reported improvements are statistically meaningful and reproducible.
- The description of the multi-view transformer geometry backbone states that it 'learns view-consistent geometry and cross-view alignment cues that remain stable under wide baselines and limited overlap,' yet the provided text contains no architectural equations, loss formulations, or ablation results isolating this mechanism from standard transformer components. Without these details the generalization claim cannot be verified.
minor comments (1)
- [Abstract] The abstract mentions 'multiple benchmarks' but only names YCB-V and LM-O; a complete list of evaluated datasets and settings (single-reference vs. sparse-reference) would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment point by point below, indicating planned revisions to improve clarity and verifiability while preserving the manuscript's technical content.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central SOTA claim rests on specific quantitative gains (+12% on YCB-V, +20% on LM-O) but supplies no information on the precise metrics (e.g., ADD-S, AUC), baseline methods, dataset splits, number of runs, or error bars. This information is load-bearing for assessing whether the reported improvements are statistically meaningful and reproducible.
Authors: We agree the abstract would benefit from greater specificity on the evaluation protocol. In the revised version we will expand the relevant sentence to state that gains are measured in ADD-S AUC on the standard YCB-V and LM-O splits, relative to the strongest prior model-free baselines, with full per-method numbers, run counts, and error bars reported in Section 4. This keeps the abstract concise while making the central claim self-contained. revision: yes
-
Referee: The description of the multi-view transformer geometry backbone states that it 'learns view-consistent geometry and cross-view alignment cues that remain stable under wide baselines and limited overlap,' yet the provided text contains no architectural equations, loss formulations, or ablation results isolating this mechanism from standard transformer components. Without these details the generalization claim cannot be verified.
Authors: Section 3.1 already defines the multi-view transformer with explicit equations for the geometry backbone and cross-view attention (Eqs. 2–5), Section 3.3 gives the composite loss (Eq. 7), and Section 4.3 presents ablations (Table 4) that isolate the alignment module. We will add direct parenthetical references to these equations and the table immediately after the descriptive sentence in the revised manuscript so that the support for the generalization claim is immediately verifiable. revision: yes
Circularity Check
No significant circularity identified
full rationale
The paper describes an empirical multi-view transformer framework for model-free 6D pose estimation, with claims resting on concrete benchmark gains (+12% YCB-V, +20% LM-O) rather than any closed derivation. No equations, fitted parameters, or self-referential definitions appear that would reduce outputs to inputs by construction. The core mechanism (view-consistent geometry cues stable under wide baselines) is presented as a learned property validated externally on standard datasets, without self-citation load-bearing steps, uniqueness theorems, or ansatzes that collapse to the target result. The work is self-contained against independent benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
In: 2024 IEEE In- ternational Conference on Robotics and Automation (ICRA)
Ausserlechner, P., Haberger, D., Thalhammer, S., Weibel, J.B., Vincze, M.: Zs6d: Zero-shot 6d object pose estimation using vision transformers. In: 2024 IEEE In- ternational Conference on Robotics and Automation (ICRA). pp. 463–469. IEEE (2024)
2024
-
[2]
In: European conference on computer vision
Brachmann, E., Krull, A., Michel, F., Gumhold, S., Shotton, J., Rother, C.: Learn- ing 6d object pose estimation using 3d object coordinates. In: European conference on computer vision. pp. 536–551. Springer (2014)
2014
-
[3]
In: Proceedings of the IEEE conference on computer vision and pattern recognition
Brachmann, E., Rother, C.: Learning less is more-6d camera localization via 3d surface regression. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4654–4662 (2018)
2018
-
[4]
In: 2025 International Conference on 3D Vision (3DV)
Cai, D., Heikkilä, J., Rahtu, E.: Gs-pose: Generalizable segmentation-based 6d ob- ject pose estimation with 3d gaussian splatting. In: 2025 International Conference on 3D Vision (3DV). pp. 1001–1011. IEEE (2025)
2025
-
[5]
In: European Conference on Computer Vision
Caraffa, A., Boscaini, D., Hamza, A., Poiesi, F.: Freeze: Training-free zero-shot 6d pose estimation with geometric and vision foundation models. In: European Conference on Computer Vision. pp. 414–431. Springer (2024)
2024
-
[6]
arXiv preprint arXiv:2305.179342(6), 7 (2023)
Chen, J., Sun, M., Bao, T., Zhao, R., Wu, L., He, Z.: Zeropose: Cad-model-based zero-shot pose estimation. arXiv preprint arXiv:2305.179342(6), 7 (2023)
arXiv 2023
-
[7]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Corsetti, J., Boscaini, D., Oh, C., Cavallaro, A., Poiesi, F.: Open-vocabulary object 6d pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 18071–18080 (2024)
2024
-
[8]
In: 2024 IEEE/RSJ International Conference on Intelli- gent Robots and Systems (IROS)
Di Felice, F., Remus, A., Gasperini, S., Busam, B., Ott, L., Tombari, F., Siegwart, R., Avizzano, C.A.: Zero123-6d: Zero-shot novel view synthesis for rgb category- level 6d pose estimation. In: 2024 IEEE/RSJ International Conference on Intelli- gent Robots and Systems (IROS). pp. 14204–14211. IEEE (2024)
2024
-
[9]
International journal of computer assisted radiology and surgery11(9), 1561–1571 (2016)
Esposito, M., Busam, B., Hennersperger, C., Rackerseder, J., Navab, N., Frisch, B.: Multimodal us–gamma imaging using collaborative robotics for cancer staging biopsies. International journal of computer assisted radiology and surgery11(9), 1561–1571 (2016)
2016
-
[10]
arXiv preprint arXiv:2509.07978 (2025)
Geng, Z., Wang, N., Xu, S., Ye, C., Li, B., Chen, Z., Peng, S., Zhao, H.: One view, many worlds: Single-image to 3d object meets generative domain randomization for one-shot 6d pose estimation. arXiv preprint arXiv:2509.07978 (2025)
arXiv 2025
-
[11]
In: Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition
Gümeli, C., Dai, A., Nießner, M.: Objectmatch: Robust registration using canoni- cal object correspondences. In: Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition. pp. 13082–13091 (2023)
2023
-
[12]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Haugaard,R.L.,Buch,A.G.:Surfemb:Denseandcontinuouscorrespondencedistri- butions for object pose estimation with learnt surface embeddings. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6749–6758 (2022)
2022
-
[13]
Advances in Neural In- formation Processing Systems35, 35103–35115 (2022)
He, X., Sun, J., Wang, Y., Huang, D., Bao, H., Zhou, X.: Onepose++: Keypoint- free one-shot object pose estimation without cad models. Advances in Neural In- formation Processing Systems35, 35103–35115 (2022)
2022
-
[14]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
He, Y., Wang, Y., Fan, H., Sun, J., Chen, Q.: Fs6d: Few-shot 6d pose estimation of novel objects. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 6814–6824 (2022)
2022
-
[15]
In: 2011 international conference on computer vision
Hinterstoisser,S.,Holzer,S.,Cagniart,C.,Ilic,S.,Konolige,K.,Navab,N.,Lepetit, V.: Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes. In: 2011 international conference on computer vision. pp. 858–865. IEEE (2011) 16 H. Xu et al
2011
-
[16]
In: Proceedings of the European conference on computer vision (ECCV)
Hodan, T., Michel, F., Brachmann, E., Kehl, W., GlentBuch, A., Kraft, D., Drost, B., Vidal, J., Ihrke, S., Zabulis, X., et al.: Bop: Benchmark for 6d object pose esti- mation. In: Proceedings of the European conference on computer vision (ECCV). pp. 19–34 (2018)
2018
-
[17]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Hodan, T., Sundermeyer, M., Labbe, Y., Nguyen, V.N., Wang, G., Brachmann, E., Drost, B., Lepetit, V., Rother, C., Matas, J.: Bop challenge 2023 on detection segmentation and pose estimation of seen and unseen rigid objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5610–5619 (2024)
2023
-
[18]
Iclr1(2), 3 (2022)
Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al.: Lora: Low-rank adaptation of large language models. Iclr1(2), 3 (2022)
2022
-
[19]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision
Huang, J., Vutukur, S.R., Yu, P.K., Navab, N., Ilic, S., Busam, B.: Raypose: Ray bundling diffusion for template views in unseen 6d object pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9102–9112 (2025)
2025
-
[20]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Huang, J., Yu, H., Yu, K.T., Navab, N., Ilic, S., Busam, B.: Matchu: Matching unseen objects for 6d pose estimation from rgb-d images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10095– 10105 (2024)
2024
-
[21]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision
Jin, Y., Prasad, V., Jauhri, S., Franzius, M., Chalvatzaki, G.: 6dope-gs: Online 6d object pose estimation using gaussian splatting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8032–8043 (2025)
2025
-
[22]
ACM Trans
Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G., et al.: 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph.42(4), 139–1 (2023)
2023
-
[23]
In: Proceedings of the Computer Vision and Pattern Recognition Conference
Kim, J., Park, J., Lee, K., Cho, N.I.: Refpose: Leveraging reference geometric correspondences for accurate 6d pose estimation of unseen objects. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 6447–6456 (2025)
2025
-
[24]
arXiv preprint arXiv:1412.6980 (2014)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Pith/arXiv arXiv 2014
-
[25]
arXiv preprint arXiv:2212.06870 (2022)
Labbé, Y., Manuelli, L., Mousavian, A., Tyree, S., Birchfield, S., Tremblay, J., Carpentier, J., Aubry, M., Fox, D., Sivic, J.: Megapose: 6d pose estimation of novel objects via render & compare. arXiv preprint arXiv:2212.06870 (2022)
arXiv 2022
-
[26]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Lee, T., Wen, B., Kang, M., Kang, G., Kweon, I.S., Yoon, K.J.: Any6d: Model-free 6d pose estimation of novel objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11633–11643 (2025)
2025
-
[27]
In: European conference on computer vision
Leroy, V., Cabon, Y., Revaud, J.: Grounding image matching in 3d with mast3r. In: European conference on computer vision. pp. 71–91. Springer (2024)
2024
-
[28]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision
Li, F., Vutukur, S.R., Yu, H., Shugurov, I., Busam, B., Yang, S., Ilic, S.: Nerf- pose: A first-reconstruct-then-regress approach for weakly-supervised 6d object pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 2123–2133 (2023)
2023
-
[29]
In: Proceedings of the European conference on computer vision (ECCV)
Li, Y., Wang, G., Ji, X., Xiang, Y., Fox, D.: Deepim: Deep iterative matching for 6d pose estimation. In: Proceedings of the European conference on computer vision (ECCV). pp. 683–698 (2018)
2018
-
[30]
In: 2024 International Conference on 3D Vision (3DV)
Lin, A., Zhang, J.Y., Ramanan, D., Tulsiani, S.: Relpose++: Recovering 6d poses from sparse-view observations. In: 2024 International Conference on 3D Vision (3DV). pp. 106–115. IEEE (2024)
2024
-
[31]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Lin, J., Liu, L., Lu, D., Jia, K.: Sam-6d: Segment anything model meets zero- shot 6d object pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 27906–27916 (2024) Pose Anything Anywhere 17
2024
-
[32]
In: European Conference on Computer Vision
Lin, J., Wei, Z., Ding, C., Jia, K.: Category-level 6d object pose and size estimation using self-supervised deep prior deformation networks. In: European Conference on Computer Vision. pp. 19–34. Springer (2022)
2022
-
[33]
In: Proceedings of the Computer Vision and Pattern Recognition Conference
Liu, M., Li, S., Chhatkuli, A., Truong, P., Van Gool, L., Tombari, F.: One2any: One-reference 6d pose estimation for any object. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 6457–6467 (2025)
2025
-
[34]
In: European Conference on Computer Vision
Liu, Y., Wen, Y., Peng, S., Lin, C., Long, X., Komura, T., Wang, W.: Gen6d: Gen- eralizable model-free 6-dof object pose estimation from rgb images. In: European Conference on Computer Vision. pp. 298–315. Springer (2022)
2022
-
[35]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Long, X., Guo, Y.C., Lin, C., Liu, Y., Dou, Z., Liu, L., Ma, Y., Zhang, S.H., Habermann, M., Theobalt, C., et al.: Wonder3d: Single image to 3d using cross- domain diffusion. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 9970–9980 (2024)
2024
-
[36]
In: European Conference on Computer Vision
Matteo, B., Tsesmelis, T., James, S., Poiesi, F., Del Bue, A.: 6dgs: 6d pose es- timation from a single image and a 3d gaussian splatting model. In: European Conference on Computer Vision. pp. 420–436. Springer (2024)
2024
-
[37]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Moon, S., Son, H., Hur, D., Kim, S.: Genflow: Generalizable recurrent flow for 6d pose refinement of novel objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10039–10049 (2024)
2024
-
[38]
In: Proceed- ings of the IEEE/CVF conference on computer vision and pattern recognition
Nguyen, V.N., Groueix, T., Ponimatkin, G., Hu, Y., Marlet, R., Salzmann, M., Lepetit, V.: Nope: Novel object pose estimation from a single image. In: Proceed- ings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 17923–17932 (2024)
2024
-
[39]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision
Nguyen, V.N., Groueix, T., Ponimatkin, G., Lepetit, V., Hodan, T.: Cnos: A strong baseline for cad-based novel object segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 2134–2140 (2023)
2023
-
[40]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Nguyen, V.N., Groueix, T., Salzmann, M., Lepetit, V.: Gigapose: Fast and ro- bust novel object pose estimation via one correspondence. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9903– 9913 (2024)
2024
-
[41]
arXiv preprint arXiv:2304.07193 (2023)
Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al.: Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)
Pith/arXiv arXiv 2023
-
[42]
In: European Conference on Computer Vision
Örnek, E.P., Labbé, Y., Tekin, B., Ma, L., Keskin, C., Forster, C., Hodan, T.: Foundpose: Unseen object pose estimation with foundation features. In: European Conference on Computer Vision. pp. 163–182. Springer (2024)
2024
-
[43]
In: Proceed- ings of the IEEE/CVF conference on computer vision and pattern recognition
Park, K., Mousavian, A., Xiang, Y., Fox, D.: Latentfusion: End-to-end differen- tiable reconstruction and rendering for unseen object pose estimation. In: Proceed- ings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10710–10719 (2020)
2020
-
[44]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Qin, Z., Yu, H., Wang, C., Guo, Y., Peng, Y., Xu, K.: Geometric transformer for fast and robust point cloud registration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 11143–11152 (2022)
2022
-
[45]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: Superglue: Learning feature matching with graph neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 4938–4947 (2020)
2020
-
[46]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Shugurov, I., Li, F., Busam, B., Ilic, S.: Osop: A multi-stage one shot object pose estimation framework. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6835–6844 (2022) 18 H. Xu et al
2022
-
[47]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Sun, J., Shen, Z., Wang, Y., Bao, H., Zhou, X.: Loftr: Detector-free local fea- ture matching with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 8922–8931 (2021)
2021
-
[48]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Sun, J., Wang, Z., Zhang, S., He, X., Zhao, H., Zhang, G., Zhou, X.: Onepose: One- shot object pose estimation without cad models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6825–6834 (2022)
2022
-
[49]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Sundermeyer, M., Hodaň, T., Labbe, Y., Wang, G., Brachmann, E., Drost, B., Rother, C., Matas, J.: Bop challenge 2022 on detection, segmentation and pose estimation of specific rigid objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2785–2794 (2023)
2022
-
[50]
arXiv preprint arXiv:2310.00463 (2023)
Tremblay, J., Wen, B., Blukis, V., Sundaralingam, B., Tyree, S., Birchfield, S.: Diff- dope: Differentiable deep object pose estimation. arXiv preprint arXiv:2310.00463 (2023)
arXiv 2023
-
[51]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Wang, G., Manhardt, F., Tombari, F., Ji, X.: Gdr-net: Geometry-guided direct regression network for monocular 6d object pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 16611– 16621 (2021)
2021
-
[52]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recog- nition
Wang, H., Sridhar, S., Huang, J., Valentin, J., Song, S., Guibas, L.J.: Normalized object coordinate space for category-level 6d object pose and size estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recog- nition. pp. 2642–2651 (2019)
2019
-
[53]
Advances in Neural Information Processing Systems38, 25400–25437 (2026)
Wang, J., Zhu, H., Guo, H., Al Mamun, A., Xiang, C., Lee, T.H.: Singref6d: Monoc- ular novel object pose estimation with a single rgb reference. Advances in Neural Information Processing Systems38, 25400–25437 (2026)
2026
-
[54]
In: Proceedings of the Computer Vision and Pattern Recognition Conference
Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: Vggt: Visual geometry grounded transformer. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 5294–5306 (2025)
2025
-
[55]
In: Proceedings of the IEEE/CVF con- ference on computer vision and pattern recognition
Wang, J., Karaev, N., Rupprecht, C., Novotny, D.: Vggsfm: Visual geometry grounded deep structure from motion. In: Proceedings of the IEEE/CVF con- ference on computer vision and pattern recognition. pp. 21686–21697 (2024)
2024
-
[56]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision
Wang, J., Rupprecht, C., Novotny, D.: Posediffusion: Solving pose estimation via diffusion-aided bundle adjustment. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9773–9783 (2023)
2023
-
[57]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Wang, S., Leroy, V., Cabon, Y., Chidlovskii, B., Revaud, J.: Dust3r: Geometric 3d vision made easy. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 20697–20709 (2024)
2024
-
[58]
In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Wen, B., Bekris, K.: Bundletrack: 6d pose tracking for novel objects without in- stance or category-level 3d models. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 8067–8074. IEEE (2021)
2021
-
[59]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Wen, B., Tremblay, J., Blukis, V., Tyree, S., Müller, T., Evans, A., Fox, D., Kautz, J., Birchfield, S.: Bundlesdf: Neural 6-dof tracking and 3d reconstruction of un- known objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 606–617 (2023)
2023
-
[60]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Wen, B., Yang, W., Kautz, J., Birchfield, S.: Foundationpose: Unified 6d pose esti- mation and tracking of novel objects. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 17868–17879 (2024)
2024
-
[61]
arXiv preprint arXiv:1711.00199 (2017) Pose Anything Anywhere 19
Xiang, Y., Schmidt, T., Narayanan, V., Fox, D.: Posecnn: A convolutional neu- ral network for 6d object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199 (2017) Pose Anything Anywhere 19
Pith/arXiv arXiv 2017
-
[62]
In: Proceedings of the Computer Vision and Pattern Recognition Conference
Yang, J., Sax, A., Liang, K.J., Henaff, M., Tang, H., Cao, A., Chai, J., Meier, F., Feiszli, M.: Fast3r: Towards 3d reconstruction of 1000+ images in one forward pass. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 21924–21935 (2025)
2025
-
[63]
In: International Conference on Learning Representations
Zhang, J., Lin, A., Kumar, M., Yang, T.H., Ramanan, D., Tulsiani, S.: Cameras as rays: Pose estimation via ray diffusion. In: International Conference on Learning Representations. vol. 2024, pp. 23345–23366 (2024)
2024
-
[64]
In: European Conference on Computer Vision
Zhang, J.Y., Ramanan, D., Tulsiani, S.: Relpose: Predicting probabilistic relative rotation for single objects in the wild. In: European Conference on Computer Vision. pp. 592–611. Springer (2022)
2022
-
[65]
In: European Conference on Computer Vision
Zhang, J., Huang, W., Peng, B., Wu, M., Hu, F., Chen, Z., Zhao, B., Dong, H.: Omni6dpose: A benchmark and model for universal 6d object pose estimation and tracking. In: European Conference on Computer Vision. pp. 199–216. Springer (2024)
2024
-
[66]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Zhao, C., Zhang, T., Dang, Z., Salzmann, M.: Dvmnet: Computing relative pose for unseen objects beyond hypotheses. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20485–20495 (2024)
2024
-
[67]
In: International Conference on Learning Representations
Zhao, C., Zhang, T., Salzmann, M.: 3d-aware hypothesis & verification for gener- alizable relative object pose estimation. In: International Conference on Learning Representations. vol. 2024, pp. 45563–45578 (2024)
2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.