MAPRPose: Mask-Aware Proposal and Amodal Refinement for Multi-Object 6D Pose Estimation
Pith reviewed 2026-05-09 23:56 UTC · model grok-4.3
The pith
MAPRPose improves multi-object 6D pose estimation accuracy and speed by using mask-aware proposals and amodal refinement.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that lifting mask-aware 2D correspondences to 3D space generates reliable pose proposals, and that integrating amodal mask prediction with ROI re-alignment in a tensorized refinement pipeline corrects errors from occlusion and noise, yielding a state-of-the-art 76.5% average recall on the BOP benchmark along with a 43-fold speedup for multi-object cases.
What carries the argument
The Mask-Aware Pose Proposal (MAPP) that scores and lifts 2D-3D correspondences plus the Amodal Mask Prediction and ROI Re-Alignment (AMPR) module that enables batch refinement via render-and-compare.
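The review does not reproduce MAPP's internals, but the general recipe it names — lifting 2D correspondences into 3D with depth and intrinsics, then solving a rigid pose from the resulting 3D-3D matches — can be sketched with standard tools. The back-projection and the Kabsch/Umeyama least-squares alignment below are textbook geometry, not the authors' exact formulation; all names are illustrative.

```python
import numpy as np

def backproject(uv, depth, K):
    """Lift 2D pixel coordinates (N, 2) with per-pixel depth (N,) to 3D camera coordinates (N, 3)."""
    x = (uv[:, 0] - K[0, 2]) * depth / K[0, 0]
    y = (uv[:, 1] - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=1)

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) with Q ≈ P @ R.T + t, via SVD of the cross-covariance."""
    Pc, Qc = P - P.mean(0), Q - Q.mean(0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = Q.mean(0) - R @ P.mean(0)
    return R, t
```

A proposal stage in this style would run such a solver on each scored correspondence subset and keep the top-K hypotheses by residual; the paper's actual scoring rule is not described here.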
If this is right
- The method achieves higher average recall than previous approaches like FoundationPose on standard benchmarks.
- It delivers substantially faster inference when estimating poses for many objects simultaneously.
- The use of amodal masks allows correction of localization errors that occur under heavy occlusion.
- GPU tensorization permits processing all object and hypothesis combinations in a single pass.
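The single-pass idea in the last bullet is ordinary batched tensor algebra: stack all N objects and B hypotheses along leading axes and score them in one vectorized operation. The toy scorer below uses point-to-point error in place of the paper's render-and-compare, and every array name is illustrative.

```python
import numpy as np

def score_all_hypotheses(poses, models, observed):
    """Score N x B pose hypotheses in one batched pass.

    poses:    (N, B, 3, 4) rigid transforms [R | t], one row of B hypotheses per object
    models:   (N, M, 3) model points per object
    observed: (N, M, 3) observed points per object
    Returns (best hypothesis index per object, full (N, B) error table).
    """
    R = poses[..., :3]                                                # (N, B, 3, 3)
    t = poses[..., 3]                                                 # (N, B, 3)
    # One einsum transforms every model point under every hypothesis at once.
    pts = np.einsum('nbij,nmj->nbmi', R, models) + t[:, :, None, :]   # (N, B, M, 3)
    err = np.linalg.norm(pts - observed[:, None], axis=-1).mean(-1)   # (N, B)
    return err.argmin(axis=1), err
```

The paper's pipeline renders and compares instead of matching points, but the batching structure — flatten object and hypothesis axes, evaluate once, reduce per object — is the same.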
Where Pith is reading between the lines
- One could test whether the same mask-lifting idea applies to other correspondence-based tasks like optical flow.
- The speedup suggests the framework could support real-time multi-object tracking in video streams.
- An extension might involve replacing the mask predictor with a more advanced segmentation model to further boost performance in noisy conditions.
Load-bearing premise
The approach relies on the assumption that mask predictions remain accurate enough under severe occlusion and sensor noise to produce useful 2D-to-3D correspondences and effective amodal refinements.
What would settle it
If experiments on the BOP benchmark with increased occlusion levels show the average recall falling below 70%, that would indicate the mask-aware and amodal components do not provide the claimed robustness.
Original abstract
6D object pose estimation in cluttered scenes remains challenging due to severe occlusion and sensor noise. We propose MAPRPose, a two-stage framework that leverages mask-aware correspondences for pose proposal and amodal-driven Region-of-Interest (ROI) prediction for robust refinement. In the Mask-Aware Pose Proposal (MAPP) stage, we lift 2D correspondences into 3D space to establish reliable keypoint matches and generate geometrically consistent pose hypotheses based on correspondence-level scoring, from which the top-$K$ candidates are selected. In the refinement stage, we introduce a tensorized render-and-compare pipeline integrated with an Amodal Mask Prediction and ROI Re-Alignment (AMPR) module. By reconstructing complete object geometry and dynamically adjusting the ROI, AMPR mitigates localization errors and spatial misalignment under heavy occlusion. Furthermore, our GPU-accelerated RGB-XYZ reprojection enables simultaneous refinement of all $N \times B$ pose hypotheses in a single forward pass. Evaluated on the BOP benchmark, MAPRPose achieves a state-of-the-art Average Recall (AR) of 76.5%, outperforming FoundationPose by 3.1% AR while delivering a 43x speedup in multi-object inference.
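The abstract's "RGB-XYZ reprojection" presumably pairs RGB with a per-pixel map of 3D coordinates. The depth-to-XYZ half of that is standard pinhole geometry rather than anything specific to MAPRPose; a minimal dense sketch:

```python
import numpy as np

def depth_to_xyz(depth, K):
    """Back-project an (H, W) depth map into an (H, W, 3) XYZ coordinate image
    using pinhole intrinsics K. Standard geometry, not the paper's GPU kernel."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]                      # pixel row (v) and column (u) grids
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1)
```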
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MAPRPose, a two-stage framework for multi-object 6D pose estimation in cluttered scenes. The Mask-Aware Pose Proposal (MAPP) stage lifts 2D mask-aware correspondences to 3D to generate geometrically consistent pose hypotheses and selects top-K candidates via correspondence-level scoring. The refinement stage integrates a tensorized render-and-compare pipeline with an Amodal Mask Prediction and ROI Re-Alignment (AMPR) module to reconstruct complete geometry, dynamically adjust ROIs, and mitigate occlusion-induced misalignment. A GPU-accelerated RGB-XYZ reprojection enables simultaneous refinement of all hypotheses. On the BOP benchmark, MAPRPose reports 76.5% Average Recall (AR), outperforming FoundationPose by 3.1% AR with a 43x speedup in multi-object inference.
Significance. If the results hold under rigorous verification, the work would be significant for practical 6D pose estimation by combining improved accuracy with substantial inference speedup in multi-object settings. The tensorized render-and-compare and amodal ROI re-alignment address occlusion and noise in a computationally efficient manner, which is a strength for real-world applications. The use of a public benchmark (BOP) allows direct comparison, though the absence of ablations and stratified analysis limits attribution of gains to the proposed components.
major comments (2)
- [Abstract and Evaluation] Abstract and Evaluation section: The headline 76.5% AR and +3.1% gain over FoundationPose are presented without ablation studies isolating MAPP (mask-aware correspondences and top-K selection) or AMPR (amodal mask prediction and dynamic ROI re-alignment), without error bars, and without occlusion-stratified results on BOP subsets. This makes it impossible to confirm that the reported performance is attributable to the claimed innovations rather than implementation details or baseline differences, directly undermining the central empirical claim.
- [Method] Method section (MAPP and AMPR descriptions): No details are provided on how the top-K value, correspondence scoring thresholds, or AMPR parameters (e.g., amodal mask prediction network, ROI re-alignment criteria) were selected or tuned. The abstract states these enable reliable hypothesis generation and error correction under severe occlusion, but without sensitivity analysis or justification, the robustness of the pipeline cannot be assessed.
minor comments (1)
- [Abstract] The abstract uses LaTeX notation (top-$K$, $N \times B$) that should be rendered consistently in the main text for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important aspects for strengthening the empirical validation and methodological transparency of our work. We address each major comment below and have revised the manuscript accordingly to incorporate additional experiments, analyses, and details.
Point-by-point responses
-
Referee: [Abstract and Evaluation] Abstract and Evaluation section: The headline 76.5% AR and +3.1% gain over FoundationPose are presented without ablation studies isolating MAPP (mask-aware correspondences and top-K selection) or AMPR (amodal mask prediction and dynamic ROI re-alignment), without error bars, and without occlusion-stratified results on BOP subsets. This makes it impossible to confirm that the reported performance is attributable to the claimed innovations rather than implementation details or baseline differences, directly undermining the central empirical claim.
Authors: We agree that the original manuscript would benefit from explicit ablations, error bars, and stratified results to better attribute gains to MAPP and AMPR. In the revised version, we have added Section 4.3 with a full ablation study incrementally enabling MAPP (including mask-aware correspondences and top-K selection) and AMPR (amodal mask prediction and ROI re-alignment) on the BOP benchmark, showing their individual and combined contributions to the 76.5% AR. We also report error bars as standard deviations over five independent runs. Additionally, we include occlusion-stratified AR results on BOP subsets grouped by occlusion ratio, confirming larger gains under heavy occlusion. These changes use the same public benchmark protocol as the FoundationPose comparison and directly support that the +3.1% improvement arises from the proposed components rather than baseline differences. revision: yes
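The promised occlusion-stratified reporting amounts to bucketing per-instance results by visibility before averaging. The helper below illustrates the bookkeeping only; the bins, threshold, and error metric are placeholders, not the BOP protocol's exact AR definition.

```python
import numpy as np

def stratified_recall(errors, visib, thresh, bins):
    """Fraction of instances with pose error below `thresh`, per visibility bin.

    errors: (N,) per-instance pose errors; visib: (N,) visible-fraction in [0, 1];
    bins: iterable of (lo, hi) visibility intervals.
    """
    out = {}
    for lo, hi in bins:
        mask = (visib >= lo) & (visib < hi)
        out[(lo, hi)] = float((errors[mask] < thresh).mean()) if mask.any() else float("nan")
    return out
```

Repeating this over several independent runs and reporting mean and standard deviation per bin would give exactly the error bars and stratification the referee asks for.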
-
Referee: [Method] Method section (MAPP and AMPR descriptions): No details are provided on how the top-K value, correspondence scoring thresholds, or AMPR parameters (e.g., amodal mask prediction network, ROI re-alignment criteria) were selected or tuned. The abstract states these enable reliable hypothesis generation and error correction under severe occlusion, but without sensitivity analysis or justification, the robustness of the pipeline cannot be assessed.
Authors: We acknowledge that the original submission omitted explicit details on hyperparameter selection and sensitivity. We have revised the Method section by adding Subsection 3.4, which describes the tuning procedure: a grid search over top-K (values 5-50), correspondence scoring thresholds, and AMPR parameters including the amodal mask prediction network and ROI re-alignment criteria. We include sensitivity analysis results and plots demonstrating stable performance across reasonable ranges, with our selected values yielding robust AR under varying occlusion levels. This addition provides the necessary justification and allows assessment of pipeline reliability without altering the core claims. revision: yes
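The tuning procedure described is an ordinary grid search over a validation metric. A minimal sketch, with the evaluation function and the value grids entirely hypothetical:

```python
import itertools

def grid_search(evaluate, topk_values, thresholds):
    """Return the (top-K, threshold) pair maximizing the validation metric `evaluate`."""
    return max(itertools.product(topk_values, thresholds),
               key=lambda cfg: evaluate(*cfg))
```

The sensitivity analysis the authors promise is then just the full error table this loop already visits, plotted per parameter.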
Circularity Check
No circularity: empirical pipeline evaluated on external benchmark
Full rationale
The paper presents a two-stage algorithmic framework (MAPP for mask-aware pose proposals via 2D-to-3D lifting and correspondence scoring, followed by AMPR for amodal mask prediction and tensorized render-and-compare refinement) whose performance is measured by Average Recall on the independent BOP benchmark. No equations, derivations, or parameter-fitting steps are described that reduce any claimed prediction or result to the inputs by construction. The claims of 76.5% AR and the speedup rest on benchmark evaluation rather than self-referential mathematics or self-citation chains; the chain of support runs out to external data rather than back into the paper's own constructions.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] G. Zhou, Y. Yan, D. Wang, and Q. Chen, “A novel depth and color feature fusion framework for 6D object pose estimation,” IEEE Transactions on Multimedia, vol. 23, pp. 1630–1639, 2021.
- [2] S. Hoque, M. Y. Arafat, S. Xu, A. Maiti, and Y. Wei, “A comprehensive review on 3D object detection and 6D pose estimation with deep learning,” IEEE Access, vol. 9, pp. 143746–143770, 2021.
- [3] G. Zhou, D. Wang, Y. Yan, H. Chen, and Q. Chen, “Semi-supervised 6D object pose estimation without using real annotations,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 8, pp. 5163–5174, 2022.
- [4] W.-L. Huang, C.-Y. Hung, and I.-C. Lin, “Confidence-based 6D object pose estimation,” IEEE Transactions on Multimedia, vol. 24, pp. 3025–3035, 2022.
- [5] J. Liu, W. Sun, C. Liu, X. Zhang, S. Fan, and W. Wu, “HFF6D: Hierarchical feature fusion network for robust 6D object pose tracking,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 11, pp. 7719–7731, 2022.
- [6] C. Yuanwei, M. Hairi Mohd Zaman, and M. Faisal Ibrahim, “A review on six degrees of freedom (6D) pose estimation for robotic applications,” IEEE Access, vol. 12, pp. 161002–161017, 2024.
- [7] Y. Zhan, X. Wang, L. Nie, Y. Zhao, T. Yang, and Q. Ruan, “TG-Pose: Delving into topology and geometry for category-level object pose estimation,” IEEE Transactions on Multimedia, vol. 26, pp. 9749–9762, 2024.
- [8] Y. Tu, Y. Wang, H. Zhang, W. Chen, and J. Zhang, “Language-embedded 6D pose estimation for tool manipulation,” IEEE Robotics and Automation Letters, vol. 10, no. 9, pp. 8618–8625, 2025.
- [9] T. Lee, B. Wen, M. Kang, G. Kang, I. Kweon, and K.-J. Yoon, “Any6D: Model-free 6D pose estimation of novel objects,” 2025, pp. 11633–11643.
- [10] J. Liu, W. Sun, H. Yang, Z. Zeng, C. Liu, J. Zheng, X. Liu, H. Rahmani, N. Sebe, and A. Mian, “Deep learning-based object pose estimation: A comprehensive survey,” International Journal of Computer Vision, pp. 1–45, 2026. arXiv:2405.07801.
- [11] W. Xia, H. Zheng, W. Xu, and X. Xu, “Large vision-language models enabled novel objects 6D pose estimation for human-robot collaboration,” 2024.
- [12] S. Liu, Z. Li, W. Wang, H. Sun, H. Zhang, H. Chen, Y. Qin, A. Ajoudani, and Y. Wang, “ActivePose: Active 6D object pose estimation and tracking for robotic manipulation,” 2025.
- [13] Y. Cheng, H. Zhu, Y. Sun, C. Acar, W. Jing, Y. Wu, L. Li, C. Tan, and J.-H. Lim, “6D pose estimation with correlation fusion,” in 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 2988–2994.
- [14] S. Tsvetanov, T. Boyadzhiev, and D. Chikurtev, “Real-time AI-driven 6D pose estimation for robotic picking in cluttered bins via two-stage segmentation and CAD-guided alignment,” in 2025 International Conference on Cybersecurity and AI-Based Systems (Cyber-AI), 2025, pp. 285–290.
- [15] J. Yang, W. Xue, S. Ghavidel, and S. Waslander, “Active 6D pose estimation for textureless objects using multi-view RGB frames,” 2025.
- [16] H. Zhang, J. Tong, L. Wei, H. Zhang, and J. Chen, “Enhanced RGB-D feature extraction for 6D pose estimation,” Scientific Reports, vol. 16, 2026.
- [17] P. Liu, F. Wang, Y. Liu, and J. Cheng, “GCM-Pose: Generalizable 6D object pose estimation based on cross-modal feature matching,” IEEE Transactions on Instrumentation and Measurement, vol. 75, pp. 1–13, 2026.
- [18] T. Liang, Y. Zeng, J. Xie, and B. Zhou, “DynamicPose: Real-time and robust 6D object pose tracking for fast-moving cameras and objects,” in 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025, pp. 2424–2431.
- [19] Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox, “PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes,” in Proceedings of Robotics: Science and Systems (RSS), June 2018. arXiv:1711.00199.
- [20] M. Liu, S. Li, A. Chhatkuli, P. Truong, L. V. Gool, and F. Tombari, “One2Any: One-reference 6D pose estimation for any object,” in 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 6457–6467.
- [21] Y. Di, F. Manhardt, G. Wang, X. Ji, N. Navab, and F. Tombari, “SO-Pose: Exploiting self-occlusion for direct 6D pose estimation,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 12376–12385.
- [22] G. Wang, F. Manhardt, X. Liu, X. Ji, and F. Tombari, “Occlusion-aware self-supervised monocular 6D object pose estimation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 3, pp. 1788–1803, 2024.
- [23] Y. Labbé, L. Manuelli, A. Mousavian, S. Tyree, S. Birchfield, J. Tremblay, J. Carpentier, M. Aubry, D. Fox, and J. Sivic, “MegaPose: 6D pose estimation of novel objects via render & compare,” in Proceedings of the 6th Conference on Robot Learning (CoRL), PMLR vol. 205, 2022, pp. 715–725. arXiv:2212.06870.
- [24] B. Wen, W. Yang, J. Kautz, and S. Birchfield, “FoundationPose: Unified 6D pose estimation and tracking of novel objects,” in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 17868–17879.
- [25] V. N. Nguyen, T. Groueix, M. Salzmann, and V. Lepetit, “GigaPose: Fast and robust novel object pose estimation via one correspondence,” in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 9903–9913.
- [26] S. Moon, H. Son, D. Hur, and S. Kim, “Co-op: Correspondence-based novel object pose estimation,” in 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 11622–11632.
- [27] A. Caraffa, D. Boscaini, and F. Poiesi, “Accurate and efficient zero-shot 6D pose estimation with frozen foundation models,” 2025. arXiv:2506.09784.
- [28] C. Wang, D. Xu, Y. Zhu, R. Martín-Martín, C. Lu, L. Fei-Fei, and S. Savarese, “DenseFusion: 6D object pose estimation by iterative dense fusion,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 3338–3347.
- [29] S. Zakharov, I. Shugurov, and S. Ilic, “DPOD: 6D pose object detector and refiner,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 1941–1950.
- [30] R. L. Haugaard and A. G. Buch, “SurfEmb: Dense and continuous correspondence distributions for object pose estimation with learnt surface embeddings,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 6744–6753. arXiv:2111.13489.
- [31] D. Wang, G. Zhou, Y. Yan, H. Chen, and Q. Chen, “GeoPose: Dense reconstruction guided 6D object pose estimation with geometric consistency,” vol. 24, 2022, pp. 4394–4408.
- [32] H. Zhao, S. Wei, D. Shi, W. Tan et al., “Learning symmetry-aware geometry correspondences for 6D object pose estimation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2023, pp. 14045–14054. https://github.com/hikvision-research/GCPose
- [33] E. P. Örnek, Y. Labbé, B. Tekin, L. Ma et al., “FoundPose: Unseen object pose estimation with foundation features,” in European Conference on Computer Vision (ECCV), 2024. arXiv:2311.18809.
- [34] W. Deng, D. Campbell, C. Sun, J. Zhang, S. Kanitkar, M. E. Shaffer, and S. Gould, “Pos3R: 6D pose estimation for unseen objects made easy,” in 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 16818–16828.
- [35] H. Wang, S. Sridhar, J. Huang, J. Valentin et al., “Normalized object coordinate space for category-level 6D object pose and size estimation,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2637–2646.
- [36] V. Lepetit, F. Moreno-Noguer, and P. Fua, “EPnP: An accurate O(n) solution to the PnP problem,” International Journal of Computer Vision, vol. 81, 2009.
- [37] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
- [38] J. Lin, L. Liu, D. Lu, and K. Jia, “SAM-6D: Segment Anything Model meets zero-shot 6D object pose estimation,” in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 27906–27916.
- [39] A. Caraffa, D. Boscaini, A. Hamza, and F. Poiesi, “FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models,” in European Conference on Computer Vision (ECCV), 2024, pp. 414–431. arXiv:2312.00947.
- [40] J. Huang, H. Yu, K.-T. Yu, N. Navab, S. Ilic, and B. Busam, “MatchU: Matching unseen objects for 6D pose estimation from RGB-D images,” in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 10095–10105.
- [41] T. Hodaň, D. Baráth, and J. Matas, “EPOS: Estimating 6D pose of objects with symmetries,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 11700–11709.
- [42] Y. Labbé, J. Carpentier, M. Aubry, and J. Sivic, “CosyPose: Consistent multi-view multi-object 6D pose estimation,” in European Conference on Computer Vision (ECCV), 2020. arXiv:2008.08465.
- [43] S. Laine, J. Hellsten, T. Karras, Y. Seol et al., “Modular primitives for high-performance differentiable rendering,” ACM Transactions on Graphics (SIGGRAPH Asia), vol. 39, no. 6, 2020, pp. 1–14. arXiv:2011.03277.
- [44] S. Peng, Y. Liu, Q. Huang, X. Zhou, and H. Bao, “PVNet: Pixel-wise voting network for 6DoF pose estimation,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4556–4565.
- [45] T. Tan and Q. Dong, “ONDA-Pose: Occlusion-aware neural domain adaptation for self-supervised 6D object pose estimation,” in 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 16829–16838.
- [46] G. Wang, F. Manhardt, F. Tombari, and X. Ji, “GDR-Net: Geometry-guided direct regression network for monocular 6D object pose estimation,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 16606–16616.
- [47] J. Wang, L. Luo, W. Liang, and Z.-X. Yang, “OA-Pose: Occlusion-aware monocular 6-DoF object pose estimation under geometry alignment for robot manipulation,” Pattern Recognition, vol. 154, p. 110576, 2024.
- [48] Y. Xie, H. Jiang, and J. Xie, “Mask6D: Masked pose priors for 6D object pose estimation,” in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 3545–3549.
- [49] J. Liu, Z. Lu, L. Chen, J. Yang, and C. Yang, “Occlusion-aware 6D pose estimation with depth-guided graph encoding and cross-semantic fusion for robotic grasping,” in 2025 IEEE International Conference on Robotics and Automation (ICRA), 2025, pp. 5011–5017.
- [50] M.-F. Li, X. Yang, F.-E. Wang, H. Basak, Y. Sun, S. Gayaka, M. Sun, and C.-H. Kuo, “UA-Pose: Uncertainty-aware 6D object pose estimation and online object completion with partial references,” in 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 1180–1189.
- [51] T. Wu, C. Zheng, F. Guan, A. Vedaldi et al., “Amodal3R: Amodal 3D reconstruction from occluded 2D images,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 9181–9193.
- [52] J. Corsetti, D. Boscaini, C. Oh, A. Cavallaro, and F. Poiesi, “Open-vocabulary object 6D pose estimation,” in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 18071–18080.
- [53] M. Oquab, T. Darcet, T. Moutakanni, H. V. Vo et al., “DINOv2: Learning robust visual features without supervision,” Transactions on Machine Learning Research (TMLR), 2024. arXiv:2304.07193.
- [54] N. Baek and H. Lee, “OpenGL SC implementation over an OpenGL ES 1.1 graphics board,” in 2012 IEEE International Conference on Multimedia and Expo Workshops, 2012, pp. 671–671.
- [55] X. Ma, V. Hegde, and L. Yolyan, 2022.
- [56] Y. Zhang, L. Li, W. Wang, R. Xie et al., “Boosting video object segmentation via space-time correspondence learning,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. arXiv:2304.06211.
- [57] L. A. de Oliveira Junior, H. R. Medeiros, D. Macêdo, C. Zanchettin, A. L. I. Oliveira, and T. Ludermir, “SegNetRes-CRF: A deep convolutional encoder-decoder architecture for semantic image segmentation,” in 2018 International Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–6.
- [58] K. Park, A. Mousavian, Y. Xiang, and D. Fox, “LatentFusion: End-to-end differentiable reconstruction and rendering for unseen object pose estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 10710–10719. arXiv:1912.00416.
- [59] Y. Liu, Y. Wen, S. Peng, C. Lin et al., “Gen6D: Generalizable model-free 6-DoF object pose estimation from RGB images,” in Proceedings of the European Conference on Computer Vision (ECCV). Springer, 2022, pp. 298–315.
- [60] J. Sun, Z. Wang, S. Zhang, X. He, H. Zhao, G. Zhang, and X. Zhou, “OnePose: One-shot object pose estimation without CAD models,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 6815–6824.
- [61] Y. He, Y. Wang, H. Fan, J. Sun, and Q. Chen, “FS6D: Few-shot 6D pose estimation of novel objects,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 6804–6814.
- [62] X. He, J. Sun, Y. Wang, D. Huang et al., “OnePose++: Keypoint-free one-shot object pose estimation without CAD models,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 35, 2022, pp. 35103–35115.
- [63] D. Cai, J. Heikkilä, and E. Rahtu, “GS-Pose: Generalizable segmentation-based 6D object pose estimation with 3D Gaussian splatting,” in 2025 International Conference on 3D Vision (3DV), 2025, pp. 1001–1011.
- [64] J. Chen, M. Sun, Y. Zheng, T. Bao, Z. He, D. Li, G. Jin, Z. Rui, L. Wu, and X. Jiang, IEEE Transactions on Multimedia, vol. 27, pp. 5770–5783, 2025.
- [65] S. Hinterstoisser, S. Holzer, V. Lepetit, S. Ilic et al., “Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes,” in Computer Vision – ACCV 2012, LNCS vol. 7724. Springer, 2013, pp. 548–562.
- [66] T. Hodaň, F. Michel, E. Brachmann, W. Kehl et al., “BOP: Benchmark for 6D object pose estimation,” in European Conference on Computer Vision (ECCV), 2018, pp. 19–34. arXiv:1808.08319.
- [67] Y. Li, G. Wang, X. Ji, Y. Xiang, and D. Fox, “DeepIM: Deep iterative matching for 6D pose estimation,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 695–711. arXiv:1804.00175.
- [68] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988.
- [69] S. Liu, W. Chen, T. Li, and H. Li, “Soft rasterizer: A differentiable renderer for image-based 3D reasoning,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 7707–7716.