Recognition: unknown
Disentangled Point Diffusion for Precise Object Placement
Pith reviewed 2026-05-10 15:55 UTC · model grok-4.3
The pith
A disentangled point diffusion framework separates global scene-level priors from local diffusion of object geometry and placement frame, yielding more precise robotic placement.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TAX-DPD models global scene-level placements through a novel feed-forward Dense Gaussian Mixture Model that yields a spatially dense prior over global placements, then models the local object-level configuration through a novel disentangled point cloud diffusion module that separately diffuses the object geometry and the placement frame. This enables precise local geometric reasoning and achieves substantially higher accuracy than prior SE(3)-diffusion approaches even for rigid objects, while extending to non-rigid objects as shown in cloth tasks.
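To make the global stage concrete, here is a minimal sketch of one plausible reading of a feed-forward "spatially dense" GMM prior: one mixture component anchored at each scene point, from which a global placement hypothesis is sampled. All names (`DenseGMMHead`, feature dimensions) and the per-point parameterization are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DenseGMMHead(nn.Module):
    """Hypothetical sketch: a per-scene-point Gaussian mixture over global placements.

    Each scene point contributes one mixture component: a logit (mixture weight),
    a mean offset from the point, and an isotropic variance.
    """
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.logit = nn.Linear(feat_dim, 1)        # unnormalized mixture weight
        self.mu_offset = nn.Linear(feat_dim, 3)    # component mean = point + offset
        self.log_sigma = nn.Linear(feat_dim, 1)    # isotropic std, in log-space

    def forward(self, scene_xyz, scene_feat):
        # scene_xyz: (N, 3) points; scene_feat: (N, F) per-point features
        w = torch.softmax(self.logit(scene_feat).squeeze(-1), dim=0)   # (N,)
        mu = scene_xyz + self.mu_offset(scene_feat)                    # (N, 3)
        sigma = self.log_sigma(scene_feat).exp().squeeze(-1)           # (N,)
        return w, mu, sigma

def sample_global_placement(w, mu, sigma):
    """Draw one global placement hypothesis from the dense mixture."""
    k = torch.multinomial(w, num_samples=1).item()   # pick a component
    return mu[k] + sigma[k] * torch.randn(3)         # sample from its Gaussian
```

A density of this form naturally covers multi-modal placements: each plausible scene region keeps its own mixture mass rather than being averaged into a single mode.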
What carries the argument
The disentangled point cloud diffusion module that separately diffuses the object geometry and the placement frame, supported by a feed-forward Dense GMM for global scene-level priors.
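As a reading aid, a hedged sketch of what "separately diffuses" could mean in a DDPM-style reverse process [39]: geometry points and the placement frame share a noise schedule but are denoised as independent variables. The model interface and the 9-D frame parameterization (translation plus 6D rotation [49]) are assumptions for illustration, not the paper's exact design.

```python
import torch

def disentangled_denoise_step(model, geom_t, frame_t, t, betas):
    """One reverse DDPM step applied separately to the object geometry
    point cloud and the placement frame.

    Assumed interface: `model` predicts two noise estimates, eps_geom (N, 3)
    for the points and eps_frame (9,) for a translation + 6D-rotation frame
    vector. This mirrors the disentanglement idea only.
    """
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    eps_geom, eps_frame = model(geom_t, frame_t, t)

    def step(x_t, eps):
        # Standard DDPM posterior mean, plus noise except at the final step.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (x_t - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        return mean + torch.sqrt(betas[t]) * noise

    # Geometry and frame share the schedule but are denoised independently.
    return step(geom_t, eps_geom), step(frame_t, eps_frame)
```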
If this is right
- Achieves state-of-the-art performance in placement precision on high-precision industrial insertion tasks.
- Delivers improved multi-modal coverage of possible placement options.
- Generalizes to variations in object geometries and scene configurations.
- Extends applicability to non-rigid objects as demonstrated on simulated cloth-hanging tasks.
Where Pith is reading between the lines
- The separation of geometry and frame diffusion could be tested in other point-cloud-based robotic planning problems beyond placement.
- The framework might integrate with perception pipelines that already output point clouds to reduce end-to-end training data needs.
- Further experiments could check whether the dense GMM prior remains stable when scene objects move during execution.
- The approach opens a path to hybrid systems that combine the diffusion output with local optimization for even tighter tolerances.
Load-bearing premise
That a feed-forward dense GMM supplies an effective global prior and that separately diffusing object geometry and placement frame in point clouds will produce substantially more precise local geometric reasoning than unified SE(3)-diffusion methods.
What would settle it
A controlled side-by-side test on the same suite of novel object geometries would settle it: if the disentangled method achieves neither a higher insertion success rate nor a lower placement error than an SE(3)-diffusion baseline, the claimed precision gain is falsified.
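A hedged sketch of how such a settling experiment could be scored, assuming paired trials of both methods on the same geometries; the tolerance value and all function names are hypothetical, not from the paper:

```python
import numpy as np

def compare_methods(err_disent, err_se3, tol=1e-3, n_boot=10_000, seed=0):
    """Paired comparison on the same trials. `err_*` are per-trial placement
    errors (meters); `tol` is a hypothetical insertion tolerance. Reports
    success rates and a bootstrap 95% CI on the mean error difference.
    """
    rng = np.random.default_rng(seed)
    err_disent, err_se3 = np.asarray(err_disent), np.asarray(err_se3)
    succ = [(err_disent < tol).mean(), (err_se3 < tol).mean()]

    diff = err_disent - err_se3  # paired differences, negative favors disentangled
    boots = [rng.choice(diff, size=diff.size).mean() for _ in range(n_boot)]
    lo, hi = np.percentile(boots, [2.5, 97.5])
    # The precision gain is supported only if the whole CI lies below zero.
    return {"success_rates": succ, "mean_diff_CI95": (lo, hi),
            "precision_gain_supported": hi < 0.0}
```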
Original abstract
Recent advances in robotic manipulation have highlighted the effectiveness of learning from demonstration. However, while end-to-end policies excel in expressivity and flexibility, they struggle both in generalizing to novel object geometries and in attaining a high degree of precision. An alternative, object-centric approach frames the task as predicting the placement pose of the target object, providing a modular decomposition of the problem. Building on this goal-prediction paradigm, we propose TAX-DPD, a hierarchical, disentangled point diffusion framework that achieves state-of-the-art performance in placement precision, multi-modal coverage, and generalization to variations in object geometries and scene configurations. We model global scene-level placements through a novel feed-forward Dense Gaussian Mixture Model (GMM) that yields a spatially dense prior over global placements; we then model the local object-level configuration through a novel disentangled point cloud diffusion module that separately diffuses the object geometry and the placement frame, enabling precise local geometric reasoning. Interestingly, we demonstrate that our point cloud diffusion achieves substantially higher accuracy than a prior approach based on SE(3)-diffusion, even in the context of rigid object placement. We validate our approach across a suite of challenging tasks in simulation and in the real world on high-precision industrial insertion tasks. Furthermore, we present results on a cloth-hanging task in simulation, indicating that our framework can further relax assumptions on object rigidity.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes TAX-DPD, a hierarchical disentangled point diffusion framework for precise object placement in robotics. It models global placements with a feed-forward Dense Gaussian Mixture Model and local configurations with a disentangled point cloud diffusion module that separately diffuses object geometry and the placement frame. The authors claim this achieves state-of-the-art results in placement precision, multi-modal coverage, and generalization to novel object geometries and scene configurations, with validation on simulation and real-world high-precision insertion tasks as well as a cloth-hanging task.
Significance. If the empirical claims hold after proper validation, the work would advance object-centric robotic manipulation by providing a modular diffusion approach that improves precision and generalization over end-to-end policies, with particular relevance for high-precision insertions and non-rigid objects.
major comments (2)
- [Abstract] The claim that 'our point cloud diffusion achieves substantially higher accuracy than a prior approach based on SE(3)-diffusion, even in the context of rigid object placement' is central to the novelty but lacks any reported ablation isolating the disentanglement of geometry and placement-frame diffusion from confounding factors such as the Dense GMM prior, network capacity, or training schedule.
- [§5, Experimental Results] The abstract asserts SOTA performance across precision, coverage, and generalization but supplies no quantitative metrics, baselines, error bars, or experimental details, so the support for the central empirical claim cannot be assessed from the provided information.
minor comments (1)
- [Abstract] The acronym TAX-DPD is introduced in the abstract without expansion or definition.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We have revised the manuscript to address the concerns about isolating the contribution of disentanglement and providing clearer quantitative experimental support. Point-by-point responses follow.
Point-by-point responses
- Referee: [Abstract] The claim that 'our point cloud diffusion achieves substantially higher accuracy than a prior approach based on SE(3)-diffusion, even in the context of rigid object placement' is central to the novelty but lacks any reported ablation isolating the disentanglement of geometry and placement-frame diffusion from confounding factors such as the Dense GMM prior, network capacity, or training schedule.
  Authors: We agree that the abstract claim requires stronger isolation of the disentanglement effect. In the revised manuscript, we have added an ablation study (new Section 5.4) that compares the full disentangled point cloud diffusion against an SE(3)-diffusion baseline while holding the Dense GMM prior, network capacity, and training schedule fixed. The results show that the accuracy improvement persists and is attributable to the separate diffusion of geometry and placement frame. We have also updated the abstract to reference this ablation.
  revision: yes
- Referee: [§5, Experimental Results] The abstract asserts SOTA performance across precision, coverage, and generalization but supplies no quantitative metrics, baselines, error bars, or experimental details, so the support for the central empirical claim cannot be assessed from the provided information.
  Authors: We apologize for the lack of prominence of the experimental details in the reviewed version. Section 5 of the manuscript does contain quantitative results, including Tables 1–3 with precision, coverage, and generalization metrics, multiple baselines (end-to-end policies and SE(3)-diffusion), and error bars from repeated runs, along with task descriptions in Section 5.1. To make the claims fully assessable, we have expanded Section 5 with additional implementation details for baselines, explicit definitions of all metrics, statistical significance tests, and a consolidated summary table of SOTA comparisons.
  revision: yes
Circularity Check
No circularity; the empirical claims rest on a novel architecture validated against external baselines.
full rationale
The paper introduces TAX-DPD as a hierarchical framework with a feed-forward Dense GMM for global scene placements and a disentangled point-cloud diffusion module (separately diffusing geometry and placement frame) for local reasoning. It reports empirical superiority over prior SE(3)-diffusion methods on precision, multi-modal coverage, and generalization tasks, including rigid-object and cloth-hanging scenarios. No equations, parameters, or results are shown to reduce by construction to the method's own definitions or fitted inputs. The comparison to SE(3)-diffusion is presented as an external benchmark result rather than a self-referential necessity. No load-bearing self-citations or ansatz smuggling appear in the provided text. The argument therefore stands or falls on independent experimental validation rather than on its own definitions.
Axiom & Free-Parameter Ledger
invented entities (2)
- Dense Gaussian Mixture Model (GMM): no independent evidence
- disentangled point cloud diffusion module: no independent evidence
Reference graph
Works this paper leans on
- [1] T. Z. Zhao, V. Kumar, S. Levine, and C. Finn, “Learning fine-grained bimanual manipulation with low-cost hardware,” in ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems.
- [2] T. Z. Zhao, J. Tompson, D. Driess, P. Florence, S. K. S. Ghasemipour, C. Finn, and A. Wahid, “Aloha unleashed: A simple recipe for robot dexterity,” in 8th Annual Conference on Robot Learning.
- [3] J. Wu, W. Chong, R. Holmberg, A. Prasad, Y. Gao, O. Khatib, S. Song, S. Rusinkiewicz, and J. Bohg, “Tidybot++: An open-source holonomic mobile manipulator for robot learning,” in 8th Annual Conference on Robot Learning.
- [4] C. Chi, Z. Xu, C. Pan, E. Cousineau, B. Burchfiel, S. Feng, R. Tedrake, and S. Song, “Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots,” in Robotics: Science and Systems, 2024.
- [5] P. Wu, Y. Shentu, Z. Yi, X. Lin, and P. Abbeel, “Gello: A general, low-cost, and intuitive teleoperation framework for robot manipulators,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 12156–12163.
- [6] X. Cheng, J. Li, S. Yang, G. Yang, and X. Wang, “Open-television: Teleoperation with immersive active visual feedback,” in 8th Annual Conference on Robot Learning.
- [7] C. Chi, S. Feng, Y. Du, Z. Xu, E. Cousineau, B. Burchfiel, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,” in Robotics: Science and Systems, 2023.
- [8] Y. Ze, G. Zhang, K. Zhang, C. Hu, M. Wang, and H. Xu, “3d diffusion policy: Generalizable visuomotor policy learning via simple 3d representations,” in 2nd Workshop on Dexterous Manipulation: Design, Perception and Control (RSS).
- [9] N. M. Shafiullah, Z. Cui, A. A. Altanzaya, and L. Pinto, “Behavior transformers: Cloning k modes with one stone,” Advances in Neural Information Processing Systems, vol. 35, pp. 22955–22968, 2022.
- [10] S. Lee, Y. Wang, H. Etukuru, H. J. Kim, N. M. M. Shafiullah, and L. Pinto, “Behavior generation with latent actions,” in International Conference on Machine Learning. PMLR, 2024, pp. 26991–27008.
- [11] K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, et al., “π0: A vision-language-action flow model for general robot control,” arXiv preprint arXiv:2410.24164, 2024.
- [12] E. Cai, O. Donca, B. Eisner, and D. Held, “Non-rigid relative placement through 3d dense diffusion,” in Conference on Robot Learning (CoRL), 2024.
- [13] C. Pan, B. Okorn, H. Zhang, B. Eisner, and D. Held, “Tax-pose: Task-specific cross-pose estimation for robot manipulation,” in Conference on Robot Learning. PMLR, 2023, pp. 1783–1792.
- [14] H. Huang, K. Schmeckpeper, D. Wang, O. Biza, Y. Qian, H. Liu, M. Jia, R. Platt, and R. Walters, “Imagination policy: Using generative point cloud models for learning manipulation policies,” in Proceedings of the Conference on Robot Learning, 2024.
- [15] A. Simeonov, A. Goyal, L. Manuelli, Y.-C. Lin, A. Sarmiento, A. R. Garcia, P. Agrawal, and D. Fox, “Shelving, stacking, hanging: Relational pose diffusion for multi-modal rearrangement,” in Conference on Robot Learning. PMLR, 2023, pp. 2030–2069.
- [16] Y. Zhao, M. Bogdanovic, C. Luo, S. Tohme, K. Darvish, A. Aspuru-Guzik, F. Shkurti, and A. Garg, “Anyplace: Learning generalized object placement for robot manipulation,” arXiv preprint arXiv:2502.04531, 2025.
- [17] W. Liu, Y. Du, T. Hermans, S. Chernova, and C. Paxton, “Structdiffusion: Language-guided creation of physically-valid structures using unseen objects,” in Robotics: Science and Systems, 2023.
- [18] J. Wang, O. Donca, and D. Held, “Learning distributional demonstration spaces for task-specific cross-pose estimation,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 15054–15060.
- [19] B. Eisner, Y. Yang, T. Davchev, M. Vecerik, J. Scholz, and D. Held, “Deep se(3)-equivariant geometric reasoning for precise placement tasks,” in The Twelfth International Conference on Learning Representations.
- [20] A. Simeonov, Y. Du, A. Tagliasacchi, J. B. Tenenbaum, A. Rodriguez, P. Agrawal, and V. Sitzmann, “Neural descriptor fields: Se(3)-equivariant object representations for manipulation,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 6394–6400.
- [21] P. R. Florence, L. Manuelli, and R. Tedrake, “Dense object nets: Learning dense visual object descriptors by and for robotic manipulation,” in Conference on Robot Learning. PMLR, 2018, pp. 373–385.
- [22] H. Chang, K. Boyalakuntla, Y. Liu, X. Zhang, L. Schramm, and A. Boularias, “Dap: Diffusion-based affordance prediction for multi-modality storage,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 9476–9481.
- [23] A. Brock, T. Lim, J. M. Ritchie, and N. Weston, “Generative and discriminative voxel modeling with convolutional neural networks,” arXiv preprint arXiv:1608.04236, 2016.
- [24] J. Kim, J. Yoo, J. Lee, and S. Hong, “Setvae: Learning hierarchical composition for generative modeling of set-structured data,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 15059–15068.
- [25] P. Achlioptas, O. Diamanti, I. Mitliagkas, and L. Guibas, “Learning representations and generative models for 3d point clouds,” in International Conference on Machine Learning. PMLR, 2018, pp. 40–49.
- [26] D. W. Shu, S. W. Park, and J. Kwon, “3d point cloud generative adversarial network based on tree structured graph convolutions,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 3859–3868.
- [27] X. Yang, Y. Wu, K. Zhang, and C. Jin, “Cpcgan: A controllable 3d point cloud generative adversarial network with semantic label generating,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 4, 2021, pp. 3154–3162.
- [28] S. Luo and W. Hu, “Diffusion probabilistic models for 3d point cloud generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2837–2845.
- [29] V. Zyrianov, X. Zhu, and S. Wang, “Learning to generate realistic lidar point clouds,” in European Conference on Computer Vision. Springer, 2022, pp. 17–35.
- [30] S. Mo, E. Xie, R. Chu, L. Hong, M. Niessner, and Z. Li, “Dit-3d: Exploring plain diffusion transformers for 3d shape generation,” Advances in Neural Information Processing Systems, vol. 36, pp. 67960–67971, 2023.
- [31] S. Mo, E. Xie, Y. Wu, J. Chen, M. Nießner, and Z. Li, “Fast training of diffusion transformer with extreme masking for 3d point clouds generation,” in European Conference on Computer Vision. Springer, 2024, pp. 354–370.
- [32] J. Huang, J. Stoter, R. Peters, and L. Nan, “City3d: Large-scale building reconstruction from airborne lidar point clouds,” Remote Sensing, vol. 14, no. 9, p. 2254, 2022.
- [33] M. Dahnert, A. Dai, N. Müller, and M. Nießner, “Coherent 3d scene diffusion from a single rgb image,” Advances in Neural Information Processing Systems, vol. 37, pp. 23435–23463, 2024.
- [34] A. Nichol, H. Jun, P. Dhariwal, P. Mishkin, and M. Chen, “Point-e: A system for generating 3d point clouds from complex prompts,” arXiv preprint arXiv:2212.08751, 2022.
- [35] A. Sanghi, H. Chu, J. G. Lambourne, Y. Wang, C.-Y. Cheng, M. Fumero, and K. R. Malekshan, “Clip-forge: Towards zero-shot text-to-shape generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 18603–18613.
- [36] J. Lee, W. Im, S. Lee, and S.-E. Yoon, “Diffusion probabilistic models for scene-scale 3d categorical data,” arXiv preprint arXiv:2301.00527, 2023.
- [37] H. Ran, V. Guizilini, and Y. Wang, “Towards realistic scene generation with lidar diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 14738–14748.
- [38] L. Zhou, Y. Du, and J. Wu, “3d shape generation and completion through point-voxel diffusion,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2021, pp. 5826–5835.
- [39] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.
- [40] A. Q. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” in International Conference on Machine Learning. PMLR, 2021, pp. 8162–8171.
- [41] H. Zhen, X. Qiu, P. Chen, J. Yang, X. Yan, Y. Du, Y. Hong, and C. Gan, “3d-vla: A 3d vision-language-action generative world model,” in International Conference on Machine Learning. PMLR, 2024, pp. 61229–61245.
- [42] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “Pointnet++: Deep hierarchical feature learning on point sets in a metric space,” Advances in Neural Information Processing Systems, vol. 30, 2017.
- [43] W. Peebles and S. Xie, “Scalable diffusion models with transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4195–4205.
- [44] E. Coumans and Y. Bai, “Pybullet, a python module for physics simulation for games, robotics, and machine learning,” 2016–2020. [Online]. Available: http://pybullet.org
- [45] “Assembly Performance Metrics and Test Methods,” NIST, May 2018. [Online]. Available: https://www.nist.gov/el/intelligent-systems-division-73500/robotic-grasping-and-manipulation-assembly/assembly
- [46] R. Antonova, P. Shi, H. Yin, Z. Weng, and D. K. Jensfelt, “Dynamic environments with deformable objects,” in Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
- [47] Y. Wang, Z. Wang, M. Nakura, P. Bhowal, C.-L. Kuo, Y.-T. Chen, Z. Erickson, and D. Held, “Articubot: Learning universal articulated object manipulation policy via large scale simulation,” arXiv preprint arXiv:2503.03045, 2025.
- [48] Z. Xian and N. Gkanatsios, “Chaineddiffuser: Unifying trajectory diffusion and keypose prediction for robotic manipulation,” in Conference on Robot Learning. Proceedings of Machine Learning Research, 2023.
- [49] Y. Zhou, C. Barnes, J. Lu, J. Yang, and H. Li, “On the continuity of rotation representations in neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 5745–5753.
- [50] G. Xu, X. Wang, X. Ding, and X. Yang, “Iterative geometry encoding volume for stereo matching,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 21919–21928.
Supplementary material (recovered fragments)
Entries [51]–[59] of the extracted reference list are not citations; they are fragments of the paper's supplementary material that were mis-parsed during extraction. The recoverable content is consolidated below, with source truncations preserved; a minimal preprocessing sketch follows the fragments.
Appendix I, Extension to Non-Rigid Placement Tasks, A. Experimental Setup: “Since our point cloud–based formulation for goal predic- ...”
- Dataset scaling: “As one of the motivating issues of our method, diffusing point clouds in the object placement setting is less feasible due to the scale mismatch between the object and the scene. To address this and ensure consistency across tasks, we standardize the input space by adaptively scaling both the scene and object point clouds based on task-specific statistics.”
- Point cloud downsampling: “For training, both the object and scene point clouds are downsampled using furthest point sampling. Depending on the scene complexity, we vary the number of sampled points to ensure minimal geometric information loss, preserving the structural features necessary for inferring the goal object configuration. We document the number ...”
- Augmentation: “We additionally augment the scene, initial object, and goal configuration point clouds with the same rotation, which is uniformly sampled from [0, 2π] about the z-axis.”
- B. Hyper-parameters: “We provide the hyper-parameters used for training and model configuration in Table V. These include both optimization settings (e.g., batch size, learning rate) ...”
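A minimal sketch of this recovered preprocessing, under stated assumptions: the adaptive scale factor is taken as a precomputed task statistic, and the point budgets are illustrative rather than the paper's documented values.

```python
import numpy as np

def farthest_point_sampling(pts, k, seed=0):
    """Plain O(N*k) furthest point sampling, as named in the fragment."""
    rng = np.random.default_rng(seed)
    idx = [rng.integers(len(pts))]
    d = np.linalg.norm(pts - pts[idx[0]], axis=1)
    for _ in range(k - 1):
        idx.append(int(d.argmax()))                       # farthest remaining point
        d = np.minimum(d, np.linalg.norm(pts - pts[idx[-1]], axis=1))
    return pts[idx]

def preprocess(scene, obj, scale, k_scene=1024, k_obj=256, rng=None):
    """Adaptive scaling by a task-specific statistic `scale` (assumed
    precomputed), FPS downsampling, and a shared uniform z-axis rotation
    augmentation applied to both clouds, per the recovered fragments.
    """
    rng = rng or np.random.default_rng()
    scene, obj = scene / scale, obj / scale
    scene = farthest_point_sampling(scene, k_scene)
    obj = farthest_point_sampling(obj, k_obj)
    theta = rng.uniform(0.0, 2.0 * np.pi)                 # same rotation for both
    c, s = np.cos(theta), np.sin(theta)
    Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return scene @ Rz.T, obj @ Rz.T
```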
Frame estimation (recovered RANSAC procedure): each iteration (i) samples a set of three correspondences $\{(p_i, \hat{p}^*_i)\}$, (ii) estimates a candidate transformation $T$ from these correspondences, and (iii) evaluates inlier support by counting correspondences that satisfy $\|T p_j - \hat{p}^*_j\|_2 < \tau$, where $\tau$ is a distance threshold. After $N$ iterations, the transform with the largest inlier set is selected, and the final SE(3) transform is re-estimated using SVD over all inliers. A sketch of this procedure appears below.
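A minimal sketch of the recovered RANSAC-plus-SVD frame estimation, assuming point-to-point correspondences are already given; the threshold and iteration count are illustrative, not the paper's values.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid fit (SVD/Kabsch) mapping points P onto Q."""
    cP, cQ = P.mean(0), Q.mean(0)
    H = (P - cP).T @ (Q - cQ)                    # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # fix reflection
    R = Vt.T @ D @ U.T
    return R, cQ - R @ cP                        # rotation, translation

def ransac_frame(p, p_star, n_iters=1000, tau=5e-3, seed=0):
    """Sample 3 correspondences, fit a candidate transform, count inliers
    under threshold tau, then refit with SVD over the best inlier set.
    """
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iters):
        sel = rng.choice(len(p), size=3, replace=False)
        R, t = kabsch(p[sel], p_star[sel])
        resid = np.linalg.norm(p @ R.T + t - p_star, axis=1)
        inliers = resid < tau
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return kabsch(p[best_inliers], p_star[best_inliers])
```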
Appendix IV, Real-World Experiments Details, A. Setup Descriptions:
- Hardware: “The hardware setup is shown in Figure 8. The robot uses a 6-DOF arm and a gripper to manipulate the objects. To get visual input for insertion pose estimation, our setup contains two cameras: one at the end effector (wrist camera, Intel D405), and the other fixed to the table on the side of the workspace (side camera, ZED X Mini). The objects cho...”
- Capturing point cloud: “We use images from the stereo cameras and the deep stereo method IGEV [50] to provide depth estimation and therefore scene point clouds. When capturing the plug part of the connector (i.e. the object O), which is movable, the robot grasping the object O moves to an object-capturing pose such that the side camera, fixed to the ground, ca...”
- Generating demonstration data: “Data is collected by making the robot perform the insertion task multiple times with pre-programmed poses and human instructions. Figure 7 shows how each cycle of data collection works, where we randomly add variations to the initial gripper pose to simulate various initial object configurations. Specifically, the variations...”