Learning Motion Feasibility from Point Clouds in Cluttered Environments

Antony Thomas; Arthi; Girish Varma; Sajid Ansari

arxiv: 2606.26700 · v1 · pith:5GW3SCLVnew · submitted 2026-06-25 · 💻 cs.RO · cs.AI

Pith reviewed 2026-06-26 05:13 UTC · model grok-4.3

classification 💻 cs.RO cs.AI

keywords: motion feasibility · point cloud transformer · cluttered scenes · robot manipulation · sampling-based planning · RGB-D data · grasp prediction · 7-DOF arm

The pith

A point-cloud transformer predicts grasp-motion feasibility for a 7-DOF arm from raw RGB-D point clouds in clutter, reaching 0.996 AUROC on novel objects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to learn motion feasibility directly from raw point-cloud observations rather than relying on repeated calls to sampling-based planners. It constructs a benchmark of 2.7 million labels across 88 scanned objects and 190 cluttered tabletop scenes, then trains and compares MLP, volumetric CNN, and point-cloud transformer classifiers under identical conditions. The central claim is that the transformer model reaches high accuracy on objects never seen in training while delivering predictions far faster than the planners used to generate its labels.

Core claim

GRASPFC-PTX, a point-cloud transformer, achieves an AUROC of 0.996 on novel objects for predicting whether a grasp motion is feasible for a 7-DOF manipulator, using only raw RGB-D point clouds of realistic cluttered scenes, and produces each prediction substantially faster than sampling-based motion planners.
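
For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen feasible grasp receives a higher score than a randomly chosen infeasible one. The sketch below computes it by direct pairwise comparison. The function name and the toy scores are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) AUROC; a tie counts as half a win."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): one feasible grasp is scored below one infeasible grasp,
# so 3 of the 4 positive/negative pairs are ordered correctly.
toy_scores = [0.9, 0.2, 0.8, 0.1]
toy_labels = [1, 1, 0, 0]
toy_auroc = auroc(toy_scores, toy_labels)
```

The pairwise loop is quadratic and only meant for illustration; at the scale of a 2.7M-label benchmark a sort-based rank computation would be the practical choice.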

What carries the argument

GRASPFC-PTX, a point-cloud transformer that ingests raw RGB-D point clouds and outputs a binary feasibility label for a candidate grasp.

If this is right

Feasibility prediction can be moved from repeated planner calls into a single forward pass on sensor data.
The same architecture works for novel objects without retraining or scene simplification.
Planning pipelines that currently spend most time on infeasible samples can replace that work with fast learned checks.
The 2.7 million label benchmark supplies matched training and test splits for comparing future models.
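
The idea of replacing wasted planner calls with a learned check can be sketched as a pre-filter in front of the planner. Everything named here (`plan_with_prefilter`, `predict_feasibility`, `run_planner`, the 0.5 threshold) is a hypothetical interface chosen for illustration; the paper's abstract does not describe how the classifier is wired into a planning pipeline.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Grasp = TypeVar("Grasp")
Path = TypeVar("Path")


def plan_with_prefilter(
    candidates: Iterable[Grasp],
    predict_feasibility: Callable[[Grasp], float],  # hypothetical learned scorer in [0, 1]
    run_planner: Callable[[Grasp], Optional[Path]],  # hypothetical planner, None on failure
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Tuple[Optional[Tuple[Grasp, Path]], int]:
    """Return the first (grasp, path) found and the number of planner calls made."""
    # Score every candidate once, then visit them from most to least promising.
    scored = sorted(
        ((predict_feasibility(g), i, g) for i, g in enumerate(candidates)),
        key=lambda item: (-item[0], item[1]),
    )
    planner_calls = 0
    for score, _, grasp in scored:
        if score < threshold:
            break  # every remaining candidate scores lower, so skip the planner
        planner_calls += 1
        path = run_planner(grasp)
        if path is not None:
            return (grasp, path), planner_calls
    return None, planner_calls


# Stub usage with made-up scores: only grasp "c" is plannable.
stub_scores = {"a": 0.1, "b": 0.9, "c": 0.7}
result, calls = plan_with_prefilter(
    ["a", "b", "c"],
    predict_feasibility=lambda g: stub_scores[g],
    run_planner=lambda g: ["start", g] if g == "c" else None,
)
```

In this stub the planner is called for "b" and "c" but never for the low-scoring "a", which is the saving the bullet points describe. A threshold that is too aggressive would discard feasible grasps, so the cut-off is a design decision that trades planner time against missed solutions.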

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the model generalizes to moving obstacles or non-tabletop scenes, it could support online replanning during manipulation.
The benchmark construction could be extended to other robot arms or sensor types to test transfer.
High accuracy on novel objects suggests the learned representation captures geometric constraints that are independent of specific object identity.

Load-bearing premise

Labels produced by sampling-based motion planners on the scanned scenes are accurate enough to serve as ground truth.

What would settle it

Collect a new set of cluttered scenes, label each candidate grasp with both the trained model and an exact motion planner that is guaranteed to be complete, and measure whether their feasibility decisions diverge on more than a small fraction of cases.
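
The comparison described above reduces to counting where the two sets of decisions differ, and in which direction. The sketch below is one minimal way to tabulate that; the function and key names are assumptions for illustration, and the inputs are made-up booleans rather than results from the paper.

```python
from typing import Dict, Sequence


def compare_decisions(
    model_feasible: Sequence[bool],
    reference_feasible: Sequence[bool],
) -> Dict[str, float]:
    """Compare model decisions with a reference (ideally complete) planner."""
    if len(model_feasible) != len(reference_feasible):
        raise ValueError("both decision lists must have the same length")
    if not model_feasible:
        raise ValueError("need at least one candidate grasp")
    n = len(model_feasible)
    pairs = list(zip(model_feasible, reference_feasible))
    # Model rejects a grasp for which the reference planner finds a path.
    missed = sum(1 for m, r in pairs if (not m) and r)
    # Model accepts a grasp for which the reference planner reports no path.
    spurious = sum(1 for m, r in pairs if m and (not r))
    return {
        "disagreement_rate": (missed + spurious) / n,
        "missed_feasible_rate": missed / n,
        "spurious_feasible_rate": spurious / n,
    }


# Made-up decisions for four candidate grasps.
report = compare_decisions(
    [True, False, True, True],
    [True, True, True, False],
)
```

Reporting the two error directions separately matters here: a high missed-feasible rate would point to the model inheriting the labelling planner's false negatives, whereas a high spurious rate would point to ordinary classification error.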

Figures

Figures reproduced from arXiv: 2606.26700 by Antony Thomas, Arthi, Girish Varma, Sajid Ansari.

**Figure 1.** Methodology overview. Top: data construction (Section 3.2): segmentation, quality-diverse grasp extraction, RRT-Connect labelling. Middle: the three classifier families compared (Section 3.3): GraspFC-NNet, GraspFC-Conv3D, GraspFC-PTX, each sharing a 17-D pose descriptor but differing in scene representation. Bottom: evaluation on the in-distribution Seen/Similar/Novel splits and two out-of-distribution setti…

**Figure 2.** Data construction pipeline (left to right): (A) raw RGB-D scene cloud in the table frame; …

**Figure 3.** Scene representations used as classifier inputs. (A) point cloud with table mesh; (B) fore…

**Figure 4.** Out-of-distribution settings, top-down view. (a,b) …

**Figure 5.** Model-side representations consumed by each architecture and the planner-verified trajec…

Original abstract

Motion feasibility prediction plays a central role in robotics, particularly in task and motion planning and manipulation. A major bottleneck for this problem in cluttered environments is that infeasible planning attempts by sampling-based motion planners (SBMPs) can incur substantial computational cost. Also, existing approaches for infeasibility certification are limited to low-dimensional configuration spaces and often assume simplified geometric environments represented by primitive objects with known parameters. We study the complementary problem of learning motion feasibility prediction directly from raw RGB-D observations for a 7-DOF manipulator operating in realistic cluttered scenes. We introduce the first large-scale benchmark for this setting, comprising 2.7M grasp feasibility labels over 88 scanned objects and 190 cluttered tabletop scenes. We benchmark three representative classifier families spanning MLP-based, volumetric-CNN, and point-cloud-based Transformer architectures under matched training conditions. Our best model, GRASPFC-PTX (a point-cloud transformer), achieves an AUROC of 0.996 on Novel objects while providing predictions significantly faster than SBMPs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

This paper builds the first large benchmark for learning 7-DOF motion feasibility from raw point clouds in cluttered scenes and shows a transformer can match SBMP labels at high speed, but the labels themselves carry noise from incomplete planners.

Two things to know: this is the first large-scale benchmark for learning motion feasibility from raw point clouds for 7-DOF arms in clutter, and the transformer model reaches 0.996 AUROC on novel objects while running faster than the planners used to label the data.

They collected 2.7M labels over 190 scenes from scanned objects, which is a real step up in scale from earlier low-dimensional or primitive-based work. Running matched experiments on MLP, volumetric CNN, and point-cloud transformer architectures is solid, and the transformer performing best fits the input type. The practical payoff is clear: quick predictions that can replace slow SBMP calls in planning.

The labels are the soft spot. Because they come from SBMP runs, "infeasible" means the planner did not find a path within its budget, not that no path exists. This matters here because 7-DOF cluttered scenes are exactly where SBMPs can mislabel feasible motions as infeasible at a high rate, so the model could be overfitting to planner-specific behavior. Cross-checks against that in the full paper would help, but the abstract mentions none.
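
The gap between "no path found within budget" and "no path exists" can be made concrete with a toy model. Assume, purely for illustration, that each planner attempt on a truly feasible query succeeds independently with probability p; the query is then still labelled infeasible after k attempts with probability (1 - p)^k. Real SBMP runs such as RRT-Connect are not independent Bernoulli trials, so this is an intuition aid and not a model of the paper's labelling procedure; all names and numbers below are assumptions.

```python
import random


def false_negative_probability(p_success: float, attempts: int) -> float:
    """Chance that a feasible query is labelled infeasible under the toy model."""
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must lie in [0, 1]")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return (1.0 - p_success) ** attempts


def label_with_budget(p_success: float, attempts: int, rng: random.Random) -> int:
    """Simulate one feasible query: label 1 if any attempt within the budget succeeds."""
    return int(any(rng.random() < p_success for _ in range(attempts)))


# Simulated labelling of 10,000 truly feasible queries with a hard, made-up setting:
# 20% success chance per attempt and a budget of 5 attempts.
rng = random.Random(0)
labels = [label_with_budget(0.2, 5, rng) for _ in range(10_000)]
empirical_fn_rate = 1.0 - sum(labels) / len(labels)
expected_fn_rate = false_negative_probability(0.2, 5)  # 0.8 ** 5
```

Under these assumed numbers roughly a third of feasible queries would carry a negative label, which is why the budget used during label generation is worth stating alongside any headline metric.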

No major issues with the math or citations from what's shown. The approach is direct.

This paper is for robotics researchers focused on manipulation planning who could use the benchmark or the fast predictor. A reader looking for data-driven alternatives to sampling planners would find it relevant.

It deserves a serious referee. The benchmark size and the architecture comparison are enough to warrant review, even though the label-quality question still needs discussion.

Referee Report

2 major / 0 minor

Summary. The paper introduces the first large-scale benchmark for learning motion feasibility prediction from raw RGB-D point clouds for a 7-DOF manipulator in cluttered tabletop scenes. The benchmark comprises 2.7M grasp feasibility labels generated by sampling-based motion planners (SBMPs) across 88 scanned objects and 190 scenes. It evaluates three classifier families (MLP-based, volumetric-CNN, point-cloud transformer) under matched conditions and reports that the best model, GRASPFC-PTX, achieves an AUROC of 0.996 on novel objects while running significantly faster than SBMPs.

Significance. If the central results hold under more reliable labeling, the work would provide a valuable public benchmark and demonstrate that point-cloud transformers can deliver fast, high-accuracy feasibility predictions in realistic clutter, directly addressing the computational bottleneck of repeated infeasible SBMP calls in task-and-motion planning.

major comments (2)

[Abstract] Abstract, benchmark construction: the feasibility labels are produced by SBMPs, yet the manuscript does not quantify or bound the incompleteness of these planners in 7-DOF cluttered scenes. Because failure to return a path within a time budget does not certify true infeasibility, a non-negligible fraction of negative labels may be false negatives; this directly undermines the interpretation of the reported 0.996 AUROC as a measure of motion feasibility rather than agreement with one particular planner.
[Abstract] Abstract: the headline performance figure is given without reference to training/validation splits, class balance, error bars across random seeds, or ablation on label noise, making it impossible to determine whether the AUROC reflects genuine generalization or sensitivity to the particular SBMP timeout and sampling parameters used to create the benchmark.
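
One way to approach the label-noise ablation requested in the second comment is to inject synthetic false negatives into the training labels at a controlled rate, retrain, and track how the held-out AUROC moves. The sketch below covers only the injection step; the helper name, the flip rates, and the seed are hypothetical and not taken from the manuscript.

```python
import random
from typing import List, Sequence


def inject_false_negatives(
    labels: Sequence[int],
    flip_rate: float,
    seed: int = 0,
) -> List[int]:
    """Flip each positive label to negative with probability flip_rate.

    Negative labels are left untouched, mimicking a planner that can miss
    feasible paths but never reports a path that does not exist.
    """
    if not 0.0 <= flip_rate <= 1.0:
        raise ValueError("flip_rate must lie in [0, 1]")
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if y == 1 and rng.random() < flip_rate:
            noisy.append(0)
        else:
            noisy.append(y)
    return noisy


# Made-up labels; a real ablation would sweep flip_rate over several values.
clean = [1, 0, 1, 1, 0]
noisy_10_percent = inject_false_negatives(clean, flip_rate=0.1, seed=42)
```

The noise is one-sided on purpose, since a returned path certifies feasibility while a timeout certifies nothing.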

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the reliability of the benchmark labels and the clarity of the reported results. We agree that both major comments identify areas requiring revision and address them point-by-point below.

Referee: [Abstract] Abstract, benchmark construction: the feasibility labels are produced by SBMPs, yet the manuscript does not quantify or bound the incompleteness of these planners in 7-DOF cluttered scenes. Because failure to return a path within a time budget does not certify true infeasibility, a non-negligible fraction of negative labels may be false negatives; this directly undermines the interpretation of the reported 0.996 AUROC as a measure of motion feasibility rather than agreement with one particular planner.

Authors: We agree that SBMPs are incomplete and that negative labels may contain false negatives; the reported AUROC therefore measures agreement with a specific planner rather than absolute motion feasibility. In the revised manuscript we will update the abstract and introduction to explicitly frame the task as predicting SBMP outcomes (a practically relevant proxy for avoiding expensive planning calls) and will state the planner timeout and sampling parameters used for label generation. We will also add a limitations paragraph discussing incompleteness. Precisely bounding the false-negative rate is not feasible without a complete 7-DOF planner, which lies outside the scope of this benchmark.

revision: yes

Referee: [Abstract] Abstract: the headline performance figure is given without reference to training/validation splits, class balance, error bars across random seeds, or ablation on label noise, making it impossible to determine whether the AUROC reflects genuine generalization or sensitivity to the particular SBMP timeout and sampling parameters used to create the benchmark.

Authors: We will revise the abstract to include the essential reporting details: object-wise split (70 objects for training, 18 held-out novel objects for testing), class balance (~45% positive), mean AUROC over five random seeds with standard deviation, and a reference to supplementary ablations on sensitivity to SBMP timeout and sampling parameters. These elements already appear in Sections 4 and 5 and the supplement; the abstract will now foreground them.

revision: yes
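
The reporting details listed in this simulated response (an object-wise 70/18 split and a mean with standard deviation over seeds) can be sketched as follows. The object IDs are placeholders, the helper names are hypothetical, and the use of the sample standard deviation is an assumption, since the response does not say which estimator is meant.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def object_wise_split(
    object_ids: Sequence[str],
    n_train: int,
    seed: int,
) -> Tuple[List[str], List[str]]:
    """Split by object identity so that no held-out object appears in training."""
    ids = sorted(set(object_ids))
    if not 0 < n_train < len(ids):
        raise ValueError("n_train must leave at least one object on each side")
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]


def mean_and_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of a metric across random seeds."""
    return statistics.mean(values), statistics.stdev(values)


# Placeholder IDs standing in for the 88 scanned objects.
objects = [f"obj_{i:02d}" for i in range(88)]
train_objs, novel_objs = object_wise_split(objects, n_train=70, seed=0)
```

Splitting on object identity, and not on individual labels, is what makes the held-out set a test of generalization to novel objects: every grasp label for a held-out object stays out of training.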

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper presents a standard supervised learning pipeline: SBMP-generated labels on scanned scenes serve as training targets for classifiers (MLP, CNN, transformer) that take point clouds as input, with performance measured by AUROC on held-out novel objects and scenes. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the abstract or described benchmark construction. The reported AUROC measures generalization to unseen data rather than any reduction of outputs to inputs by construction, rendering the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; no derivation, parameters, or new entities are visible. The central claim rests on the unstated premise that SBMP-generated labels are reliable ground truth.

axioms (1)

domain assumption: Sampling-based motion planners produce reliable feasibility labels for training data.
The benchmark is built from labels generated by SBMPs; this assumption is required for the supervised learning setup described in the abstract.

