Towards a Multi-Embodied Grasping Agent
Pith reviewed 2026-05-18 03:00 UTC · model grok-4.3
The pith
A grasp synthesis method handles any gripper by deducing its full kinematics from shape and scene geometry alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a data-efficient, flow-based, equivariant grasp synthesis architecture can handle gripper types with variable degrees of freedom by exploiting the underlying kinematic model, with all necessary information deduced solely from the gripper and scene geometry. The authors port every module to JAX from the ground up to support batching over scenes, grippers, and grasps, which they report yields smoother learning, better performance, and faster inference. Supporting evidence comes from a dataset that spans humanoid hands to parallel yaw grippers and contains 25,000 scenes and 20 million grasps.
What carries the argument
The flow-based equivariant grasp synthesis architecture that deduces the gripper kinematic model directly from gripper geometry and scene geometry inputs.
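To make "flow-based" concrete, here is a minimal sketch of sampling by integrating a velocity field from a noise sample toward a target. It assumes a flow-matching style model with a straight-line path, which this page does not confirm, and it replaces the learned network with the closed-form velocity for a single known target. The function names and the numeric values are hypothetical.

```python
from typing import List


def target_velocity(x: List[float], x1: List[float], t: float) -> List[float]:
    """Velocity that carries x to x1 along a straight line by time t = 1.

    For the path x_t = (1 - t) * x0 + t * x1 this equals (x1 - x_t) / (1 - t).
    A trained model would approximate this field with a network instead.
    """
    return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]


def euler_sample(x0: List[float], x1: List[float], steps: int) -> List[float]:
    """Integrate the velocity field from t = 0 to t = 1 with explicit Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so the division is safe
        v = target_velocity(x, x1, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical values: a noise sample and a target grasp position.
noise = [0.3, -1.2, 0.8]
grasp_target = [0.1, 0.0, 0.25]
sample = euler_sample(noise, grasp_target, steps=10)
```

With the exact single-target velocity, the final Euler step lands on the target up to floating-point error. A learned field over many grasps would instead transport the noise distribution to a distribution of grasps.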
If this is right
- The same model can be applied to grippers with varying numbers of degrees of freedom without retraining or new kinematic labels.
- Batching over scenes, grippers, and grasps in the JAX implementation produces faster inference and smoother optimization than prior equivariant methods.
- A single trained system achieves grasping success across humanoid hands and parallel yaw grippers.
- Training data requirements drop because the architecture does not need embodiment-specific large-scale datasets.
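One common way to batch grippers with different numbers of degrees of freedom is to pad each joint vector to a shared width and carry a validity mask alongside it. The sketch below is plain Python rather than JAX so that it runs without dependencies. The gripper names, joint values, and both helper functions are hypothetical, and this page does not say whether the paper batches this way.

```python
from typing import Dict, List, Tuple


def pad_joint_batch(
    joints: Dict[str, List[float]], pad_value: float = 0.0
) -> Tuple[List[str], List[List[float]], List[List[bool]]]:
    """Pad variable-length joint vectors to one width and return a validity mask."""
    names = sorted(joints)
    width = max(len(joints[n]) for n in names)
    padded: List[List[float]] = []
    mask: List[List[bool]] = []
    for name in names:
        q = joints[name]
        n_pad = width - len(q)
        padded.append(list(q) + [pad_value] * n_pad)
        mask.append([True] * len(q) + [False] * n_pad)
    return names, padded, mask


def masked_mean_abs(
    padded: List[List[float]], mask: List[List[bool]]
) -> List[float]:
    """Per-gripper mean |q| over real joints only, ignoring padded slots."""
    out = []
    for row, m in zip(padded, mask):
        real = [abs(v) for v, keep in zip(row, m) if keep]
        out.append(sum(real) / len(real))
    return out


# Hypothetical grippers: a 1-DOF parallel gripper and a 4-DOF hand.
example = {
    "parallel": [0.04],
    "hand": [0.1, -0.3, 0.5, 0.7],
}
names, padded, mask = pad_joint_batch(example)
means = masked_mean_abs(padded, mask)
```

The mask keeps padded slots out of any reduction, so a 1-DOF gripper and a 4-DOF hand can share one fixed-shape batch without the padding skewing per-gripper statistics.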
Where Pith is reading between the lines
- The geometry-only deduction could extend to other manipulation primitives such as in-hand reorientation or tool use.
- Real-robot experiments with previously unseen gripper designs would provide a direct check on whether the inferred kinematics transfer without fine-tuning.
- Combining the architecture with online scene reconstruction might allow a robot to select and use a new gripper on the fly.
Load-bearing premise
The full kinematic model of any gripper can be accurately recovered from static gripper and scene geometry without explicit parameters or extra supervision.
What would settle it
A test set of grippers whose motion cannot be inferred from geometry alone, such as those with hidden joints or non-rigid compliance, where the model produces invalid or unsafe grasp predictions.
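The failure mode described here can be made concrete with a toy calculation: a single fingertip point in the rest pose is compatible with more than one joint axis, and the candidate axes predict different motions. The sketch below is illustrative only. The point, the two axes, and the angle are made-up values, not data from the paper.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def rotate_about_axis(p: Vec3, axis: Vec3, angle: float) -> Vec3:
    """Rotate point p about a unit axis through the origin (Rodrigues' formula)."""
    ax, ay, az = axis
    px, py, pz = p
    c, s = math.cos(angle), math.sin(angle)
    dot = ax * px + ay * py + az * pz
    # Cross product axis x p.
    cx = ay * pz - az * py
    cy = az * px - ax * pz
    cz = ax * py - ay * px
    return (
        px * c + cx * s + ax * dot * (1 - c),
        py * c + cy * s + ay * dot * (1 - c),
        pz * c + cz * s + az * dot * (1 - c),
    )


# Same rest-pose fingertip, two hypothetical joint axes.
fingertip_rest: Vec3 = (1.0, 0.0, 0.0)
angle = math.pi / 2

tip_if_z_axis = rotate_about_axis(fingertip_rest, (0.0, 0.0, 1.0), angle)
tip_if_y_axis = rotate_about_axis(fingertip_rest, (0.0, 1.0, 0.0), angle)

# Distance between the two predicted fingertip positions after actuation.
gap = math.dist(tip_if_z_axis, tip_if_y_axis)
```

Both hypotheses agree on the static geometry and disagree once the joint moves, which is the ambiguity that a geometry-only model would have to resolve from other cues or from training data.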
Original abstract
Multi-embodiment grasping focuses on developing approaches that exhibit generalist behavior across diverse gripper designs. Existing methods often learn the kinematic structure of the robot implicitly and face challenges due to the difficulty of sourcing the required large-scale data. In this work, we present a data-efficient, flow-based, equivariant grasp synthesis architecture that can handle different gripper types with variable degrees of freedom and successfully exploit the underlying kinematic model, deducing all necessary information solely from the gripper and scene geometry. Unlike previous equivariant grasping methods, we translated all modules from the ground up to JAX and provide a model with batching capabilities over scenes, grippers, and grasps, resulting in smoother learning, improved performance and faster inference time. Our dataset encompasses grippers ranging from humanoid hands to parallel yaw grippers and includes 25,000 scenes and 20 million grasps.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a data-efficient, flow-based, equivariant grasp synthesis architecture for multi-embodiment grasping. It claims to handle grippers with variable degrees of freedom by exploiting the underlying kinematic model, deducing all necessary information solely from gripper and scene geometry inputs. The work includes a JAX reimplementation with batching over scenes, grippers, and grasps for improved performance and inference speed, supported by a dataset of 25,000 scenes and 20 million grasps spanning humanoid hands to parallel yaw grippers.
Significance. If the central claims are substantiated, the approach could meaningfully advance generalist grasping by reducing reliance on embodiment-specific data and explicit kinematic parameters, enabling better generalization across diverse grippers through geometric inputs alone. The JAX-based batching and large-scale dataset are practical strengths that could support reproducible follow-up work.
major comments (1)
- [Abstract] The load-bearing claim that the architecture 'successfully exploit[s] the underlying kinematic model, deducing all necessary information solely from the gripper and scene geometry' without explicit kinematic parameters requires stronger substantiation. Static geometry (meshes or point clouds) does not encode joint axes, limits, or configuration spaces for variable-DOF grippers; any kinematic exploitation must therefore be shown to arise from geometry rather than implicit learning on the 20M-grasp training set. Generalization experiments on unseen gripper topologies would directly test this distinction.
minor comments (1)
- The abstract references improved performance and faster inference but does not include quantitative metrics, baselines, or ablation results; adding these details with specific numbers and comparisons would improve clarity without altering the core contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for identifying the need for stronger substantiation of the central claim in the abstract. We address this point below.
- Referee: [Abstract] The load-bearing claim that the architecture 'successfully exploit[s] the underlying kinematic model, deducing all necessary information solely from the gripper and scene geometry' without explicit kinematic parameters requires stronger substantiation. Static geometry (meshes or point clouds) does not encode joint axes, limits, or configuration spaces for variable-DOF grippers; any kinematic exploitation must therefore be shown to arise from geometry rather than implicit learning on the 20M-grasp training set. Generalization experiments on unseen gripper topologies would directly test this distinction.
Authors: We appreciate the referee highlighting this distinction. Our model receives only geometric inputs (gripper meshes or point clouds together with scene geometry) and no explicit kinematic parameters such as joint axes, limits, or configuration spaces at any stage. The flow-based equivariant architecture is trained to produce grasp distributions that respect the feasible motions of each gripper by learning from the geometric structure and the associated successful grasps in the dataset. Results across the range of embodiments (humanoid hands to parallel yaw grippers) show that the network generates kinematically plausible outputs for each gripper geometry without being supplied joint information. We acknowledge that this capability is acquired through training on the 20 million grasps rather than from an analytic kinematic model. To strengthen the presentation, we will revise the manuscript to include a clearer discussion of how geometric inputs alone enable the model to infer valid grasp configurations and to add further analysis of performance under gripper variations.
Revision: partial
Circularity Check
No circularity; derivation relies on learned equivariant flow model from explicit geometry inputs and large dataset
The paper's central claim is that a flow-based equivariant architecture, trained on 20M grasps across 25k scenes and diverse grippers, can exploit kinematics implicitly from gripper/scene geometry alone. This is presented as an empirical capability of the JAX-implemented model rather than a mathematical derivation that reduces to its inputs by construction. No self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided abstract and claims. The architecture is described as translating modules from the ground up with batching, yielding performance gains, but the kinematic deduction is an assumption about what the trained model achieves, not a step that equates output to input definitionally. The approach is self-contained against external benchmarks via the dataset and equivariance properties.
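The equivariance property mentioned here has a simple operational meaning: rotating the input and then applying the map gives the same result as applying the map and then rotating the output. The plain-Python check below uses a hypothetical stand-in map (the centroid of a point cloud) rather than the paper's network, and tests only rotations about the z axis, purely to show what such a test looks like.

```python
import math
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


def rotate_z(p: Vec3, angle: float) -> Vec3:
    """Rotate a point about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)


def centroid(points: List[Vec3]) -> Vec3:
    """Rotation-equivariant stand-in for a model output."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def offset_centroid(points: List[Vec3]) -> Vec3:
    """Deliberately non-equivariant: adds a fixed world-frame offset."""
    x, y, z = centroid(points)
    return (x + 1.0, y, z)


def equivariance_error(
    f: Callable[[List[Vec3]], Vec3], points: List[Vec3], angle: float
) -> float:
    """Return || f(R x) - R f(x) || for a rotation R about the z axis."""
    lhs = f([rotate_z(p, angle) for p in points])
    rhs = rotate_z(f(points), angle)
    return math.dist(lhs, rhs)


# Hypothetical point cloud and rotation angle.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]
err_equivariant = equivariance_error(centroid, cloud, 0.7)
err_broken = equivariance_error(offset_centroid, cloud, 0.7)
```

The centroid passes with an error at floating-point level, while the offset variant fails by a clear margin. A check of this kind verifies a symmetry of the map and says nothing about whether kinematic structure was recovered from geometry.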
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Kinematic structure of a gripper can be deduced solely from its geometry and the scene geometry.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
"We present a data-efficient, flow-based, equivariant grasp synthesis architecture that can handle different gripper types with variable degrees of freedom and successfully exploit the underlying kinematic model, deducing all necessary information solely from the gripper and scene geometry."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.