Towards a Multi-Embodied Grasping Agent

Alexander Qualmann; Gerhard Neumann; Ngo Anh Vien; Roman Freiberg

arXiv: 2510.27420 · v3 · submitted 2025-10-31 · 💻 cs.RO

Roman Freiberg, Alexander Qualmann, Ngo Anh Vien, Gerhard Neumann

Pith reviewed 2026-05-18 03:00 UTC · model grok-4.3

classification 💻 cs.RO

keywords multi-embodiment grasping · equivariant grasp synthesis · flow-based architecture · gripper geometry · kinematic model deduction · robotic manipulation · data-efficient learning

The pith

A grasp synthesis method handles any gripper by deducing its full kinematics from shape and scene geometry alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a grasping approach that works across many different robot hands and grippers without requiring large custom datasets for each design. It builds a flow-based equivariant architecture that reads the gripper geometry and the scene to determine how the gripper can move and which grasps are possible. This removes the need to supply explicit joint parameters or to train separately on each embodiment. A reader would care because most current grasping systems are locked to one robot hand and demand enormous amounts of new data when the hardware changes. If the claim holds, robots could switch between humanoid hands, parallel jaws, and other designs using the same learned model.

Core claim

The central claim is that a data-efficient, flow-based, equivariant grasp synthesis architecture can handle gripper types with varying degrees of freedom by exploiting the underlying kinematic model, deducing all necessary information solely from gripper and scene geometry. Every module is reimplemented from scratch in JAX to support batching over scenes, grippers, and grasps. The authors report that this yields smoother learning, better performance, and faster inference. The supporting dataset spans humanoid hands to parallel-jaw grippers and comprises 25,000 scenes and 20 million grasps.
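As a quick check on scale, the stated totals imply an average of 800 grasps per scene. The abstract does not say how those grasps are distributed across scenes and grippers.

```latex
\frac{20\,000\,000\ \text{grasps}}{25\,000\ \text{scenes}} = 800\ \text{grasps per scene (average)}
```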

What carries the argument

The flow-based equivariant grasp synthesis architecture that deduces the gripper kinematic model directly from gripper geometry and scene geometry inputs.

If this is right

The same model can be applied to grippers with varying numbers of degrees of freedom without retraining or new kinematic labels.
Batching over scenes, grippers, and grasps in the JAX implementation produces faster inference and smoother optimization than prior equivariant methods.
A single trained system achieves grasping success across humanoid hands and parallel-jaw grippers.
Training data requirements drop because the architecture does not need embodiment-specific large-scale datasets.
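The abstract does not describe how the model batches grippers that have different numbers of joints. One common approach is to pad every joint vector to the largest DOF in the batch and carry a boolean mask. This produces rectangular arrays that a vectorized function can consume; in JAX, for instance, such arrays can be mapped with `jax.vmap`. The standard-library sketch below assumes that approach. The function names, gripper names, and joint values are all hypothetical.

```python
from typing import Dict, List, Tuple


def pad_joint_configs(
    configs: Dict[str, List[float]],
) -> Tuple[List[str], List[List[float]], List[List[bool]]]:
    """Pad per-gripper joint vectors to the largest DOF in the batch.

    Returns gripper names (sorted), zero-padded joint vectors, and boolean
    masks that are True for real joints and False for padding.
    """
    names = sorted(configs)
    max_dof = max(len(q) for q in configs.values())
    padded: List[List[float]] = []
    masks: List[List[bool]] = []
    for name in names:
        q = configs[name]
        n_pad = max_dof - len(q)
        padded.append(list(q) + [0.0] * n_pad)
        masks.append([True] * len(q) + [False] * n_pad)
    return names, padded, masks


def masked_sq_error(pred: List[float], target: List[float], mask: List[bool]) -> float:
    """Squared error over real joints only, so padding never affects a loss."""
    return sum((p - t) ** 2 for p, t, m in zip(pred, target, mask) if m)


# Hypothetical batch: a 1-DOF parallel jaw and a toy 3-DOF finger.
names, padded, masks = pad_joint_configs(
    {"parallel_jaw": [0.04], "toy_finger_3dof": [0.1, 0.2, 0.3]}
)
```

Masking rather than truncating keeps every gripper's real joints intact. Any per-joint loss or attention step then simply ignores the padded slots.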

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The geometry-only deduction could extend to other manipulation primitives such as in-hand reorientation or tool use.
Real-robot experiments with previously unseen gripper designs would provide a direct check on whether the inferred kinematics transfer without fine-tuning.
Combining the architecture with online scene reconstruction might allow a robot to select and use a new gripper on the fly.

Load-bearing premise

The full kinematic model of any gripper can be accurately recovered from static gripper and scene geometry without explicit parameters or extra supervision.

What would settle it

Evaluate the model on a test set of grippers whose motion cannot be inferred from geometry alone, such as designs with hidden joints or non-rigid compliance. If the model produced invalid or unsafe grasp predictions for those grippers, the geometry-only claim would fail.
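One way to run such a check is a leave-one-gripper-out protocol. Hold out every grasp of one embodiment, train on the rest, and measure how often predictions for the held-out gripper fail a validity check, such as a physics simulation. The sketch below builds only the splits and the metric. The record format and function names are hypothetical, the validity check is left abstract, and the gripper names come from Figure 3 purely as labels.

```python
from typing import Iterator, List, Tuple

GraspRecord = Tuple[str, int]  # (gripper_name, grasp_id)


def leave_one_gripper_out(
    records: List[GraspRecord],
) -> Iterator[Tuple[str, List[GraspRecord], List[GraspRecord]]]:
    """Yield (held_out_gripper, train, test) splits keyed on gripper identity.

    All grasps of the held-out gripper go to the test split, so that embodiment
    is never seen during training.
    """
    grippers = sorted({name for name, _ in records})
    for held_out in grippers:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield held_out, train, test


def invalid_rate(is_valid: List[bool]) -> float:
    """Fraction of predicted grasps flagged invalid by some external check."""
    if not is_valid:
        raise ValueError("need at least one prediction")
    return 1.0 - sum(is_valid) / len(is_valid)


records = [("Allegro Hand", 0), ("Shadow Hand", 1), ("Allegro Hand", 2)]
splits = list(leave_one_gripper_out(records))
```

Splitting by gripper identity rather than by grasp is the key design choice. A random grasp-level split would leak every embodiment into training and could not show transfer to unseen kinematics.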

Figures

Figures reproduced from arXiv: 2510.27420 by Alexander Qualmann, Gerhard Neumann, Ngo Anh Vien, Roman Freiberg.

**Figure 1.** Equivariant Gripper Embeddings. An initial gripper configuration (a) is represented by a learned feature embedding z. After a physical joint rotation ΔR, the gripper is in a new configuration (b). Our method ensures the features are correspondingly transformed via the Wigner-D matrices, z′ = D(ΔR) z, keeping the representation consistent with the physical state.
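For the lowest nontrivial degree (l = 1), the Wigner-D matrix is the 3×3 rotation matrix itself, up to a fixed change of basis. The caption's relation z′ = D(ΔR) z can therefore be checked numerically on a vector-valued feature. The standard-library sketch below uses a hypothetical toy embedding: a weighted sum of link points relative to the joint pivot. It is far simpler than the learned per-joint embedding in the paper and only illustrates what the equivariance constraint means and how to test it.

```python
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]


def rot_z(theta: float) -> Mat:
    """Rotation about the z-axis, standing in for a revolute joint rotation ΔR."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def rotate_about(points: List[Vec], pivot: Vec, r: Mat) -> List[Vec]:
    """Rotate each point by r about the joint pivot."""
    out = []
    for p in points:
        off = matvec(r, [p[k] - pivot[k] for k in range(3)])
        out.append([pivot[k] + off[k] for k in range(3)])
    return out


def toy_l1_embedding(points: List[Vec], pivot: Vec, weights: List[float]) -> Vec:
    """Hypothetical degree-1 feature: weighted sum of offsets from the pivot.

    The weights do not depend on orientation, so the feature is linear in the
    offsets and therefore rotates exactly like a 3D vector.
    """
    z = [0.0, 0.0, 0.0]
    for p, w in zip(points, weights):
        for k in range(3):
            z[k] += w * (p[k] - pivot[k])
    return z


pivot = [0.0, 0.0, 0.1]
link = [[0.05, 0.0, 0.1], [0.08, 0.01, 0.12]]
weights = [0.7, 0.3]
delta_r = rot_z(0.4)

z = toy_l1_embedding(link, pivot, weights)
z_prime = toy_l1_embedding(rotate_about(link, pivot, delta_r), pivot, weights)
# Equivariance check: z' should equal D(ΔR) z, with D(ΔR) = ΔR for l = 1.
err = max(abs(a - b) for a, b in zip(z_prime, matvec(delta_r, z)))
```

Higher-degree features (l ≥ 2) transform with larger Wigner-D matrices. The check has the same form: transform the input, recompute the feature, and compare it with D(ΔR) applied to the original feature.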

**Figure 2.** Method Overview. (Left) Grippers are represented with per-joint equivariant embeddings. (a) Full Pipeline. A scene point cloud is encoded into a multi-scale equivariant feature pyramid. Time-conditioned joint features query this pyramid to extract pose and joint information. These scene-aware queries are then decoded to predict flow gradients, which generate the final pre-grasp configuration. (b) Kinematic…
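The caption describes predicted flow gradients (velocities) that are integrated into a final pre-grasp configuration. Below is a minimal sketch of such a sampling loop, assuming plain Euler integration in a flat vector space. The real method presumably integrates over poses and joint values with a learned network conditioned on scene and gripper features. Here the "network" is a hand-written field that points at one fixed, hypothetical configuration, and rotations, which would need manifold-aware updates, are ignored.

```python
from typing import Callable, List

Vec = List[float]
VelocityField = Callable[[Vec, float], Vec]


def euler_integrate(x0: Vec, field: VelocityField, n_steps: int = 20) -> Vec:
    """Integrate dx/dt = field(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t < 1 at every evaluated step
        v = field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def straight_line_field(target: Vec) -> VelocityField:
    """Hypothetical stand-in for a learned network: the conditional flow-matching
    velocity (target - x) / (1 - t), which moves x along a straight line to target."""

    def field(x: Vec, t: float) -> Vec:
        return [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]

    return field


# Hypothetical 4-D state: three position coordinates plus one joint value.
noise = [0.3, -1.2, 0.5, 0.9]
pre_grasp = [0.1, 0.0, 0.25, 0.6]
sample = euler_integrate(noise, straight_line_field(pre_grasp))
```

Swapping `straight_line_field` for a trained model that maps (x, t, scene features, gripper features) to a velocity would turn this loop into a generative sampler. Different noise vectors would then yield different grasps.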

**Figure 3.** Multi-Embodiment Grasp Synthesis Examples. Renderings of three sampled pre-grasp configurations for five distinct grippers in cluttered scenes. Included grippers: (a) ViperX 300s parallel gripper, (b) Franka Emika parallel gripper, (c) DEX-EE dexterous hand, (d) Allegro Hand, and (e) Shadow Hand.

Original abstract

Multi-embodiment grasping focuses on developing approaches that exhibit generalist behavior across diverse gripper designs. Existing methods often learn the kinematic structure of the robot implicitly and face challenges due to the difficulty of sourcing the required large-scale data. In this work, we present a data-efficient, flow-based, equivariant grasp synthesis architecture that can handle different gripper types with variable degrees of freedom and successfully exploit the underlying kinematic model, deducing all necessary information solely from the gripper and scene geometry. Unlike previous equivariant grasping methods, we translated all modules from the ground up to JAX and provide a model with batching capabilities over scenes, grippers, and grasps, resulting in smoother learning, improved performance and faster inference time. Our dataset encompasses grippers ranging from humanoid hands to parallel jaw grippers and includes 25,000 scenes and 20 million grasps.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note private letter to a colleague

The JAX reimplementation with batching and the large multi-gripper dataset are the practical parts worth noting, but the geometry-only kinematic deduction claim rests on shaky ground.

Colleague, the main thing to know is that this paper ports equivariant grasping to JAX with batching over scenes, grippers, and grasps. It also releases a dataset of 25,000 scenes and 20 million grasps spanning humanoid hands to parallel-jaw grippers. That infrastructure work is the clearest addition to prior equivariant methods. The authors further claim the model exploits the kinematic structure by extracting everything it needs from gripper and scene geometry alone, without explicit parameters.

The JAX translation and batching support are straightforward engineering wins that should improve training stability and speed up inference. Building a dataset at that scale for variable-DOF grippers is also useful for anyone trying to move beyond single-robot setups. Together, these address a real practical obstacle to making generalist grasping work across hardware.

The soft spot is the kinematic deduction. Static geometry inputs such as point clouds or meshes do not contain joint axes, limits, or actuation ranges. Any success across different gripper topologies therefore almost certainly comes from patterns learned from the 20 million training grasps rather than from direct geometric reasoning. The stress-test concern holds up against the available details. Without ablations that isolate geometry-only performance, or tests on genuinely unseen gripper designs, the claim that the model deduces the full kinematic model remains hard to evaluate.

This paper is aimed at robotics researchers working on multi-embodiment grasping and data-efficient methods. A reader who needs code or data for variable grippers could extract value from the implementation choices and dataset construction. I would send it to peer review. The reimplementation details and the scale of the data make it worth a closer look from referees who can check the experiments.

Referee Report

1 major / 1 minor

Summary. The manuscript presents a data-efficient, flow-based, equivariant grasp synthesis architecture for multi-embodiment grasping. It claims to handle grippers with variable degrees of freedom by exploiting the underlying kinematic model, deducing all necessary information solely from gripper and scene geometry inputs. The work also includes a JAX reimplementation with batching over scenes, grippers, and grasps, aimed at better performance and faster inference. It is supported by a dataset of 25,000 scenes and 20 million grasps spanning humanoid hands to parallel-jaw grippers.

Significance. If the central claims are substantiated, the approach could meaningfully advance generalist grasping by reducing reliance on embodiment-specific data and explicit kinematic parameters, enabling better generalization across diverse grippers through geometric inputs alone. The JAX-based batching and large-scale dataset are practical strengths that could support reproducible follow-up work.

major comments (1)

[Abstract] The load-bearing claim that the architecture 'successfully exploit[s] the underlying kinematic model, deducing all necessary information solely from the gripper and scene geometry' without explicit kinematic parameters requires stronger substantiation. Static geometry (meshes or point clouds) does not encode joint axes, limits, or configuration spaces for variable-DOF grippers; any kinematic exploitation must therefore be shown to arise from geometry rather than implicit learning on the 20M-grasp training set. Generalization experiments on unseen gripper topologies would directly test this distinction.
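The referee's point can be made concrete with a toy planar finger. The hypothetical sketch below builds two two-link fingers with identical link lengths, so a mesh of either in its straight rest pose would look the same. The only difference is that joint 2 bends in opposite directions. A target that one finger reaches is unreachable for the other. The joint limits, which are kinematic information, change the answer even though the static geometry does not. This illustrates the referee's argument only and says nothing about what the paper's model actually receives as input.

```python
import math
from typing import Tuple

Limits = Tuple[Tuple[float, float], Tuple[float, float]]


def fingertip(l1: float, l2: float, q1: float, q2: float) -> Tuple[float, float]:
    """Planar two-link forward kinematics: fingertip (x, y) for joint angles q1, q2."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


def reachable(l1: float, l2: float, limits: Limits, target: Tuple[float, float],
              n: int = 100, tol: float = 2e-3) -> bool:
    """Grid search: can any (q1, q2) within the joint limits put the fingertip
    within tol of target?"""
    (a1, b1), (a2, b2) = limits
    for i in range(n + 1):
        q1 = a1 + (b1 - a1) * i / n
        for j in range(n + 1):
            q2 = a2 + (b2 - a2) * j / n
            x, y = fingertip(l1, l2, q1, q2)
            if math.hypot(x - target[0], y - target[1]) < tol:
                return True
    return False


# Same link lengths (same static geometry), different joint-2 limits.
L1 = L2 = 0.05  # metres
finger_a: Limits = ((0.0, math.pi / 6), (0.0, math.pi / 2))   # joint 2 flexes one way
finger_b: Limits = ((0.0, math.pi / 6), (-math.pi / 2, 0.0))  # joint 2 flexes the other way
target = fingertip(L1, L2, 0.0, math.pi / 4)  # a pose finger A can reach
```

With these numbers, finger B's closest approach to the target is roughly 7 mm, well outside the 2 mm tolerance.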

minor comments (1)

The abstract references improved performance and faster inference but does not include quantitative metrics, baselines, or ablation results; adding these details with specific numbers and comparisons would improve clarity without altering the core contribution.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for identifying the need for stronger substantiation of the central claim in the abstract. We address this point below.

Referee: [Abstract] The load-bearing claim that the architecture 'successfully exploit[s] the underlying kinematic model, deducing all necessary information solely from the gripper and scene geometry' without explicit kinematic parameters requires stronger substantiation. Static geometry (meshes or point clouds) does not encode joint axes, limits, or configuration spaces for variable-DOF grippers; any kinematic exploitation must therefore be shown to arise from geometry rather than implicit learning on the 20M-grasp training set. Generalization experiments on unseen gripper topologies would directly test this distinction.

Authors: We appreciate the referee highlighting this distinction. Our model receives only geometric inputs (gripper meshes or point clouds together with scene geometry) and no explicit kinematic parameters such as joint axes, limits, or configuration spaces at any stage. The flow-based equivariant architecture is trained to produce grasp distributions that respect the feasible motions of each gripper by learning from the geometric structure and the associated successful grasps in the dataset. Results across the range of embodiments (humanoid hands to parallel-jaw grippers) show that the network generates kinematically plausible outputs for each gripper geometry without being supplied joint information. We acknowledge that this capability is acquired through training on the 20 million grasps rather than from an analytic kinematic model. To strengthen the presentation, we will revise the manuscript to include a clearer discussion of how geometric inputs alone enable the model to infer valid grasp configurations and to add further analysis of performance under gripper variations. revision: partial

Circularity Check

0 steps flagged

No circularity: the derivation relies on a learned equivariant flow model trained on explicit geometry inputs and a large dataset.

full rationale

The paper's central claim is that a flow-based equivariant architecture, trained on 20M grasps across 25k scenes and diverse grippers, can exploit kinematics implicitly from gripper and scene geometry alone. This is presented as an empirical capability of the JAX-implemented model rather than as a mathematical derivation that reduces to its inputs by construction. No self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided abstract and claims. The architecture is described as reimplemented from scratch in JAX with batching, which yields performance gains. The kinematic deduction, however, is an assumption about what the trained model achieves, not a step that makes the output equal to the input by definition. The argument is therefore self-contained and can be checked against external benchmarks using the dataset and the architecture's equivariance properties.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that geometry alone suffices to recover kinematics for variable-DoF grippers; no free parameters or invented entities are described in the abstract.

axioms (1)

domain assumption: The kinematic structure of a gripper can be deduced solely from its geometry and the scene geometry.
Explicitly stated in the abstract as the basis for handling different gripper types without additional inputs.

pith-pipeline@v0.9.0 · 5678 in / 1197 out tokens · 34068 ms · 2026-05-18T03:00:41.588586+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem.

We present a data-efficient, flow-based, equivariant grasp synthesis architecture that can handle different gripper types with variable degrees of freedom and successfully exploit the underlying kinematic model, deducing all necessary information solely from the gripper and scene geometry.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


work page 2012
[68]

Google scanned objects: A high-quality dataset of 3d scanned household items,

L. Downset al., “Google scanned objects: A high-quality dataset of 3d scanned household items,” in2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 2553–2560

work page 2022
[69]

The ycb object and model set: Towards common benchmarks for manipulation research,

B. Calli, A. Singh, A. Walsman, S. Srinivasa, P. Abbeel, and A. M. Dollar, “The ycb object and model set: Towards common benchmarks for manipulation research,” in2015 international conference on advanced robotics (ICAR). IEEE, 2015, pp. 510–517

work page 2015

[1] P. Li et al., “Gendexgrasp: Generalizable dexterous grasping,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 8068–8074

[2] M. Attarian et al., “Geometry matching for multi-embodiment grasping,” in Conference on Robot Learning. PMLR, 2023, pp. 1242–1256

[3] J. Urain, N. Funk, J. Peters, and G. Chalvatzaki, “SE(3)-diffusionfields: Learning smooth cost functions for joint grasp and motion optimization through diffusion,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 5923–5930

[4] H. Ryu et al., “Diffusion-edfs: Bi-equivariant denoising generative modeling on SE(3) for visual robotic manipulation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 18007–18018

[5] B. Hu et al., “Orbitgrasp: SE(3)-equivariant grasp learning,” in 8th Annual Conference on Robot Learning, 2024

[6] C. Gao et al., “RiEMann: Near real-time SE(3)-equivariant robot manipulation without point cloud segmentation,” in 8th Annual Conference on Robot Learning, 2024

[7] J. Yang, Z.-a. Cao, C. Deng, R. Antonova, S. Song, and J. Bohg, “Equibot: Sim(3)-equivariant diffusion policy for generalizable and data efficient learning,” in 8th Annual Conference on Robot Learning, 2024

[8] X. Zhu, F. Wang, R. Walters, and J. Shi, “SE(3)-equivariant diffusion policy in spherical fourier space,” in Forty-second International Conference on Machine Learning, 2025. [Online]. Available: https://openreview.net/forum?id=U5nRMOs8Ed

[9] J. Bradbury et al., “JAX: composable transformations of Python+NumPy programs,” 2018. [Online]. Available: http://github.com/jax-ml/jax

[10] R. Freiberg, A. Qualmann, N. A. Vien, and G. Neumann, “Diffusion for multi-embodiment grasping,” IEEE Robotics and Automation Letters, 2025

[11] M. Sundermeyer, A. Mousavian, R. Triebel, and D. Fox, “Contact-graspnet: Efficient 6-dof grasp generation in cluttered scenes,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 13438–13444

[12] M. Breyer, J. J. Chung, L. Ott, R. Siegwart, and J. Nieto, “Volumetric grasping network: Real-time 6 dof grasp detection in clutter,” in Conference on Robot Learning, 2020

[13] Z. Jiang, Y. Zhu, M. Svetlik, K. Fang, and Y. Zhu, “Synergies between affordance and geometry: 6-dof grasp detection via implicit representations,” Robotics: Science and Systems, 2021

[14] C. Eppner, A. Mousavian, and D. Fox, “ACRONYM: A large-scale grasp dataset based on simulation,” in 2021 IEEE Int. Conf. on Robotics and Automation, ICRA, 2020

[15] H.-S. Fang, C. Wang, M. Gou, and C. Lu, “Graspnet-1billion: A large-scale benchmark for general object grasping,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 11441–11450

[16] J. Zhang et al., “Dexgraspnet 2.0: Learning generative dexterous grasping in large-scale synthetic cluttered scenes,” in 8th Annual Conference on Robot Learning, 2024

[17] D. Turpin et al., “Fast-grasp’d: Dexterous multi-finger grasp generation through differentiable simulation,” in ICRA, 2023

[18] W. Wan et al., “Unidexgrasp++: Improving dexterous grasping policy learning via geometry-aware curriculum and iterative generalist-specialist learning,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3891–3902

[19] J. Lu et al., “Ugg: Unified generative grasping,” in European Conference on Computer Vision. Springer, 2024, pp. 414–433

[20] Y.-L. Wei et al., “Afforddexgrasp: Open-set language-guided dexterous grasp with generalizable-instructive affordance,” arXiv preprint arXiv:2503.07360, 2025

[21] Y. Zhong et al., “Dexgraspvla: A vision-language-action framework towards general dexterous grasping,” arXiv preprint arXiv:2502.20900, 2025

[22] Y. Zhong, Q. Jiang, J. Yu, and Y. Ma, “Dexgrasp anything: Towards universal robotic dexterous grasping with physics awareness,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 22584–22594

[23] L. F. Casas, N. Khargonkar, B. Prabhakaran, and Y. Xiang, “Multigrippergrasp: A dataset for robotic grasping from parallel jaw grippers to dexterous hands,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 2978–2984

[24] J. Chen, Y. Ke, and H. Wang, “Bodex: Scalable and efficient robotic dexterous grasp synthesis using bilevel optimization,” arXiv preprint arXiv:2412.16490, 2024

[25] T. G. W. Lum et al., “Get a grip: Multi-finger grasp evaluation at scale enables robust sim-to-real transfer,” in 8th Annual Conference on Robot Learning, 2024

[26] Z. Wei et al., “D(R,O) grasp: A unified representation of robot and object interaction for cross-embodiment dexterous grasping,” CoRR, 2024

[27] N. Khargonkar, L. F. Casas, B. Prabhakaran, and Y. Xiang, “Robotfingerprint: Unified gripper coordinate space for multi-gripper grasp synthesis,” arXiv preprint arXiv:2409.14519, 2024

[28] Z. Wu, R. A. Potamias, X. Zhang, Z. Zhang, J. Deng, and S. Luo, “Cedex: Cross-embodiment dexterous grasp generation at scale from human-like contact representations,” arXiv preprint arXiv:2509.24661, 2025

[29] Z. Xu, B. Qi, S. Agrawal, and S. Song, “Adagrasp: Learning an adaptive gripper-aware grasping policy,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 4620–4626

[30] H. Zhang, K. Y. Ma, M. Z. Shou, W. Lin, and Y. Wu, “Cross-embodiment dexterous hand articulation generation via morphology-aware learning,” arXiv preprint arXiv:2510.06068, 2025

[31] H.-S. Fang, H. Yan, Z. Tang, H. Fang, C. Wang, and C. Lu, “Anydexgrasp: General dexterous grasping for different hands with human-level learning efficiency,” arXiv preprint arXiv:2502.16420, 2025

[32] M. Bonyani, M. Soleymani, and C. Wang, “Multi-agent deep reinforcement learning for variable-finger dexterous grasping through multi-stream embedding fusion,” in ICRA 2025 Workshop “Handy Moves: Dexterity in Multi-Fingered Hands” Paper Submission, 2025

[33] M. Janner, Y. Du, J. Tenenbaum, and S. Levine, “Planning with diffusion for flexible behavior synthesis,” in International Conference on Machine Learning, 2022

[34] N. Funk, J. Urain, J. Carvalho, V. Prasad, G. Chalvatzaki, and J. Peters, “Actionflow: Equivariant, accurate, and efficient policies with spatially symmetric flow matching,” 2024

[35] K. R. Barad, A. Orsula, A. Richard, J. Dentler, M. Olivares-Mendez, and C. Martinez, “Graspldm: Generative 6-dof grasp synthesis using latent diffusion models,” IEEE Access, 2024

[36] K. Chen, E. Lim, K. Lin, Y. Chen, and H. Soh, “Don’t Start From Scratch: Behavioral Refinement via Interpolant-based Policy Diffusion,” in Proceedings of Robotics: Science and Systems, Delft, Netherlands, July 2024

[37] B. Lim, J. Kim, J. Kim, Y. Lee, and F. C. Park, “Equigraspflow: SE(3)-equivariant 6-dof grasp pose generative flows,” in 8th Annual Conference on Robot Learning, 2024

[38] J. Bose et al., “SE(3)-stochastic flow matching for protein backbone generation,” in The Twelfth International Conference on Learning Representations, 2024

[39] J. Yim et al., “SE(3) diffusion model with application to protein backbone generation,” arXiv preprint arXiv:2302.02277, 2023

[40] J. Yim et al., “Improved motif-scaffolding with SE(3) flow matching,” Transactions on Machine Learning Research, 2024. [Online]. Available: https://openreview.net/forum?id=fa1ne8xDGn

[41] K. Wu et al., “Robomind: Benchmark on multi-embodiment intelligence normative data for robot manipulation,” arXiv preprint arXiv:2412.13877, 2024

[42] A. Khazatsky et al., “Droid: A large-scale in-the-wild robot manipulation dataset,” in RSS 2024 Workshop: Data Generation for Robotics, 2024

[43] Q. Bu et al., “Agibot world colosseo: A large-scale manipulation platform for scalable and intelligent embodied systems,” arXiv preprint arXiv:2503.06669, 2025

[44] M. Kim et al., “Openvla: An open-source vision-language-action model,” arXiv preprint arXiv:2406.09246, 2024

[45] Octo Model Team et al., “Octo: An open-source generalist robot policy,” in Proceedings of Robotics: Science and Systems, Delft, Netherlands, 2024

[46] Q. Vuong et al., “Open x-embodiment: Robotic learning datasets and RT-x models,” in Towards Generalist Robots: Learning Paradigms for Scalable Skill Acquisition @ CoRL2023, 2023

[47] J. Yang et al., “Pushing the limits of cross-embodiment learning for manipulation and navigation,” in Proceedings of Robotics: Science and Systems, Delft, Netherlands, 07 2024

[48] K. Black, M. Y. Galliker, and S. Levine, “Real-time execution of action chunking flow policies,” arXiv preprint arXiv:2506.07339, 2025

[49] Y. Wang, M. Verghese, and J. Schneider, “Latent policy steering with embodiment-agnostic pretrained world models,” arXiv preprint arXiv:2507.13340, 2025

[50] N. Bohlinger et al., “One policy to run them all: an end-to-end learning approach to multi-embodiment locomotion,” in 8th Annual Conference on Robot Learning, 2025

[51] S. Yang et al., “Multi-loco: Unifying multi-embodiment legged locomotion via reinforcement learning augmented diffusion,” arXiv preprint arXiv:2506.11470, 2025

[52] B. Ai et al., “Towards embodiment scaling laws in robot locomotion,” arXiv preprint arXiv:2505.05753, 2025

[53] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Computer Vision (ICCV), 2017 IEEE International Conference on, 2017

[54] L. Y. Chen, K. Hari, K. Dharmarajan, C. Xu, Q. Vuong, and K. Goldberg, “Mirage: Cross-embodiment zero-shot policy transfer with cross-painting,” in Robotics: Science and Systems, 2024

[55] M. Lepert, R. Doshi, and J. Bohg, “Shadow: Leveraging segmentation masks for cross-embodiment policy transfer,” arXiv preprint arXiv:2503.00774, 2025

[56] T. Cohen and M. Welling, “Group equivariant convolutional networks,” in International Conference on Machine Learning. PMLR, 2016, pp. 2990–2999

[57] D. E. Worrall, S. J. Garbin, D. Turmukhambetov, and G. J. Brostow, “Harmonic networks: Deep translation and rotation equivariance,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5028–5037

[58] E. J. Bekkers, “B-spline cnns on lie groups,” in International Conference on Learning Representations, 2020

[59] M. Weiler, P. Forré, E. Verlinde, and M. Welling, “Coordinate independent convolutional networks–isometry and gauge equivariant convolutions on riemannian manifolds,” arXiv preprint arXiv:2106.06020, 2021

[60] G. Li, M. Müller, A. Thabet, and B. Ghanem, “Deepgcns: Can gcns go as deep as cnns?” in The IEEE International Conference on Computer Vision (ICCV), 2019

[61] A. Musaelian et al., “Learning local equivariant representations for large-scale atomistic dynamics,” Nature Communications, vol. 14, no. 1, p. 579, 2023

[62] B. Wen, M. Trepte, J. Aribido, J. Kautz, O. Gallo, and S. Birchfield, “Foundationstereo: Zero-shot stereo matching,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 5249–5260

[63] J. Wang, M. Chen, N. Karaev, A. Vedaldi, C. Rupprecht, and D. Novotny, “Vggt: Visual geometry grounded transformer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

[64] S. Passaro and C. L. Zitnick, “Reducing SO(3) convolutions to SO(2) for efficient equivariant gnns,” in International Conference on Machine Learning. PMLR, 2023, pp. 27420–27438

[65] Y.-L. Liao, B. Wood, A. Das*, and T. Smidt*, “EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations,” in International Conference on Learning Representations (ICLR), 2024. [Online]. Available: https://openreview.net/forum?id=mCOBKZmrzD

[66] X. Wu et al., “Sonata: Self-supervised learning of reliable point representations,” in CVPR, 2025

[67] E. Todorov, T. Erez, and Y. Tassa, “Mujoco: A physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 5026–5033

[68] L. Downs et al., “Google scanned objects: A high-quality dataset of 3d scanned household items,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 2553–2560

[69] B. Calli, A. Singh, A. Walsman, S. Srinivasa, P. Abbeel, and A. M. Dollar, “The ycb object and model set: Towards common benchmarks for manipulation research,” in 2015 International Conference on Advanced Robotics (ICAR). IEEE, 2015, pp. 510–517