arxiv: 2606.24049 · v1 · pith:KWZNOBYSnew · submitted 2026-06-23 · 💻 cs.RO

SPACE: Enabling Learning from Cross-Robot Data Toward Generalist Policies

Pith reviewed 2026-06-26 00:47 UTC · model grok-4.3

classification 💻 cs.RO
keywords robot learning · cross-embodiment · behavior cloning · action representation · generalist policies · Cartesian state delta · dynamics adaptation

The pith

SPACE lets robot policies learn from mixed data across different machines by predicting Cartesian state deltas instead of specific commands.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Robot actions are tied to each machine's dynamics, so data from different robots cannot be mixed directly for training generalist policies via behavior cloning. SPACE addresses this by training a policy to predict geometric end-effector displacements in Cartesian space as a shared representation, then using an Action Adapter to convert those predictions into robot-specific commands. The framework targets dynamics differences across embodiments, across units of the same embodiment, and during single-robot operation. Experiments show it outperforms direct command prediction on cross-robot datasets and stays effective under deployment shifts such as altered control frequency, object weight, or gains. This approach supports scaling training data across hardware without the mismatch that normally blocks combined datasets.

Core claim

SPACE consists of a Cartesian state delta policy that predicts geometric end-effector displacement and an Action Adapter that converts the prediction into robot-specific control commands. This structure handles robot dynamics variation at three levels: across different embodiments, across hardware units of the same embodiment, and within a single robot during operation.

What carries the argument

Cartesian state delta as a universal action representation, together with an Action Adapter that maps the predicted delta to embodiment-specific commands.
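
As a rough illustration of this split, the sketch below relabels a demonstration as Cartesian state deltas and then composes a shared policy with a per-robot adapter. It is a minimal sketch, not the authors' implementation: the function names, the position-only state (orientation is ignored), and the scalar-gain adapters are all assumptions made for illustration.

```python
import numpy as np

def cartesian_state_deltas(ee_positions):
    """Relabel end-effector positions of shape (T, 3) as deltas of shape (T-1, 3)."""
    p = np.asarray(ee_positions, dtype=float)
    return p[1:] - p[:-1]

def rollout_step(policy, adapter, observation):
    """One control step: shared delta prediction, then robot-specific conversion."""
    delta = policy(observation)   # robot-agnostic Cartesian state delta
    return adapter(delta)         # robot-specific command

# Hypothetical stand-ins: one shared policy, two adapters with made-up gains.
shared_policy = lambda obs: np.array([0.01, 0.0, 0.0])
adapter_robot_a = lambda delta: 2.0 * delta
adapter_robot_b = lambda delta: 5.0 * delta

demo = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.1, 0.2, 0.0]]
labels = cartesian_state_deltas(demo)
u_a = rollout_step(shared_policy, adapter_robot_a, observation=None)
u_b = rollout_step(shared_policy, adapter_robot_b, observation=None)
```

The same predicted delta yields different commands on the two hypothetical robots, which is the property that lets demonstrations from both share one label space.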

If this is right

  • Policies trained with SPACE substantially outperform those that directly predict control commands when using data collected across different embodiments and across hardware units of the same embodiment.
  • SPACE remains robust under dynamics shifts at deployment, including changes in control frequency, object weight, and controller gains.
  • The shared Cartesian representation enables effective behavior cloning from aggregated datasets that would otherwise be incompatible due to embodiment-specific actions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same delta-based separation could be tested with additional sensor modalities to further reduce embodiment dependence in policy training.
  • If the adapter requires per-robot calibration data, the method's advantage would shrink for entirely novel hardware with no prior examples.
  • Extending the approach to include visual or force observations as part of the shared state could address cases where end-effector position alone is insufficient.

Load-bearing premise

The assumption that an Action Adapter can reliably map a predicted Cartesian state delta into accurate robot-specific commands for any embodiment or hardware unit without introducing large errors that would negate the benefit of the shared representation.

What would settle it

A controlled test on a new robot embodiment where the Action Adapter produces large command errors, causing the full SPACE policy to achieve lower task success than a baseline that directly predicts control commands from the same mixed data.

Figures

Figures reproduced from arXiv: 2606.24049 by Byeongguk Jeon, Haeone Lee, Jian Kim, Kimin Lee, Suchae Jeong.

Figure 1: Illustration of SPACE, a framework comprising a Cartesian state-delta policy and Action …
Figure 2: Different robots (e.g., UR5 vs. Franka Research 3) require different commands to perform the same movement.
However, different robots require different control commands to achieve the same motion (…
Figure 3: We study co-training between UR5 and FR3 robots for 3 different tasks, and zero-shot …
Figure 4: Success rate for cross-embodiment experiments. We study (a) co-training on FR3 and …
Figure 5: FR3 trajectory replayed on a differ…
Figure 7: Success rate in DROID. We compare SPACE against policies trained using different action spaces available in DROID.
Learning from multi-hardware data. While the above results show that SPACE enables transfer from one hardware unit to another, an equally important scenario is training a policy on data collected from many hardware units simultaneously. To test this, we use the DROID dataset [7], a large-scal…
Figure 9: Success rate after increasing the object weight.
[Plot residue, likely from Figure 10: x-axis Step (t); y-axis $b_t$ (z-axis); legend Box weight (90g), Box weight (530g)]
Figure 10: Action Adapter bias ($b_t$) z-axis visualization in PnP Box task. Heavy box weight leads to an increase in z-axis value after grasping (highlighted by yellow). The value is averaged over three rollouts.
We also show that SPACE can handle dynamics changes driven by environmental factors such as object weight. Specifically, we use the same PnP Box task and add metal pieces to the box, increasing its weig…
Figure 11: Different trajectories for $D_{\mathrm{cal}}$ used for ablation, with circle and square shapes in addition to our original choice, random trajectories.
Calibration trajectories. We also ablate how different choices for initial calibration trajectories affect the Action Adapter performance. To test this, we attempt new calibration trajectories named Circle and Square, in which calibration trajectories draw a circle and a s…
Abstract

In robot learning, scaling training datasets across diverse embodiments and environments has become a dominant paradigm for learning generalizable robot policies. These policies are commonly trained via behavior cloning to imitate actions from pre-collected demonstrations. However, since robot actions are tied to the dynamics of the data collection robot, different robots may require different actions to achieve the same motion. This discrepancy hinders both policy training and deployment across diverse robots. To address this, we propose using Cartesian state delta as a universal action representation across robots, and introduce State Prediction and Adaptive Command Execution (SPACE) framework. SPACE handles robot dynamics variation at three levels: across different embodiments, across hardware units of the same embodiment, and within a single robot during operation. It consists of two components: (i) a Cartesian state delta policy that predicts geometric end-effector displacement, and (ii) Action Adapter, which converts the predicted Cartesian state delta into robot-specific control commands. Experiments show that SPACE substantially outperforms policies that directly predict control commands when learning from data collected across different embodiments and across hardware units of the same embodiment. SPACE also remains robust under dynamics shifts at deployment, including changes in control frequency, object weight, and controller gains. The project page is available at http://haeone.site/space-website/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes the SPACE framework for learning generalist robot policies from cross-embodiment and cross-hardware-unit data. It replaces direct command prediction with a policy that outputs Cartesian state deltas (geometric end-effector displacements) as a universal representation, paired with an Action Adapter that converts these deltas into robot-specific control commands. The approach is intended to handle dynamics variation at three levels: across embodiments, across units of the same embodiment, and within a single robot at deployment. The abstract states that experiments demonstrate substantial outperformance over direct command prediction and robustness to shifts in control frequency, object weight, and controller gains.

Significance. If the empirical claims are substantiated with quantitative results, the separation of a shared geometric policy from an embodiment-specific adapter could provide a practical route to scaling behavior-cloning datasets across heterogeneous robots. The core idea of using Cartesian deltas to decouple policy learning from robot dynamics is a clear conceptual contribution to cross-robot generalization.

major comments (2)
  1. [Abstract] The central empirical claim that SPACE 'substantially outperforms policies that directly predict control commands' is presented without any reported metrics, baseline details, dataset sizes, number of trials, or statistical tests, preventing assessment of whether the data support the claim.
  2. [Abstract] The load-bearing assumption that the Action Adapter maps predicted Cartesian state deltas to accurate robot-specific commands across embodiments, hardware units, and deployment shifts (frequency, mass, gains) receives no quantitative validation; no adapter command-error metrics, ablations isolating the adapter, or failure-case analysis are described.
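
To make the first comment concrete, the sketch below shows one common way to attach uncertainty to a success rate measured over a small number of rollouts: the Wilson score interval. It is only an illustration of the kind of reporting the referee asks for; the function name is hypothetical, the counts in the usage line are placeholders rather than results from the paper, and the paper may use a different analysis.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    return center - half, center + half

# Placeholder counts for illustration only, not data from the paper.
low, high = wilson_interval(successes=10, trials=20)
```

With few trials per condition the interval is wide, which is why the number of evaluation rollouts matters when judging whether one method "substantially outperforms" another.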

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive feedback on the abstract. We agree that the abstract would benefit from additional quantitative details to better support the claims. We will revise the manuscript to address these points.

Point-by-point responses
  1. Referee: [Abstract] The central empirical claim that SPACE 'substantially outperforms policies that directly predict control commands' is presented without any reported metrics, baseline details, dataset sizes, number of trials, or statistical tests, preventing assessment of whether the data support the claim.

    Authors: We agree that the abstract lacks specific metrics and details. In the revised version, we will update the abstract to include key quantitative results such as success rate improvements (e.g., average success rates across tasks), number of evaluation trials, and references to the experimental sections and tables that detail baselines, dataset sizes, and any statistical comparisons. This will make the empirical claims more self-contained while preserving the abstract's brevity. revision: yes

  2. Referee: [Abstract] The load-bearing assumption that the Action Adapter maps predicted Cartesian state deltas to accurate robot-specific commands across embodiments, hardware units, and deployment shifts (frequency, mass, gains) receives no quantitative validation; no adapter command-error metrics, ablations isolating the adapter, or failure-case analysis are described.

    Authors: We acknowledge that the abstract does not provide quantitative validation or metrics for the Action Adapter. The full manuscript includes relevant ablations, command-error metrics under varying conditions, and robustness results in the experiments (Section 4). We will revise the abstract to briefly reference these adapter-specific results and the demonstrated robustness to shifts, ensuring the adapter's contribution is supported by evidence from the paper. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical framework with no derivation chain

Full rationale

The paper introduces SPACE as a two-component framework (Cartesian state delta policy + Action Adapter) and supports its claims exclusively via experimental comparisons on cross-embodiment and cross-unit data. No equations, fitted parameters, or first-principles derivations are present in the provided text. The central claims (outperformance and robustness) are statistical outcomes of training and evaluation, not quantities that reduce to their own inputs by construction. Self-citation patterns, ansatz smuggling, or uniqueness theorems are absent. This is a standard empirical robotics paper whose validity hinges on experimental design rather than any self-referential mathematical structure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are stated or can be inferred from the provided text.

pith-pipeline@v0.9.1-grok · 5764 in / 1097 out tokens · 17360 ms · 2026-06-26T00:47:48.629421+00:00 · methodology


Reference graph

Works this paper leans on

42 extracted references · 16 linked inside Pith

  1. [1]

    R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021

  2. [2]

    J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023

  3. [3]

    G. Comanici, E. Bieber, M. Schaekermann, I. Pasupat, N. Sachdeva, I. Dhillon, M. Blistein, O. Ram, D. Zhang, E. Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025

  4. [4]

    A. O’Neill, A. Rehman, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, A. Mandlekar, A. Jain, et al. Open x-embodiment: Robotic learning datasets and rt-x models. In International Conference on Robotics and Automation, 2024

  5. [5]

    H.-S. Fang, H. Fang, Z. Tang, J. Liu, C. Wang, J. Wang, H. Zhu, and C. Lu. Rh20t: A comprehensive robotic dataset for learning diverse skills in one-shot. In International Conference on Robotics and Automation, 2024

  6. [6]

    K. Wu, C. Hou, J. Liu, Z. Che, X. Ju, Z. Yang, M. Li, Y. Zhao, Z. Xu, G. Yang, et al. Robomind: Benchmark on multi-embodiment intelligence normative data for robot manipulation. arXiv preprint arXiv:2412.13877, 2024

  7. [7]

    A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y. Chen, K. Ellis, et al. Droid: A large-scale in-the-wild robot manipulation dataset. arXiv preprint arXiv:2403.12945, 2024

  8. [8]

    Q. Bu, J. Cai, L. Chen, X. Cui, Y. Ding, S. Feng, S. Gao, X. He, X. Hu, X. Huang, et al. Agibot world colosseo: A large-scale manipulation platform for scalable and intelligent embodied systems. arXiv preprint arXiv:2503.06669, 2025

  9. [9]

    M. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn. Openvla: An open-source vision-language-action model. arXiv preprint arXiv:2406.09246, 2024

  10. [10]

    K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, et al. π0: A vision-language-action flow model for general robot control. arXiv preprint arXiv:2410.24164, 2024

  11. [11]

    K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, et al. π0.5: a vision-language-action model with open-world generalization. arXiv preprint arXiv:2504.16054, 2025

  12. [12]

    J. Bjorck, F. Castañeda, N. Cherniadev, X. Da, R. Ding, L. Fan, Y. Fang, D. Fox, F. Hu, S. Huang, et al. Gr00t n1: An open foundation model for generalist humanoid robots. arXiv preprint arXiv:2503.14734, 2025

  13. [13]

    J. Zheng, J. Li, D. Liu, Y. Zheng, Z. Wang, Z. Ou, Y. Liu, J. Liu, Y.-Q. Zhang, and X. Zhan. Universal actions for enhanced embodied foundation models. In Computer Vision and Pattern Recognition Conference, 2025.

  14. [14]

    A. Bronars, Y. Park, and P. Agrawal. Tune to learn: How controller gains shape robot policy learning. arXiv preprint arXiv:2604.02523, 2026

  15. [15]

    O. M. Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu, et al. Octo: An open-source generalist robot policy. arXiv preprint arXiv:2405.12213, 2024

  16. [16]

    A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu, et al. Rt-1: Robotics transformer for real-world control at scale. In Robotics: Science and Systems, 2023

  17. [17]

    B. Zitkovich, T. Yu, S. Xu, P. Xu, T. Xiao, F. Xia, J. Wu, P. Wohlhart, S. Welker, A. Wahid, et al. Rt-2: Vision-language-action models transfer web knowledge to robotic control. In Conference on Robot Learning, 2023

  18. [18]

    J. Wen, Y. Zhu, J. Li, Z. Tang, C. Shen, and F. Feng. Dexvla: Vision-language model with plug-in diffusion expert for general robot control. In Conference on Robot Learning, 2025

  19. [19]

    J. Zheng, J. Li, Z. Wang, D. Liu, X. Kang, Y. Feng, Y. Zheng, J. Zou, Y. Chen, J. Zeng, et al. X-vla: Soft-prompted transformer as scalable cross-embodiment vision-language-action model. arXiv preprint arXiv:2510.10274, 2025

  20. [20]

    S. Ye, J. Jang, B. Jeon, S. J. Joo, J. Yang, B. Peng, A. Mandlekar, R. Tan, Y.-W. Chao, B. Y. Lin, et al. Latent action pretraining from videos. In International Conference on Learning Representations, 2025

  21. [21]

    X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In International Conference on Robotics and Automation, 2018

  22. [22]

    M. Andrychowicz, B. Baker, M. Chociej, R. Jozefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, et al. Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 39(1):3–20, 2020

  23. [23]

    A. Kumar, Z. Fu, D. Pathak, and J. Malik. Rma: Rapid motor adaptation for legged robots. arXiv preprint arXiv:2107.04034, 2021

  24. [24]

    H. Qi, A. Kumar, R. Calandra, Y. Ma, and J. Malik. In-hand object rotation via rapid motor adaptation. In Conference on Robot Learning, 2023

  25. [25]

    F. Torabi, G. Warnell, and P. Stone. Behavioral cloning from observation. In International Joint Conference on Artificial Intelligence, 2018

  26. [26]

    I. Radosavovic, X. Wang, L. Pinto, and J. Malik. State-only imitation learning for dexterous manipulation. In International Conference on Intelligent Robots and Systems, 2021

  27. [27]

    N. R. Arachchige, Z. Chen, W. Jung, W. C. Shin, R. Bansal, P. Barroso, Y. H. He, Y. C. Lin, B. Joffe, S. Kousik, et al. Sail: Faster-than-demonstration execution of imitation learning policies. In Conference on Robot Learning, 2025

  28. [28]

    S. Kim, J. Kim, and J. J. Lim. Time optimal execution of action chunk policies beyond demonstration speed. In International Conference on Learning Representations, 2026

  29. [29]

    C. Chi, Z. Xu, C. Pan, E. Cousineau, B. Burchfiel, S. Feng, R. Tedrake, and S. Song. Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots. In Robotics: Science and Systems, 2024

  30. [30]

    Z. Zhaxizhuoma, K. Liu, C. Guan, Z. Jia, Z. Wu, X. Liu, T. Wang, S. Liang, P. Chen, P. Zhang, et al. Fastumi: A scalable and hardware-independent universal manipulation interface with dataset. In Conference on Robot Learning, 2025.

  31. [31]

    D. A. Pomerleau. Alvinn: An autonomous land vehicle in a neural network. In Advances in Neural Information Processing Systems, 1988

  32. [32]

    Y. Feng, J. Zheng, Z. Wang, D. Liu, J. Li, J. Pang, T. Wang, and X. Zhan. Demystifying action space design for robotic manipulation policies. arXiv preprint arXiv:2602.23408, 2026

  33. [33]

    O. Khatib. A unified approach for motion and force control of robot manipulators: The operational space formulation. IEEE Journal on Robotics and Automation, 3(1):43–53, 1987

  34. [34]

    L. Y. Chen, K. Hari, K. Dharmarajan, C. Xu, Q. Vuong, and K. Goldberg. Mirage: Cross-embodiment zero-shot policy transfer with cross-painting. In Robotics: Science and Systems, 2024

  35. [35]

    L. Hao, R. Pagani, M. Beschi, and G. Legnani. Dynamic and friction parameters of an industrial robot: Identification, comparison and repetitiveness analysis. Robotics, 10(1):49, 2021

  36. [36]

    T. Z. Zhao, V. Kumar, S. Levine, and C. Finn. Learning fine-grained bimanual manipulation with low-cost hardware. In Robotics: Science and Systems, 2023

  37. [37]

    B. Akgun, M. Cakmak, J. W. Yoo, and A. L. Thomaz. Trajectories and keyframes for kinesthetic teaching: A human-robot interaction perspective. In International Conference on Human-Robot Interaction, 2012

  38. [38]

    H. Li, Y. Cui, and D. Sadigh. How to train your robots? the impact of demonstration modality on imitation learning. In International Conference on Robotics and Automation, 2025

  39. [39]

    S. Haykin and B. Widrow. Least-mean-square adaptive filters. 2003

  40. [40]

    S. Ye, Y. Ge, K. Zheng, S. Gao, S. Yu, G. Kurian, S. Indupuru, Y. L. Tan, C. Zhu, J. Xiang, et al. World action models are zero-shot policies. arXiv preprint arXiv:2602.15922, 2026

  41. [41]

    P. Atreya, K. Pertsch, T. Lee, M. J. Kim, A. Jain, A. Kuramshin, C. Neary, E. S. Hu, K. Arora, K. Ellis, et al. Roboarena: Distributed real-world evaluation of generalist robot policies. In Conference on Robot Learning, 2025

  42. [42]

    L. Y. Chen, S. Adebola, and K. Goldberg. Berkeley UR5 demonstration dataset. https://sites.google.com/view/berkeley-ur5/home.

Appendix A Pseudo Code

Algorithm 1 SPACE rollout with Action Adapter
  1: Given: Cartesian state delta policy $\pi_\theta$
  2: Define: Action adapter $\hat{u} = W_0 \Delta p + b_0$ and its learning rate $\mu$
  3: Collect $M$ calibration trajectories with length $K$, $D_{\mathrm{cal}} = \{\{p_0$...
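
The linear adapter and learning rate named in Algorithm 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the excerpt is cut off before it says how the adapter is fitted or updated, so the least-squares calibration fit and the LMS-style online update (in the spirit of reference [39]) are guesses, and the class name, method names, and synthetic robot model are hypothetical.

```python
import numpy as np

class LinearActionAdapter:
    """Hypothetical linear adapter: u_hat = W @ delta_p + b."""

    def __init__(self, dim=3, mu=0.1):
        self.W = np.eye(dim)
        self.b = np.zeros(dim)
        self.mu = mu  # learning rate for the online update

    def fit_calibration(self, deltas, commands):
        """Assumed calibration: least-squares fit of (W, b) on (delta, command) pairs."""
        X = np.hstack([deltas, np.ones((len(deltas), 1))])  # bias column
        sol, *_ = np.linalg.lstsq(X, commands, rcond=None)  # shape (dim + 1, dim)
        self.W = sol[:-1].T
        self.b = sol[-1]

    def command(self, delta_p):
        return self.W @ delta_p + self.b

    def update(self, executed_command, achieved_delta):
        """Assumed LMS step: fit the map from achieved delta to executed command."""
        err = executed_command - self.command(achieved_delta)
        self.W += self.mu * np.outer(err, achieved_delta)
        self.b += self.mu * err
        return err

# Synthetic robot for illustration only: command = A @ delta + c.
rng = np.random.default_rng(0)
A = np.diag([2.0, 3.0, 4.0])
c = np.array([0.0, 0.0, 0.1])
cal_deltas = 0.01 * rng.standard_normal((50, 3))
cal_commands = cal_deltas @ A.T + c

adapter = LinearActionAdapter(dim=3, mu=0.1)
adapter.fit_calibration(cal_deltas, cal_commands)
```

Under this reading, a persistent change in dynamics such as a heavier object would show up as drift in the bias term during `update`, which is consistent with the $b_t$ behaviour described in the Figure 10 caption.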