
arxiv: 2503.10070 · v2 · submitted 2025-03-13 · 💻 cs.RO · cs.AI · cs.LG

AhaRobot: A Low-Cost Open-Source Bimanual Mobile Manipulator for Embodied AI

Pith reviewed 2026-05-22 23:58 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.LG
keywords low-cost robot · bimanual manipulator · teleoperation · embodied AI · data collection · open-source hardware · imitation learning · mobile manipulator

The pith

A $1,000 open-source bimanual robot achieves 0.7 mm repeatability for embodied AI data collection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AhaRobot as a low-cost, fully open-source bimanual mobile manipulator to address the high cost of commercial systems that limits large-scale collection of manipulation data for Vision-Language-Action models. It contributes a SCARA-like dual-arm hardware design, an optimized control stack using dual-motor backlash mitigation and dithering for friction compensation, and RoboPilot, a teleoperation interface with a 26-faced marker handle. Experiments show the co-design reaches 0.7 mm repeatability at $1,000 total hardware cost. The handle cuts tracking error by 80 percent versus a 6-faced baseline, boosts data-collection efficiency by 30 percent, and supports long-horizon remote tasks with data quality matching VR systems.

Core claim

The hardware-control co-design of AhaRobot delivers 0.7 mm repeatability at a total hardware cost of only $1,000. The 26-faced handle reduces tracking error by 80% over a 6-faced baseline and improves data-collection efficiency by 30%, while supporting long-horizon tasks and robustly handling singularities in fully remote teleoperation.

What carries the argument

SCARA-like dual-arm hardware with dual-motor backlash mitigation, dithering for friction compensation, and the 26-faced marker handle in the RoboPilot teleoperation interface.
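
As a rough illustration of the dithering idea named above, the sketch below superimposes a small zero-mean sinusoid on a motor command so that the joint does not settle into static friction. The function names, amplitude, and frequency are hypothetical placeholders rather than values or code from the paper, and the authors' actual controller may differ.

```python
import math


def dithered_command(u, t, amplitude=0.05, freq_hz=50.0):
    """Return command u plus a zero-mean sinusoidal dither at time t (seconds).

    amplitude and freq_hz are illustrative assumptions, not values from the paper.
    """
    return u + amplitude * math.sin(2.0 * math.pi * freq_hz * t)


def mean_over_one_period(u, amplitude=0.05, freq_hz=50.0, samples=20):
    """Average the dithered command over exactly one dither period."""
    period = 1.0 / freq_hz
    values = [
        dithered_command(u, k * period / samples, amplitude, freq_hz)
        for k in range(samples)
    ]
    return sum(values) / samples


if __name__ == "__main__":
    # The dither perturbs each sample but leaves the average command unchanged.
    print(mean_over_one_period(0.2))
```

Because the added signal averages to zero over a period, the mean commanded effort is preserved while the instantaneous command keeps moving, which is the property that makes dither useful against stiction.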

If this is right

  • Enables large-scale collection of diverse manipulation data for training Vision-Language-Action models.
  • Supports imitation learning of complex household behaviors involving bimanual coordination, upper-body mobility, and contact-rich interaction.
  • Maintains data quality comparable to VR-based collection at far lower cost.
  • Allows fully remote long-horizon teleoperation without singularity issues.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The open-source release could let other labs adapt the arm geometry or handle for non-household tasks.
  • The marker design might transfer to improve precision in existing commercial tracking setups.
  • Long-term wear tests by multiple groups would be needed to confirm sustained 0.7 mm performance.

Load-bearing premise

The reported repeatability, error reduction, and data quality will hold when independent groups replicate the system under varied real-world lighting, operator skill levels, and long-term hardware wear.

What would settle it

An independent build and test that measures repeatability worse than 2 mm or tracking error reduction below 50 percent under standard indoor conditions would falsify the performance claims.
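
The settling criterion above can be written as a small check. This is only a sketch: the function name is hypothetical, and the thresholds are the 2 mm and 50 percent figures stated in this section, not limits defined by the paper.

```python
def claims_falsified(repeatability_mm, error_reduction_pct,
                     max_repeatability_mm=2.0, min_reduction_pct=50.0):
    """Return True if an independent replication contradicts the performance claims.

    Thresholds default to the values in the review text: repeatability worse
    than 2 mm, or tracking-error reduction below 50 percent.
    """
    return (repeatability_mm > max_repeatability_mm
            or error_reduction_pct < min_reduction_pct)


if __name__ == "__main__":
    # The paper's reported figures pass the check.
    print(claims_falsified(0.7, 80.0))
    # A replication measuring 2.5 mm repeatability would not.
    print(claims_falsified(2.5, 80.0))
```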

Figures

Figures reproduced from arXiv: 2503.10070 by Haiqin Cui, Jianye Hao, Yan Zheng, Yifu Yuan.

Figure 1
Figure 1: Overview of AhaRobot. The system costs only $1,000 for the robot and $1,000 for extra power and computing. Above left: Hardware Configuration of AhaRobot. Above right: Fully Remote Mobile Manipulation Teleoperation RoboPilot. Below: AhaRobot can perform various tasks in daily life.
… robots as easily as piloting. We first design a 26-faced marker handle for controlling the robot arms to mitigate the pose amb…
Figure 2
Figure 2: Block Diagram of Dual-Joint Control System.
Figure 3
Figure 3: System Responses with Different Control Strategies.
Figure 5
Figure 5: Tracking Accuracy of Different Handles.

TABLE III: Quantitative Comparisons on Polyhedrons.
Type | Avg. Rotation Err | Avg. Translation Err
6-Faced | 5.391 deg | 9.9 mm
26-Faced (Ours) | 1.094 deg ↓ (80%) | 2.1 mm ↓ (79%)

26-Faced Motion Capture Handle: We use AprilTag [41] to capture the 6-DoF pose of the handle. A common method is to construct a 6-faced cube, but the perspective-n-point (PnP) algorithm with coplanar…
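
The reduction percentages quoted in Table III follow directly from the listed averages. The short check below reproduces that arithmetic; the helper name is an illustrative choice, and the inputs are the four values from the table.

```python
def percent_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Values from Table III (6-faced baseline vs. 26-faced handle).
rotation_reduction = percent_reduction(5.391, 1.094)    # degrees
translation_reduction = percent_reduction(9.9, 2.1)     # millimetres

if __name__ == "__main__":
    print(f"rotation: {rotation_reduction:.1f}%")        # rounds to 80%
    print(f"translation: {translation_reduction:.1f}%")  # rounds to 79%
```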
Figure 4
Figure 4: RoboPilot Teleoperation Workstation. By capturing the 6D pose of the handle through a web camera, we can fully remotely teleoperate the robot, with the entire setup costing no more than $50. Foot pedals can switch between two modes, respectively controlling the base’s movement and the upper limbs’ operation.
It is essential to develop a simple and useful teleoperation method for Embodied AI. We aim to desi…
Figure 6
Figure 6: Very Long-horizon Remote Teleoperation. We used RoboPilot to control AhaRobot and demonstrated two specially designed complex tasks. Task 1 involves a very long sequence of operations with a total movement distance exceeding 200 m, requiring remote telecommunication, agile movement, and precise environmental interaction. Task 2 specifically showcases AhaRobot’s lifting and lowering capabilities, enabling i…
Figure 7
Figure 7: Teleoperation Tasks Demonstration.
… continuing until five sets of successful data were collected. We selected 3 tasks and recorded the success rate and the average time of successful attempts. The task examples and full results are shown in …
Figure 9
Figure 9: Success Rate of Different Types of Base Movement. Due to spikes in the data, the policy trained by velocity control failed to move.
… 18 dimensions, including arm joint positions, gripper state, rotation angle of head camera, base movement command, and three images captured from the head and wrist cameras. Correspondingly, the teleoperator provided 18-dimensional commands through inverse kinematics. We colle…
read the original abstract

Scaling Vision-Language-Action models for embodied manipulation demands large volumes of diverse manipulation data, yet the high cost of commercial mobile manipulators and teleoperation interfaces that are difficult to deploy at scale remain key bottlenecks. We present AhaRobot, a low-cost, fully open-source bimanual mobile manipulator tailored for Embodied-AI. The system contributes: (1) a SCARA-like dual-arm hardware design that reduces motor torque demands while maintaining a large vertical reachable workspace, (2) an optimized control stack that improves precision via dual-motor backlash mitigation and static-friction compensation through dithering, and (3) RoboPilot, a teleoperation interface featuring a novel 26-faced marker handle for precise, long-horizon remote data collection. Experimental results show that our hardware-control co-design achieves 0.7 mm repeatability at a total hardware cost of only $1,000. The proposed 26-faced handle reduces tracking error by 80% over a 6-faced baseline and improves data-collection efficiency by 30%, while robustly handling singularities and supporting extremely long-horizon tasks in fully remote settings. Despite its low cost, AhaRobot enables imitation learning of complex household behaviors involving bimanual coordination, upper-body mobility, and contact-rich interaction, with data quality comparable to VR-based collection. All software, CAD files, and documentation are available at https://aha-robot.github.io.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces AhaRobot, a fully open-source bimanual mobile manipulator costing $1,000, designed to address data collection bottlenecks for vision-language-action models. It contributes a SCARA-like dual-arm hardware architecture, a control stack incorporating dual-motor backlash mitigation and dithering-based friction compensation, and the RoboPilot teleoperation system featuring a novel 26-faced marker handle. Reported results include 0.7 mm repeatability, an 80% reduction in tracking error versus a 6-faced baseline, a 30% gain in data-collection efficiency, robust singularity handling for long-horizon tasks, and imitation-learning data quality comparable to VR systems, with all CAD, code, and documentation released publicly.

Significance. If the experimental results replicate, the work has clear significance for embodied AI by substantially lowering the cost of high-quality bimanual and mobile manipulation data collection. The explicit bill-of-materials, control equations, marker geometry, and experimental protocols constitute a reproducible contribution. The open-source release of hardware designs and software is a particular strength that directly supports community adoption and extension beyond the authors' lab.

minor comments (2)
  1. [Experimental Results] The repeatability trials and tracking-error comparisons would benefit from an explicit statement of trial count, operator count, and lighting conditions to strengthen the replication claim.
  2. [Experimental Results] The comparison of data quality to VR systems is stated qualitatively; a short quantitative table (e.g., success rates or trajectory smoothness metrics) would make the claim more precise without altering scope.
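
To make the first comment concrete, a replication report could state how repeatability is computed from the logged trials. The sketch below uses one common convention, in the style of the ISO 9283 position-repeatability figure (mean distance from the centroid plus three sample standard deviations). Whether the paper uses this definition is not stated here, so treat the formula and the function name as assumptions.

```python
import math
import statistics


def position_repeatability(points):
    """Repeatability of repeated returns to the same target position.

    points: list of (x, y, z) measurements in mm, at least two entries.
    Returns mean distance to the centroid plus 3 sample standard deviations
    (an ISO 9283-style convention; assumed, not taken from the paper).
    """
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    distances = [math.dist(p, centroid) for p in points]
    return statistics.mean(distances) + 3.0 * statistics.stdev(distances)


if __name__ == "__main__":
    # Hypothetical measurements, purely for illustration.
    trials = [(0.1, 0.0, 0.0), (-0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, -0.1, 0.0)]
    print(f"{position_repeatability(trials):.3f} mm over {len(trials)} trials")
```

Reporting the trial count alongside the figure, as the print statement does, addresses the referee's request directly.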

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive evaluation of AhaRobot and the recommendation for minor revision. The assessment correctly identifies the core contributions in hardware design, control, and the open-source teleoperation interface. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The manuscript presents a hardware design, control stack (backlash mitigation and dithering), and 26-faced marker handle whose performance claims rest on direct experimental measurements (repeatability trials, tracking-error comparisons to a 6-faced baseline, data-collection efficiency) and a bill-of-materials. No equations, fitted parameters, or self-citations are invoked in a load-bearing way that reduces a claimed result to its own inputs by construction. The derivation chain consists of physical construction followed by empirical validation, which is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Engineering paper with no mathematical free parameters, axioms, or invented entities; claims rest on physical implementation and empirical measurements.

pith-pipeline@v0.9.0 · 5796 in / 1046 out tokens · 22597 ms · 2026-05-22T23:58:31.011001+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Nori Bot: A Sub-$1,000 Floor-to-Counter Mobile Manipulator

    cs.RO 2026-05 unverdicted novelty 3.0

    Nori Bot is a 17-DoF dual-arm mobile manipulator costing $947 with a 600 mm Z-axis lift, Raspberry Pi proactive control, and current-based servo protection.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    Learning fine-grained bimanual manipulation with low-cost hardware,

    T. Z. Zhao, V. Kumar, S. Levine, and C. Finn, “Learning fine-grained bimanual manipulation with low-cost hardware,” in RSS, 2023

  2. [2]

    Diffusion policy: Visuomotor policy learning via action diffusion,

    C. Chi, S. Feng, Y. Du, Z. Xu, E. Cousineau, B. C. Burchfiel, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,” in RSS, 2023

  3. [3]

    Openvla: An open-source vision-language-action model,

    M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. P. Foster, P. R. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn, “Openvla: An open-source vision-language-action model,” in CoRL, 2025

  4. [4]

    Octo: An open-source generalist robot policy,

    O. team, “Octo: An open-source generalist robot policy,” in RSS, 2024

  5. [5]

    Rdt-1b: A diffusion foundation model for bimanual manipulation,

    S. Liu, L. Wu, B. Li, H. Tan, H. Chen, Z. Wang, K. Xu, H. Su, and J. Zhu, “Rdt-1b: A diffusion foundation model for bimanual manipulation,” in ICLR, 2025

  6. [6]

    Demonstrating ok-robot: What really matters in integrating open-knowledge models for robotics,

    P. Liu, Y. Orru, J. Vakil, C. Paxton, N. M. M. Shafiullah, and L. Pinto, “Demonstrating ok-robot: What really matters in integrating open-knowledge models for robotics,” in RSS, 2024

  7. [7]

    Navid: Video-based vlm plans the next step for vision-and-language navigation,

    J. Zhang, K. Wang, R. Xu, G. Zhou, Y. Hong, X. Fang, Q. Wu, Z. Zhang, and H. Wang, “Navid: Video-based vlm plans the next step for vision-and-language navigation,” in RSS, 2024

  8. [8]

    Navgpt-2: Unleashing navigational reasoning capability for large vision-language models,

    G. Zhou, Y. Hong, Z. Wang, X. E. Wang, and Q. Wu, “Navgpt-2: Unleashing navigational reasoning capability for large vision-language models,” in ECCV, 2025

  9. [9]

    Mobile aloha: Learning bimanual mobile manipulation using low-cost whole-body teleoperation,

    Z. Fu, T. Z. Zhao, and C. Finn, “Mobile aloha: Learning bimanual mobile manipulation using low-cost whole-body teleoperation,” in CoRL, 2024

  10. [10]

    Whole-body teleoperation for mobile manipulation at zero added cost,

    D. Honerkamp, H. Mahesheka, J. O. von Hartz, T. Welschehold, and A. Valada, “Whole-body teleoperation for mobile manipulation at zero added cost,” IEEE Robotics and Automation Letters, 2025

  11. [11]

    Open-television: Teleoperation with immersive active visual feedback,

    X. Cheng, J. Li, S. Yang, G. Yang, and X. Wang, “Open-television: Teleoperation with immersive active visual feedback,” in CoRL, 2024

  12. [12]

    Bunny-visionpro: Real-time bimanual dexterous teleoperation for imitation learning,

    R. Ding, Y. Qin, J. Zhu, C. Jia, S. Yang, R. Yang, X. Qi, and X. Wang, “Bunny-visionpro: Real-time bimanual dexterous teleoperation for imitation learning,” arXiv:2407.03162, 2024

  13. [13]

    Omnih2o: Universal and dexterous human-to-humanoid whole-body teleoperation and learning,

    T. He, Z. Luo, X. He, W. Xiao, C. Zhang, W. Zhang, K. M. Kitani, C. Liu, and G. Shi, “Omnih2o: Universal and dexterous human-to-humanoid whole-body teleoperation and learning,” in CoRL, 2024

  14. [14]

    Precise and dexterous robotic manipulation via human-in-the-loop reinforcement learning,

    J. Luo, C. Xu, J. Wu, and S. Levine, “Precise and dexterous robotic manipulation via human-in-the-loop reinforcement learning,” arXiv:2410.21845, 2024

  15. [15]

    Gello: A general, low-cost, and intuitive teleoperation framework for robot manipulators,

    P. Wu, F. Shentu, X. Lin, and P. Abbeel, “Gello: A general, low-cost, and intuitive teleoperation framework for robot manipulators,” in CoRL, 2023

  16. [16]

    Airexo: Low-cost exoskeletons for learning whole-arm manipulation in the wild,

    H. Fang, H.-S. Fang, Y. Wang, J. Ren, J. Chen, R. Zhang, W. Wang, and C. Lu, “Airexo: Low-cost exoskeletons for learning whole-arm manipulation in the wild,” in ICRA, 2024

  17. [17]

    Ace: A cross-platform and visual-exoskeletons system for low-cost dexterous teleoperation,

    S. Yang, M. Liu, Y. Qin, R. Ding, J. Li, X. Cheng, R. Yang, S. Yi, and X. Wang, “Ace: A cross-platform and visual-exoskeletons system for low-cost dexterous teleoperation,” in CoRL, 2024

  18. [18]

    Bimanual dexterity for complex tasks,

    K. Shaw, Y. Li, J. Yang, M. K. Srirama, R. Liu, H. Xiong, R. Mendonca, and D. Pathak, “Bimanual dexterity for complex tasks,” in CoRL, 2024

  19. [19]

    The design of stretch: A compact, lightweight mobile manipulator for indoor human environments,

    C. C. Kemp, A. Edsinger, H. M. Clever, and B. Matulevich, “The design of stretch: A compact, lightweight mobile manipulator for indoor human environments,” in ICRA, 2022

  20. [20]

    Droid: A large-scale in-the-wild robot manipulation dataset,

    D. Team, “Droid: A large-scale in-the-wild robot manipulation dataset,” in RSS, 2024

  21. [21]

    Adaptive mobile manipulation for articulated objects in the open world,

    H. Xiong, R. Mendonca, K. Shaw, and D. Pathak, “Adaptive mobile manipulation for articulated objects in the open world,” arXiv:2401.14403, 2024

  22. [22]

    Demonstrating adaptive mobile manipulation in retail environments,

    M. Spahn, C. Pezzato, C. Salmi, R. Dekker, C. Wang, C. Pek, J. Kober, J. Alonso-Mora, C. H. Corbato, and M. Wisse, “Demonstrating adaptive mobile manipulation in retail environments,” in RSS, 2024

  23. [23]

    Revolutionizing battery disassembly: The design and implementation of a battery disassembly autonomous mobile manipulator robot (beam-1),

    Y. Peng, Z. Wang, Y. Zhang, S. Zhang, N. Cai, F. Wu, and M. Chen, “Revolutionizing battery disassembly: The design and implementation of a battery disassembly autonomous mobile manipulator robot (beam-1),” in IROS, 2024

  24. [24]

    Tidybot++: An open-source holonomic mobile manipulator for robot learning,

    J. Wu, W. Chong, R. Holmberg, A. Prasad, Y. Gao, O. Khatib, S. Song, S. Rusinkiewicz, and J. Bohg, “Tidybot++: An open-source holonomic mobile manipulator for robot learning,” in CoRL, 2024

  25. [25]

    Coupled active perception and manipulation planning for a mobile manipulator in precision agriculture applications,

    S. Xie, C. Hu, D. Wang, J. Johnson, M. Bagavathiannan, and D. Song, “Coupled active perception and manipulation planning for a mobile manipulator in precision agriculture applications,” in ICRA, 2024

  26. [26]

    Dynamic interaction control in legged mobile manipulators: A decoupled approach,

    Q. Li, Q. Meng, Y. Qin, J. Chen, X. Ding, and K. Xu, “Dynamic interaction control in legged mobile manipulators: A decoupled approach,” in ICRA, 2024

  27. [27]

    Learning to open and traverse doors with a legged manipulator,

    M. Zhang, Y. Ma, T. Miki, and M. Hutter, “Learning to open and traverse doors with a legged manipulator,” in CoRL, 2024

  28. [28]

    A mobile manipulation system for one-shot teaching of complex tasks in homes,

    M. Bajracharya, J. Borders, D. Helmick, T. Kollar, M. Laskey, J. Leichty, J. Ma, U. Nagarajan, A. Ochiai, J. Petersen, K. Shankar, K. Stone, and Y. Takaoka, “A mobile manipulation system for one-shot teaching of complex tasks in homes,” in ICRA, 2020

  29. [29]

    Demonstrating mobile manipulation in the wild: A metrics-driven approach,

    M. Bajracharya, J. Borders, R. Cheng, D. Helmick, L. Kaul, D. Kruse, J. Leichty, J. Ma, C. Matl, F. Michel, C. Papazov, J. Petersen, K. Shankar, and M. Tjersland, “Demonstrating mobile manipulation in the wild: A metrics-driven approach,” in RSS, 2023

  30. [30]

    Design of stickbug: A six-armed precision pollination robot,

    T. Smith, M. Rijal, C. Tatsch, R. M. Butts, J. Beard, R. T. Cook, A. Chu, J. Gross, and Y. Gu, “Design of stickbug: A six-armed precision pollination robot,” in IROS, 2024

  31. [31]

    Nimbro wins ana avatar xprize immersive telepresence competition: Human-centric evaluation and lessons learned,

    C. Lenz, M. Schwarz, A. Rochow, B. Pätzold, R. Memmesheimer, M. Schreiber, and S. Behnke, “Nimbro wins ana avatar xprize immersive telepresence competition: Human-centric evaluation and lessons learned,” International Journal of Social Robotics, 2023

  32. [32]

    Temporal difference learning for model predictive control,

    N. A. Hansen, H. Su, and X. Wang, “Temporal difference learning for model predictive control,” in ICML, 2022

  33. [33]

    π0: A vision-language-action flow model for general robot control,

    π0 Team, “π0: A vision-language-action flow model for general robot control,” arXiv:2410.24164, 2024

  34. [34]

    Open teach: A versatile teleoperation system for robotic manipulation,

    A. Iyer, Z. Peng, Y . Dai, I. Guzey, S. Haldar, S. Chintala, and L. Pinto, “Open teach: A versatile teleoperation system for robotic manipulation,” in CoRL, 2024

  35. [35]

    Marionet: Motion acquisition for robots through iterative online evaluative training,

    A. Setapen, M. Quinlan, and P. Stone, “Marionet: Motion acquisition for robots through iterative online evaluative training,” in AAMAS, 2010

  36. [36]

    Teleoperation of a humanoid robot using full-body motion capture, example movements, and machine learning,

    C. Stanton, A. Bogdanovych, and E. Ratanasena, “Teleoperation of a humanoid robot using full-body motion capture, example movements, and machine learning,” Australasian Conference on Robotics and Automation, 2012

  37. [37]

    Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots,

    C. Chi, Z. Xu, C. Pan, E. Cousineau, B. Burchfiel, S. Feng, R. Tedrake, and S. Song, “Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots,” in RSS, 2024

  38. [38]

    Fast-umi: A scalable and hardware-independent universal manipulation interface,

    Z. Wu, T. Wang, C. Guan, Z. Jia, S. Liang, H. Song, D. Qu, D. Wang, Z. Wang, N. Cao, Y. Ding, B. Zhao, and X. Li, “Fast-umi: A scalable and hardware-independent universal manipulation interface,” arXiv:2409.19499, 2024

  39. [39]

    Self-organization, embodiment, and biologically inspired robotics,

    R. Pfeifer, M. Lungarella, and F. Iida, “Self-organization, embodiment, and biologically inspired robotics,” Science, 2007

  40. [40]

    Friction models and friction compensation,

    H. Olsson, K. J. Åström, C. Canudas de Wit, M. Gäfvert, and P. Lischinsky, “Friction models and friction compensation,” European Journal of Control, 1998

  41. [41]

    Apriltag: A robust and flexible visual fiducial system,

    E. Olson, “Apriltag: A robust and flexible visual fiducial system,” in ICRA, 2011

  42. [42]

    Robust pose estimation from a planar target,

    G. Schweighofer and A. Pinz, “Robust pose estimation from a planar target,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006