Learning Dynamic Pick-and-Place for a Legged Manipulator

Donghoon Youm; Donghyuk Choi; HyeongJun Kim; Hyunsik Oh; Jemin Hwangbo; Jie Song; Jiseong Lee; Juhyeok Mun; Jungwoo Hur; Moonkyu Jung

arxiv: 2605.15713 · v1 · pith:VTPTTTPJnew · submitted 2026-05-15 · 💻 cs.RO · cs.AI

Moonkyu Jung, Jiseong Lee, Zhengmao He, Donghoon Youm, Juhyeok Mun, HyeongJun Kim, Hyunsik Oh, Donghyuk Choi, Jungwoo Hur, Jie Song, Jemin Hwangbo

Pith reviewed 2026-05-20 18:59 UTC · model grok-4.3

keywords: legged manipulator, reinforcement learning, pick and place, dynamic manipulation, mass estimation, quadruped robot, whole-body control, mobile manipulation

The pith

A hierarchical RL framework with explicit mass estimation lets a quadruped arm perform dynamic pick-and-place for payloads up to 1.3 kg while walking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a hierarchical reinforcement learning system for a quadruped robot fitted with a 6-DOF arm that can pick objects from the ground or tables and place them elsewhere without pausing locomotion. An explicit mass estimation module feeds real-time weight data into the whole-body controller so the robot can adjust its balance and motion for different object masses. In simulation the method reaches 86 percent success with loads to 2.3 kg; in hardware tests across six scenarios it averages 73 percent success with loads to 1.3 kg and finishes each task in roughly four seconds. The approach replaces the slow, stop-and-go sequences of earlier work with continuous, concurrent locomotion and manipulation. Readers care because it moves legged mobile manipulators closer to practical use in homes or warehouses where objects vary in weight and location.

Core claim

The authors claim that adding an explicit mass estimation module to a hierarchical reinforcement learning policy produces adaptive whole-body control that supports dynamic, continuous pick-and-place on a quadruped with arm, yielding 86.05 percent success in simulation for payloads up to 2.3 kg and 73.3 percent average success in real experiments for payloads up to 1.3 kg over ground-to-1.1 m heights, while prior methods required slow piecewise execution for only lightweight objects.

What carries the argument

Hierarchical reinforcement learning policy combined with an explicit real-time mass estimation module that supplies payload weight to the adaptive whole-body controller.
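
As a rough illustration of how the pieces described above could fit together, the following Python sketch feeds a running mass estimate into a two-level control step. Every name in it (`Command`, `RunningMassEstimator`, `control_step`, the `measured_kg` input) is hypothetical, and the exponential moving average is only a stand-in for the paper's learned estimator, not its actual method. The 1.3 kg initial guess in the usage note mirrors the training-distribution mean mentioned in the Figure 4 caption.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Command:
    """Hypothetical interface between the two levels (not taken from the paper)."""
    base_velocity: List[float]  # assumed layout: [vx, vy, yaw_rate]
    arm_targets: List[float]    # assumed layout: six arm joint targets


class RunningMassEstimator:
    """Stand-in estimator: exponential moving average of a load measurement."""

    def __init__(self, initial_guess_kg: float, smoothing: float = 0.2):
        self.estimate_kg = initial_guess_kg
        self.smoothing = smoothing

    def update(self, measured_kg: float) -> float:
        # Move the estimate a fraction of the way toward the new measurement.
        self.estimate_kg += self.smoothing * (measured_kg - self.estimate_kg)
        return self.estimate_kg


def control_step(
    observation: Sequence[float],
    measured_kg: float,
    estimator: RunningMassEstimator,
    high_level: Callable[[Sequence[float], float], Command],
    low_level: Callable[[Sequence[float], Command], List[float]],
) -> List[float]:
    """One tick: refresh the mass estimate, plan at the high level, act at the low level."""
    mass_kg = estimator.update(measured_kg)
    command = high_level(observation, mass_kg)
    return low_level(observation, command)
```

A caller would construct `RunningMassEstimator(initial_guess_kg=1.3)` once per episode and pass its own policy functions as `high_level` and `low_level`. Passing the two levels in as plain callables keeps the sketch independent of any particular learning library.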

If this is right

The robot can switch between objects of different masses without retraining or separate controllers.
Pick-and-place becomes possible across an extended workspace from floor level to 1.1 m tabletops in a single continuous motion.
Average task time stays near four seconds because locomotion and arm motion run together instead of in sequence.
Heavier payloads up to 2.3 kg become feasible at least in simulation without sacrificing stability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same mass-aware controller could support additional skills such as pushing or carrying objects while walking.
Transfer to outdoor or cluttered settings would test whether the learned policies remain stable when visual and terrain conditions change.
Scaling the approach to longer sequences or multi-object tasks would reveal whether the hierarchical structure continues to prevent interference between locomotion and manipulation.

Load-bearing premise

The mass estimation module supplies accurate enough real-time payload weights to prevent the added load from destabilizing the quadruped's locomotion.

What would settle it

Run the same tasks after deliberately disabling or degrading the mass estimation module and measure whether success rate drops sharply for objects whose weights differ from the training distribution.
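
One way to judge whether such a drop is "sharp" is a pooled two-proportion z-test on the success counts of the two conditions. The sketch below is a generic statistical helper, not something the paper reports; the function names are made up for illustration, and any counts passed to it would have to come from actual trials.

```python
import math


def two_proportion_z(successes_a: int, trials_a: int,
                     successes_b: int, trials_b: int) -> float:
    """Pooled z-statistic for the difference between two success rates."""
    if trials_a <= 0 or trials_b <= 0:
        raise ValueError("each condition needs at least one trial")
    rate_a = successes_a / trials_a
    rate_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / trials_a + 1.0 / trials_b))
    if std_err == 0.0:
        # Every trial succeeded or every trial failed: the rates are identical.
        return 0.0
    return (rate_a - rate_b) / std_err


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For example, `two_sided_p_value(two_proportion_z(s_full, n_full, s_ablated, n_ablated))` compares the full system against the ablated one, where the four arguments are placeholders for measured counts. With small trial counts the normal approximation is coarse, so an exact test may be preferable.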

Figures

Figures reproduced from arXiv: 2605.15713 by Donghoon Youm, Donghyuk Choi, HyeongJun Kim, Hyunsik Oh, Jemin Hwangbo, Jie Song, Jiseong Lee, Juhyeok Mun, Jungwoo Hur, Moonkyu Jung, Zhengmao He.

**Figure 1.** Dynamic Pick-and-Place of Legged Manipulator: Sequential snapshots capturing real-world deployments using the proposed framework. The top sequence shows transferring an object from the floor to a 1.1 m high tabletop, and the bottom sequence shows placing an object from the tabletop back onto the floor. These demonstrate the framework’s capability to coordinate agile locomotion and precise manipulation, ena…

**Figure 2.** Hierarchical training framework: Our framework consists of two training stages. In Step 1, a low-level locomotion controller is trained via reinforcement learning to generate robust whole-body stabilization under random arm disturbances. In Step 2, the low-level actor is frozen, and a high-level pick-and-place controller is trained jointly with an LSTM estimator that predicts object mass and contact state …

**Figure 3.** Disturbance rejection performance of the low-level controller: Top: Yaw-velocity fluctuation under yaw disturbances induced by periodic motion of Joint 1 (amplitude 1 rad, frequency 0.8 Hz). Our controller achieves a yaw-velocity RMSE of 0.026 rad/s, compared to 0.117 rad/s for the baseline. Bottom: Pitch-angle fluctuation under pitch disturbances induced by periodic motion of Joint 3 (amplitude 1 rad, fre…

**Figure 4.** Real-time mass estimates during four pick-and-place episodes. The estimator begins each episode with an initial guess near the training-distribution mean of approximately 1.3 kg. After pick-up, the estimate shows a brief transient response before settling near the true object mass as the robot lifts and transports the object. The steady-state estimates at the release moment are [0.5319, 0.9973, 1.5350, 2.0…

**Figure 5.** Base and end-effector motion profile in a single episode: At the moment the gripper closes (rising edge of the gripper action, in red line), the quadruped base maintains a linear speed of approximately 1 m/s (in blue line) and an angular velocity of about −1 rad/s (in green line), while the end-effector speed has already reduced to around 0.2 m/s (in yellow line). …

read the original abstract

Legged manipulators extend robotic capabilities beyond static manipulation by integrating agile locomotion with versatile arm control. However, achieving precise manipulation while maintaining coordinated locomotion remains a major challenge. This work presents a hierarchical reinforcement learning framework for dynamic pick-and-place tasks using a quadruped equipped with a 6-DOF robotic arm. The framework incorporates an explicit mass estimation module enabling adaptive whole-body control for objects with varying weights. In simulation, the system achieves an 86.05% success rate with payloads up to 2.3 kg. The approach is further validated through real-world experiments across six representative scenarios with controlled variations in object physical properties (size and mass) and task heights. Specifically, within a wide vertical workspace ranging from ground level to 1.1 m-high tabletops, the system demonstrates an average success rate of 73.3% for payloads up to 1.3 kg, with an average execution time of 4.06 s. Unlike prior works that handle lightweight objects and execute pick-and-place motions with slow, piecewise motions, the proposed framework exploits concurrent locomotion and manipulation for dynamic, continuous execution. These results demonstrate the potential of quadrupedal mobile manipulators for adaptive, whole-body pick-and-place with heavier payloads and extended workspaces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper shows concurrent dynamic pick-and-place on a quadruped arm for up to 1.3 kg real-world payloads via hierarchical RL plus mass estimation, with decent cross-scenario tests, but the estimator's accuracy and contribution stay unquantified.

Colleague, the main thing to know is that this work gets a quadruped with a 6-DOF arm to do pick-and-place while still locomoting, for payloads up to 1.3 kg across ground-to-1.1 m heights, reporting 73.3% average real-world success and execution times of about four seconds. That is a concrete step past the slow, stop-and-go methods that dominate prior legged manipulation papers. In simulation they reach 86% success with payloads up to 2.3 kg.

The hierarchical RL plus explicit mass estimation module is what lets the whole-body controller adapt to changing object weights on the fly and keep motions continuous rather than piecewise. They back the claim with real-robot trials that vary object size, mass, and task height in six controlled scenarios, which is more than many similar efforts deliver. The experimental design itself is the strongest part: controlled variations and hardware deployment give a reader a clear picture of where the system holds up and where it does not.

The soft spot is the mass estimation module. The paper presents it as the key enabler for stable adaptive control under varying loads, yet the abstract and summary give no error statistics, no accuracy numbers during trials, and no ablation that replaces the estimator with a fixed mass value. Without those, it is hard to tell how much of the reported success comes from the new module versus the base controller's robustness margin. Training procedure and reward details are also light, which limits immediate reproduction.

This paper is for people working on legged mobile manipulators and RL control for dynamic tasks. A reader who needs practical ideas for whole-body pick-and-place in logistics-style settings will find the scenario coverage and success rates useful. It has enough real hardware grounding to deserve a serious referee, even if the review will likely ask for estimator metrics and more method transparency. I would send it to peer review with those specific requests.

Referee Report

1 major / 1 minor

Summary. The paper presents a hierarchical reinforcement learning framework for dynamic pick-and-place tasks on a quadruped with a 6-DOF arm. It incorporates an explicit mass estimation module for adaptive whole-body control under varying payloads. Simulation results show 86.05% success with payloads up to 2.3 kg; real-world tests across six scenarios (ground to 1.1 m heights) yield 73.3% average success with payloads up to 1.3 kg and 4.06 s average execution time, emphasizing concurrent locomotion and manipulation versus prior piecewise approaches.

Significance. If validated, the work advances legged mobile manipulation by showing adaptive, dynamic pick-and-place with heavier payloads and larger workspaces than prior slow methods. Real-world experiments with controlled variations in object size, mass, and height provide practical evidence of concurrent locomotion-manipulation capability, which could enable more agile robotic systems in unstructured settings.

major comments (1)

The explicit mass estimation module is described as the key enabler for stable adaptive control with varying payloads (up to 1.3 kg in real tests), yet the manuscript reports no quantitative accuracy metrics, error statistics during trials, or ablation studies (e.g., constant-mass baseline). Without these, it is unclear whether estimation errors stay within the whole-body controller's robustness margin, undermining attribution of the 73.3% real-world success rate and concurrent execution to the proposed framework.

minor comments (1)

The abstract omits details on the training procedure, reward design, and statistical significance of the reported success rates; adding a brief summary of these would improve completeness without altering the core contribution.
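
To make the point about statistical significance concrete, a success rate can be reported with a Wilson score interval. The sketch below is a generic illustration, not the authors' analysis; the abstract does not state the number of real-world trials, so the count in the usage note is purely hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial success rate (z=1.96 gives about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    denom = 1.0 + z ** 2 / trials
    center = (p_hat + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z ** 2 / (4 * trials ** 2)
    ) / denom
    return center - half_width, center + half_width
```

Calling `wilson_interval(22, 30)` with a hypothetical 22 successes in 30 trials (about 73.3%) returns the lower and upper bounds of the interval. The width of that interval shrinks as the trial count grows, which is why the trial count matters as much as the headline percentage.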

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point by point below and will revise the paper accordingly to strengthen the presentation of the mass estimation module.

Referee: The explicit mass estimation module is described as the key enabler for stable adaptive control with varying payloads (up to 1.3 kg in real tests), yet the manuscript reports no quantitative accuracy metrics, error statistics during trials, or ablation studies (e.g., constant-mass baseline). Without these, it is unclear whether estimation errors stay within the whole-body controller's robustness margin, undermining attribution of the 73.3% real-world success rate and concurrent execution to the proposed framework.

Authors: We agree that additional quantitative validation of the mass estimation module would improve the manuscript. In the revised version, we will include: (1) accuracy metrics such as mean absolute error and standard deviation of mass estimates across simulation and real-world trials with different payloads; (2) error statistics observed during the six real-world scenarios; and (3) an ablation study comparing the full framework against a constant-mass baseline. These additions will demonstrate that estimation errors remain within the robustness margins of the whole-body controller and support attribution of the reported success rates and concurrent locomotion-manipulation performance to the proposed approach.

Revision: yes
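
The accuracy metrics named in the response could be computed along the following lines. This is a minimal sketch with a hypothetical function name and return layout; it assumes paired lists of true and estimated masses in kilograms and does not use any data from the paper.

```python
import statistics
from typing import Dict, Sequence


def mass_error_stats(true_kg: Sequence[float],
                     estimated_kg: Sequence[float]) -> Dict[str, float]:
    """Summarize estimation error over paired (true, estimated) masses."""
    if len(true_kg) != len(estimated_kg):
        raise ValueError("inputs must have the same length")
    if len(true_kg) < 2:
        raise ValueError("need at least two samples for a standard deviation")
    # Signed error: positive means the estimator overshoots the true mass.
    errors = [est - true for est, true in zip(estimated_kg, true_kg)]
    return {
        "mae_kg": statistics.fmean([abs(err) for err in errors]),
        "bias_kg": statistics.fmean(errors),
        "std_kg": statistics.stdev(errors),  # sample standard deviation
    }
```

Reporting the signed bias alongside the mean absolute error helps separate a systematic offset from random scatter, which matters when judging whether the errors stay inside a controller's robustness margin.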

Circularity Check

0 steps flagged

Empirical RL success rates show no circular derivation chain

The manuscript describes a hierarchical reinforcement learning framework whose headline results are measured success rates (86.05% simulation, 73.3% real-world) obtained after policy training. No equations, first-principles derivations, or parameter fits are presented that reduce the reported performance numbers to quantities defined by the authors' own inputs or prior self-citations. The explicit mass-estimation module is introduced as an architectural component whose contribution is assessed through end-to-end task outcomes rather than by construction or renaming of fitted statistics. Consequently the central claims remain independent of the circularity patterns enumerated in the analyzer guidelines.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central performance claims rest on standard RL assumptions plus the unverified accuracy of the introduced mass estimation module; no explicit free parameters or invented physical entities beyond the module itself are detailed in the abstract.

axioms (1)

domain assumption: The robot and environment can be modeled as a Markov decision process suitable for hierarchical reinforcement learning.
Implicit in any RL control framework for continuous robot tasks.

invented entities (1)

explicit mass estimation module (no independent evidence)
purpose: To provide real-time estimates of object weight for adaptive whole-body control during dynamic pick-and-place.
Introduced as the key component enabling handling of varying payloads; no independent falsifiable evidence outside the reported success rates is given in the abstract.

pith-pipeline@v0.9.0 · 5793 in / 1621 out tokens · 46325 ms · 2026-05-20T18:59:00.127913+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "hierarchical reinforcement learning framework... explicit mass estimation module enabling adaptive whole-body control"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 1 internal anchor

[1] D. Hoeller, N. Rudin, D. Sako, and M. Hutter, “Anymal parkour: Learning agile navigation for quadrupedal robots,” Science Robotics, vol. 9, no. 88, p. eadi7566, 2024.

[2] H. Kim, H. Oh, J. Park, Y. Kim, D. Youm, M. Jung, M. Lee, and J. Hwangbo, “High-speed control and navigation for quadrupedal robots on complex and discrete terrain,” Science Robotics, vol. 10, no. 102, p. eads6192, 2025.

[3] J.-P. Sleiman, F. Farshidian, M. V. Minniti, and M. Hutter, “A unified mpc framework for whole-body dynamic locomotion and manipulation,” IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 4688–4695, 2021.

[4] J.-R. Chiu, J.-P. Sleiman, M. Mittal, F. Farshidian, and M. Hutter, “A collision-free mpc for whole-body dynamic locomotion and manipulation,” in 2022 International Conference on Robotics and Automation (ICRA), 2022, pp. 4686–4693.

[5] Z. Fu, X. Cheng, and D. Pathak, “Deep whole-body control: Learning a unified policy for manipulation and locomotion,” in Conference on Robot Learning (CoRL), 2022.

[6] M. Liu, Z. Chen, X. Cheng, Y. Ji, R. Qiu, R. Yang, and X. Wang, “Visual whole-body control for legged loco-manipulation,” in Proceedings of the 2024 Conference on Robot Learning, 2024.

[7] T. Portela, A. Cramariuc, M. Mittal, and M. Hutter, “Whole-body end-effector pose tracking,” 2024. [Online]. Available: https://arxiv.org/abs/2409.16048

[8] T. Portela, G. B. Margolis, Y. Ji, and P. Agrawal, “Learning force control for legged manipulation,” in 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 15366–15372.

[9] J. Zhang, N. Gireesh, J. Wang, X. Fang, C. Xu, W. Chen, L. Dai, and H. Wang, “Gamma: Graspability-aware mobile manipulation policy learning based on online grasping pose fusion,” in 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 1399–1405.

[10] N. Yokoyama, A. Clegg, J. Truong, E. Undersander, T.-Y. Yang, S. Arnaud, S. Ha, D. Batra, and A. Rai, “Asc: Adaptive skill coordination for robotic mobile manipulation,” IEEE Robotics and Automation Letters, vol. 9, no. 1, pp. 779–786, 2024.

[11] Y. Ma, F. Farshidian, T. Miki, J. Lee, and M. Hutter, “Combining learning-based locomotion policy with model-based manipulation for legged mobile manipulators,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2377–2384, 2022.

[12] M. Zhang, Y. Ma, T. Miki, and M. Hutter, “Learning to open and traverse doors with a legged manipulator,” arXiv preprint arXiv:2409.04882, 2024.

[13] Y. Ma, A. Cramariuc, F. Farshidian, and M. Hutter, “Learning coordinated badminton skills for legged manipulators,” Science Robotics, vol. 10, no. 102, p. eadu3922, 2025.

[14] N. Fey, G. B. Margolis, M. Peticco, and P. Agrawal, “Bridging the sim-to-real gap for athletic loco-manipulation,” 2025. [Online]. Available: https://arxiv.org/abs/2502.10894

[15] H. Zhang, H. Yu, L. Zhao, A. Choi, Q. Bai, Y. Yang, and W. Xu, “Learning multi-stage pick-and-place with a legged mobile manipulator,” IEEE Robotics and Automation Letters, 2025.

[16] G. Ji, J. Mun, H. Kim, and J. Hwangbo, “Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4630–4637, 2022.

[17] G. B. Margolis and P. Agrawal, “Walk these ways: Tuning robot control for generalization with multiplicity of behavior,” Conference on Robot Learning, 2022.

[18] L. Pinto, M. Andrychowicz, P. Welinder, W. Zaremba, and P. Abbeel, “Asymmetric actor critic for image-based robot learning,” in Robotics: Science and Systems, 2018.

[19] S. Jeon, M. Jung, S. Choi, B. Kim, and J. Hwangbo, “Learning whole-body manipulation for quadrupedal robot,” IEEE Robotics and Automation Letters, vol. 9, no. 1, pp. 699–706, 2024.

[20] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.

[21] J. Hwangbo, J. Lee, and M. Hutter, “Per-contact iteration method for solving contact dynamics,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 895–902, 2018. [Online]. Available: www.raisim.com
