pith. sign in

arxiv: 2606.19928 · v1 · pith:OIBLMHQCnew · submitted 2026-06-18 · 💻 cs.RO

SWAP: Symmetric Equivariant World-Model for Agile Robot Parkour

Pith reviewed 2026-06-26 17:00 UTC · model grok-4.3

classification 💻 cs.RO
keywords symmetry equivarianceworld modellegged locomotionquadruped parkourequivariant networksreinforcement learninggeometric generalizationzero-shot transfer
0
0 comments X

The pith

Embedding left-right symmetry directly into a robot world model and policies enables record parkour leaps and climbs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Purely data-driven world models must encode left-right symmetric interactions as separate patterns, inflating the learning burden and limiting how well the latent space supports agile control. The paper introduces SWAP, an end-to-end framework that hard-codes symmetry equivariance into the world model and the actor-critic networks. This produces a quadruped that leaps a 2.13 m gap and climbs a 1.63 m platform in the real world while generalizing to unseen mirrored terrains and transferring zero-shot to new outdoor settings. A sympathetic reader would care because the result indicates that geometric symmetry can serve as a structural prior that materially expands the physical capabilities of learned locomotion.

Core claim

The SWAP framework embeds symmetry equivariance directly into both the world model and the actor-critic networks, allowing the robot to leap a 2.13 m gap and climb a 1.63 m platform while showing robust generalization to unseen mirrored terrains and zero-shot transfer across outdoor environments.

What carries the argument

The SWAP symmetric equivariant world model, which enforces left-right symmetry in latent dynamics predictions and control policies.

If this is right

  • The robot reaches higher physical performance limits than prior data-driven methods on the same hardware.
  • Generalization to mirrored obstacle layouts occurs without retraining or additional data.
  • Zero-shot transfer succeeds across diverse outdoor environments.
  • Symmetry reduces redundant encoding of bilateral interactions in the latent space.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same symmetry prior could be added to other bilateral robot tasks such as manipulation or bipedal walking.
  • Training data requirements for agile behaviors may decrease because symmetric features are learned once.
  • The approach could be combined with other geometric priors to further compress the latent representation.

Load-bearing premise

That directly embedding symmetry into the world model and actor-critic networks captures all necessary geometric regularities without discarding asymmetric information required for stable control.

What would settle it

Training an otherwise identical non-equivariant world model on the same data and checking whether it matches the 2.13 m gap and 1.63 m climb distances plus mirrored-terrain robustness.

Figures

Figures reproduced from arXiv: 2606.19928 by Chaojie Fu, Chengkai Su, Choi Lam Wong, Hongtao Wang, Hongyi Li, Kaixin Lan, Lei Jiang, Yongbin Jin, Ze Wang.

Figure 1
Figure 1. Figure 1: Our SWAP framework enables the Apollo quadruped robot to perform highly dynamic locomotion on challenging terrains. The trained policy [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the SWAP framework. The architecture consists of a low-frequency Symmetric Equivariant World Model (blue) and a high-frequency [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Analysis of symmetric terrain transfer. (a) Evaluation setup: Policies are trained exclusively on unilaterally tilted terrains and evaluated directly on [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Performance comparison on extreme parkour tasks. Left: Success [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Visual comparison of motion strategies. (a) Gap leaping: SWAP generates powerful forward and upward impulses. (b) Box climbing: SWAP [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Zero-shot generalization in diverse outdoor environments. The unified policy enables the robot to robustly tackle four underlying physical and perceptual challenges: (a) High-dynamic maneuvers on a wet granite platform and outdoor gaps. (b) Perception degradation caused by specular reflections in shallow water and darkness on staircases. (c) Visual￾proprioceptive mismatch from non-rigid obstacles such as t… view at source ↗
read the original abstract

While latent world models enable the proactive predictions required for extreme parkour, their purely data-driven nature forces them to redundantly encode left-right symmetric interactions as independent patterns. This inflates the learning burden and hinders the capture of geometric regularities, restricting the latent space's efficiency for downstream policies. To address this, we propose SWAP, an end-to-end equivariant symmetric world model. This framework embeds symmetry directly into both the world model and the actor-critic networks. In real-world tests, the robot leaps across a 2.13 m gap and climbs a 1.63 m platform, breaking records for quadruped parkour. Furthermore, the framework exhibits robust geometric generalization to unseen mirrored terrains and exceptional zero-shot transferability across diverse outdoor environments. These results demonstrate that symmetry equivariance is an effective structural prior for pushing the physical boundaries of learned legged locomotion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes SWAP, an end-to-end equivariant symmetric world model that embeds symmetry directly into the world model and actor-critic networks for quadruped parkour. It claims this structural prior reduces redundant encoding of left-right symmetric interactions, enabling record real-world performance (2.13 m gap leap, 1.63 m platform climb) and robust generalization to unseen mirrored terrains plus zero-shot outdoor transfer.

Significance. If the experimental claims hold, the work demonstrates that symmetry equivariance can serve as an effective architectural prior for extreme legged locomotion, improving latent-space efficiency and geometric generalization without requiring additional data or parameters for symmetric patterns.

major comments (2)
  1. [Abstract] Abstract: the central performance claims (2.13 m gap, 1.63 m platform) are presented without any baseline comparisons, ablation results, or statistical details, making it impossible to isolate the contribution of the equivariant design from other factors such as training regime or hardware.
  2. [Abstract] The manuscript does not specify how the reflection symmetry is realized in the latent space or actor-critic (e.g., via group-equivariant layers or data augmentation), leaving open whether asymmetric task information required for stable control is preserved or discarded.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the feedback. The abstract is space-constrained and emphasizes the core claims and contributions, but we agree that additional context would strengthen it. The full manuscript contains the requested experimental details and architectural specifications. We address each point below and will revise the abstract accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central performance claims (2.13 m gap, 1.63 m platform) are presented without any baseline comparisons, ablation results, or statistical details, making it impossible to isolate the contribution of the equivariant design from other factors such as training regime or hardware.

    Authors: The abstract highlights the record performances as the primary outcome. The full manuscript includes extensive baseline comparisons (e.g., against non-equivariant world models and standard RL policies), ablation studies isolating the symmetry components, and statistical results from repeated trials with variance reported in the Experiments and Results sections. To better isolate the equivariant contribution in the abstract itself, we will add a brief clause noting the performance gains relative to ablated variants. revision: yes

  2. Referee: [Abstract] The manuscript does not specify how the reflection symmetry is realized in the latent space or actor-critic (e.g., via group-equivariant layers or data augmentation), leaving open whether asymmetric task information required for stable control is preserved or discarded.

    Authors: The abstract provides a high-level overview. The full manuscript details the realization via group-equivariant layers (specifically, reflection-equivariant convolutions and linear layers) applied to the world model encoder/decoder and actor-critic networks, as described in the Methods and Architecture sections. Asymmetric task information is preserved by applying equivariance only to the symmetric state features while retaining task-specific asymmetric inputs (e.g., goal directions) in the policy head. We will insert a short clarifying phrase in the abstract. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper proposes SWAP as a new end-to-end architectural framework that directly embeds symmetry equivariance into the latent world model and actor-critic networks. The central claim rests on this design choice as a structural prior, validated by empirical real-world performance (2.13 m gap leap, 1.63 m platform climb, generalization to mirrored terrains). No derivation chain, first-principles prediction, or fitted parameter is presented that reduces by construction to its own inputs; the symmetry embedding is an explicit modeling decision rather than a result derived from data or self-citation. The argument is self-contained as an engineering prior with external falsifiability through robot experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities can be identified. Likely relies on standard assumptions of equivariant neural networks and reinforcement learning but these are not detailed.

pith-pipeline@v0.9.1-grok · 5700 in / 1094 out tokens · 22669 ms · 2026-06-26T17:00:41.859402+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

30 extracted references · 2 linked inside Pith

  1. [1]

    Symmetry in markov decision pro- cesses and its implications for single agent and multiagent learning,

    M. Zinkevich and T. R. Balch, “Symmetry in markov decision pro- cesses and its implications for single agent and multiagent learning,” inProceedings of the eighteenth international conference on machine learning, 2001, p. 632

  2. [2]

    On-robot learning with equivariant models,

    D. Wang, M. Jia, X. Zhu, R. Walters, and R. Platt, “On-robot learning with equivariant models,”arXiv preprint arXiv:2203.04923, 2022

  3. [3]

    Learning latent dynamics for planning from pixels,

    D. Hafner, T. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee, and J. Davidson, “Learning latent dynamics for planning from pixels,” in International conference on machine learning. PMLR, 2019, pp. 2555–2565

  4. [4]

    Dream to control: Learning behaviors by latent imagination,

    D. Hafner, T. Lillicrap, J. Ba, and M. Norouzi, “Dream to control: Learning behaviors by latent imagination,” inProceedings of the International Conference on Learning Representations (ICLR), 2020. [Online]. Available: https://openreview.net/forum?id=S1l7maskCr

  5. [5]

    Day- dreamer: World models for physical robot learning,

    P. Wu, A. Escontrela, D. Hafner, P. Abbeel, and K. Goldberg, “Day- dreamer: World models for physical robot learning,” inConference on robot learning. PMLR, 2023, pp. 2226–2240

  6. [6]

    World model-based perception for visual legged locomotion,

    H. Lai, J. Cao, J. Xu, H. Wu, Y . Lin, T. Kong, Y . Yu, and W. Zhang, “World model-based perception for visual legged locomotion,” in2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 11 531–11 537

  7. [7]

    Extreme parkour with legged robots,

    X. Cheng, K. Shi, A. Agarwal, and D. Pathak, “Extreme parkour with legged robots,” in2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 11 443–11 450

  8. [8]

    Model-free reinforce- ment learning for robust locomotion using demonstrations from tra- jectory optimization,

    M. Bogdanovic, M. Khadiv, and L. Righetti, “Model-free reinforce- ment learning for robust locomotion using demonstrations from tra- jectory optimization,”Frontiers in Robotics and AI, vol. 9, p. 854212, 2022

  9. [9]

    Lumos: Language-conditioned imitation learning with world models,

    I. Nematollahi, B. DeMoss, A. L. Chandra, N. Hawes, W. Burgard, and I. Posner, “Lumos: Language-conditioned imitation learning with world models,” in2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 8219–8225

  10. [10]

    Parkour in the wild: Learn- ing a general and extensible agile locomotion policy using multi-expert distillation and rl fine-tuning,

    N. Rudin, J. He, J. Aurand, and M. Hutter, “Parkour in the wild: Learn- ing a general and extensible agile locomotion policy using multi-expert distillation and rl fine-tuning,”arXiv preprint arXiv:2505.11164v1, 2025

  11. [11]

    Dreamwaq: Learning robust quadrupedal locomotion with implicit terrain imagination via deep reinforcement learning,

    I. M. A. Nahrendra, B. Yu, and H. Myung, “Dreamwaq: Learning robust quadrupedal locomotion with implicit terrain imagination via deep reinforcement learning,” in2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 5078–5084

  12. [12]

    Pie: Parkour with implicit-explicit learning framework for legged robots,

    S. Luo, S. Li, R. Yu, W. Zhicheng, J. Wu, and Q. Zhu, “Pie: Parkour with implicit-explicit learning framework for legged robots,”IEEE Robotics and Automation Letters, vol. PP, pp. 1–8, 11 2024

  13. [13]

    Start: Traversing sparse footholds with terrain reconstruction,

    R. Yu, Q. Wang, H. Li, Z. Jun, Z. Wang, J. Wu, and Q. Zhu, “Start: Traversing sparse footholds with terrain reconstruction,”IEEE Robotics and Automation Letters, vol. 11, no. 2, pp. 2194–2201, 2025

  14. [14]

    World models,

    D. Ha and J. Schmidhuber, “World models,”arXiv preprint arXiv:1803.10122, 2018. [Online]. Available: https://arxiv.org/abs/ 1803.10122

  15. [15]

    Mastering atari with discrete world models,

    D. Hafner, T. Lillicrap, M. Norouzi, and J. Ba, “Mastering atari with discrete world models,” inInternational Conference on Learning Representations (ICLR), 2021

  16. [16]

    Mastering diverse control tasks through world models,

    D. Hafner, J. Pasukonis, J. Ba, and T. Lillicrap, “Mastering diverse control tasks through world models,”Nature, vol. 640, no. 8059, pp. 647–653, 2025. [Online]. Available: https://doi.org/10.1038/ s41586-025-08744-2

  17. [17]

    Invariant transform experience replay: Data augmentation for deep re- inforcement learning,

    Y . Lin, J. Huang, M. Zimmer, Y . Guan, J. Rojas, and P. Weng, “Invariant transform experience replay: Data augmentation for deep re- inforcement learning,”IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 6615–6622, 2020

  18. [18]

    Learning symmetric and low-energy locomotion,

    W. Yu, G. Turk, and C. K. Liu, “Learning symmetric and low-energy locomotion,”ACM Transactions on Graphics (TOG), vol. 37, no. 4, pp. 1–12, 2018

  19. [19]

    Symme- try considerations for learning task symmetric robot policies,

    M. Mittal, N. Rudin, V . Klemm, A. Allshire, and M. Hutter, “Symme- try considerations for learning task symmetric robot policies,” in2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 7433–7439

  20. [20]

    Group equivariant convolutional networks,

    T. Cohen and M. Welling, “Group equivariant convolutional networks,” inInternational Conference on Machine Learning (ICML). PMLR, 2016, pp. 2990–2999

  21. [21]

    Mdp homomorphic networks: Group symmetries in re- inforcement learning,

    E. van der Pol, D. E. Worrall, H. van Hoof, F. A. Oliehoek, and M. Welling, “Mdp homomorphic networks: Group symmetries in re- inforcement learning,” inAdvances in Neural Information Processing Systems, vol. 33, 2020

  22. [22]

    Tensor field networks: Rotation-and translation- equivariant neural networks for 3d point clouds,

    N. Thomas, T. Smidt, S. Kearnes, L. Yang, L. Li, K. Kohlhoff, and P. Riley, “Tensor field networks: Rotation-and translation- equivariant neural networks for 3d point clouds,”arXiv preprint arXiv:1802.08219, 2018

  23. [23]

    Leveraging symmetry in rl- based legged locomotion control,

    Z. Su, X. Huang, D. Ordo ˜nez-Apraez, Y . Li, Z. Li, Q. Liao, G. Turrisi, M. Pontil, C. Semini, Y . Wuet al., “Leveraging symmetry in rl- based legged locomotion control,” in2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 6899–6906

  24. [24]

    Coordinated humanoid robot locomotion with symmetry equivariant reinforcement learning policy,

    B. Nie, Y . Zhang, R. Jin, Z. Cao, H. Lin, X. Yang, and Y . Gao, “Coordinated humanoid robot locomotion with symmetry equivariant reinforcement learning policy,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 22, 2026, pp. 18 523–18 531

  25. [25]

    Symmetry-guided memory augmentation for efficient locomotion learning,

    K. Bao, C. Li, Y . As, A. Krause, and M. Hutter, “Symmetry-guided memory augmentation for efficient locomotion learning,” 2026. [Online]. Available: https://arxiv.org/abs/2502.01521

  26. [26]

    Amp: Adversarial motion priors for stylized physics-based character con- trol,

    X. B. Peng, Z. Ma, P. Abbeel, S. Levine, and A. Kanazawa, “Amp: Adversarial motion priors for stylized physics-based character con- trol,”ACM Transactions on Graphics (ToG), vol. 40, no. 4, pp. 1–20, 2021

  27. [27]

    Equivariant reinforcement learning under partial observability,

    H. H. Nguyen, A. Baisero, D. Klee, D. Wang, R. Platt, and C. Amato, “Equivariant reinforcement learning under partial observability,” in Conference on Robot Learning. PMLR, 2023, pp. 3309–3320

  28. [28]

    Robot parkour learning,

    Z. Zhuang, Z. Fu, J. Wang, C. Atkeson, S. Schwertfeger, C. Finn, and H. Zhao, “Robot parkour learning,”arXiv preprint arXiv:2309.05665, 2023

  29. [29]

    Anymal parkour: Learning agile navigation for quadrupedal robots,

    D. Hoeller, N. Rudin, D. Sako, and M. Hutter, “Anymal parkour: Learning agile navigation for quadrupedal robots,”Science Robotics, vol. 9, no. 88, p. eadi7566, 2024

  30. [30]

    High-speed control and navigation for quadrupedal robots on complex and discrete terrain,

    H. Kim, H. Oh, J. Park, Y . Kim, D. Youm, M. Jung, M. Lee, and J. Hwangbo, “High-speed control and navigation for quadrupedal robots on complex and discrete terrain,”Science Robotics, vol. 10, no. 102, p. eads6192, 2025