WestWorld: A Knowledge-Encoded Scalable Trajectory World Model for Diverse Robotic Systems

Haohong Lin; Hongjue Zhao; Huajie Shao; Jiangtao Kong; Lu Gan; Sizhe Wei; Tianyi Zhou; Xiaochang Li; Yuchen Wang

arxiv: 2603.14392 · v2 · pith:TVLE4MKYnew · submitted 2026-03-15 · 💻 cs.LG · cs.RO

Yuchen Wang, Jiangtao Kong, Sizhe Wei, Xiaochang Li, Haohong Lin, Hongjue Zhao, Tianyi Zhou, Lu Gan, Huajie Shao

Pith reviewed 2026-05-21 11:36 UTC · model grok-4.3

classification 💻 cs.LG cs.RO

keywords trajectory world model · mixture of experts · robotic systems · zero-shot generalization · structural embedding · system embedding · model-based control · scalable dynamics model

The pith

WestWorld uses a system-aware mixture of experts and structural embeddings to build one trajectory world model that generalizes across many different robots.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents WestWorld, a trajectory world model built to scale to large numbers of distinct robotic systems while incorporating physical structure knowledge. It introduces a system-aware Mixture-of-Experts that routes specialized experts using a learnable system embedding, plus a structural embedding that aligns trajectories with morphological information. Pretrained on 89 environments spanning simulation and real settings, the model shows gains in zero- and few-shot prediction over baselines, scales to varied robots, boosts model-based control, and supports stable real-world locomotion on a quadruped.

Core claim

WestWorld is a knowledge-encoded scalable trajectory world model for diverse robotic systems. It employs a novel system-aware Mixture-of-Experts (Sys-MoE) that dynamically routes and combines experts via a learnable system embedding, together with a structural embedding that aligns trajectory representations with robot physical morphologies. After pretraining on 89 complex environments across simulation and real-world settings, it delivers significant gains over baselines in zero- and few-shot trajectory prediction, exhibits strong scalability, improves downstream model-based control, and produces stable locomotion when deployed on a real Unitree Go1.

What carries the argument

System-aware Mixture-of-Experts (Sys-MoE) with learnable system embedding, augmented by structural embedding for morphological alignment.
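
The routing idea can be illustrated with a small sketch. This is not the authors' implementation: it assumes a dense softmax gate computed from the system embedding alone, with linear experts, and every name in it (`SysMoELayer`, `routing_weights`, the dimensions) is hypothetical. The actual Sys-MoE may use a different gate, sparse top-k selection, or token-dependent routing.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SysMoELayer:
    """Toy system-aware MoE layer (illustrative assumption, not the paper's code)."""

    def __init__(self, num_systems, num_experts, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One learnable embedding vector per robotic system.
        self.system_embedding = rand_matrix(num_systems, embed_dim)
        # Router maps a system embedding to one logit per expert.
        self.router = rand_matrix(num_experts, embed_dim)
        # Each expert is a square linear map here, purely for illustration.
        self.experts = [rand_matrix(hidden_dim, hidden_dim) for _ in range(num_experts)]

    def routing_weights(self, system_id):
        """Per-system mixture weights over the experts (non-negative, sum to 1)."""
        return softmax(matvec(self.router, self.system_embedding[system_id]))

    def forward(self, system_id, token):
        """Combine all expert outputs using the system's routing weights."""
        weights = self.routing_weights(system_id)
        outputs = [matvec(expert, token) for expert in self.experts]
        return [
            sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(token))
        ]
```

Under this sketch, `SysMoELayer(3, 4, 8, 5).routing_weights(1)` returns four weights for system 1, which is the kind of per-system, per-expert quantity that Figure 4 visualizes layer by layer.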

Load-bearing premise

The learnable system embedding will let the mixture-of-experts reliably select and align experts for unseen robots without expert interference or per-system retraining.

What would settle it

Train on the 89 environments, then measure zero-shot prediction error on a robot whose morphology is absent from training. High error on that robot, or a clear drop in performance as more systems are added, would falsify the central claim.
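
A minimal sketch of such a check follows, assuming each environment carries a morphology label. The environment names, the labels, and the helpers `split_by_morphology` and `rollout_mse` are hypothetical and are not taken from the paper or its code release.

```python
import random
from collections import defaultdict


def split_by_morphology(env_to_morphology, num_test_envs, seed=0):
    """Split environments so that no morphology appears in both train and test.

    Whole morphology groups are moved to the test set until it holds at
    least `num_test_envs` environments, so the test set may overshoot.
    """
    groups = defaultdict(list)
    for env, morphology in sorted(env_to_morphology.items()):
        groups[morphology].append(env)

    morphologies = sorted(groups)
    random.Random(seed).shuffle(morphologies)

    train, test = [], []
    for morphology in morphologies:
        target = test if len(test) < num_test_envs else train
        target.extend(groups[morphology])
    return train, test


def rollout_mse(predicted, actual):
    """Mean squared error over all state dimensions and time steps."""
    squared = [
        (p - a) ** 2
        for pred_state, true_state in zip(predicted, actual)
        for p, a in zip(pred_state, true_state)
    ]
    return sum(squared) / len(squared)
```

Splitting by morphology group rather than by environment is the design choice that matters here: a random split over environments could leave the same robot on both sides and overstate zero-shot performance.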

Figures

Figures reproduced from arXiv: 2603.14392 by Haohong Lin, Hongjue Zhao, Huajie Shao, Jiangtao Kong, Lu Gan, Sizhe Wei, Tianyi Zhou, Xiaochang Li, Yuchen Wang.

**Figure 2.** Trajectory plot comparison of our method and three baselines for 100-step rollout prediction on three robots: …

**Figure 3.** Comparison between our method against the best performing SOTA by scaling the number of environments. We can see from …

**Figure 4.** Sys-MoE routing weights across six layers (L1–L6), each containing four experts (E1–E4), for three robotic …

**Figure 5.** Real-world deployment on Unitree Go1. The distilled-and-fine-tuned …

**Figure 6.** The effect of pre-training on few-shot learning for three different robotic systems: (a) Cassie, (b) A1 and (c) …

Original abstract

Trajectory world models play a crucial role in robotic dynamics learning, planning, and control. While recent works have explored trajectory world models for diverse robotic systems, they struggle to scale to a large number of distinct system dynamics and overlook domain knowledge of physical structures. To address these limitations, we introduce WestWorld, a knoWledge-Encoded Scalable Trajectory World model for diverse robotic systems. To tackle the scalability challenge, we propose a novel system-aware Mixture-of-Experts (Sys-MoE) that dynamically combines and routes specialized experts for different robotic systems via a learnable system embedding. To further enhance zero-shot generalization, we incorporate domain knowledge of robot physical structures by introducing a structural embedding that aligns trajectory representations with morphological information. After pretraining on 89 complex environments spanning diverse morphologies across both simulation and real-world settings, WestWorld achieves significant improvements over competitive baselines in zero- and few-shot trajectory prediction. Additionally, it shows strong scalability across a wide range of robotic environments and significantly improves performance on downstream model-based control for different robots. Finally, we deploy our model on a real-world Unitree Go1, where it demonstrates stable locomotion performance. The code is available at https://github.com/511205787/WestWorld.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

WestWorld adds Sys-MoE routing via learnable system embeddings plus morphological structural embeddings to trajectory world models, with pretraining on 89 mixed sim-real environments, but the abstract gives little quantitative backing for the zero-shot claims.

WestWorld's main point is that a single trajectory world model can handle many different robots by routing through a system-aware mixture of experts that uses a learnable embedding for each system and a structural embedding that brings in morphological knowledge. They pretrain on 89 environments covering diverse morphologies in both simulation and real settings, then show gains in zero- and few-shot prediction, better model-based control, and a working deployment on the Unitree Go1 quadruped for stable locomotion. The code is released, which is useful for checking the details.

Referee Report

3 major / 3 minor

Summary. The paper introduces WestWorld, a knowledge-encoded scalable trajectory world model for diverse robotic systems. It proposes a system-aware Mixture-of-Experts (Sys-MoE) that uses a learnable system embedding to dynamically route and combine specialized experts, combined with a structural embedding that aligns representations with robot morphological information. The model is pretrained on 89 complex environments spanning simulation and real-world settings with diverse morphologies, and the authors claim significant gains over baselines in zero- and few-shot trajectory prediction, strong scalability, improved model-based control performance, and successful real-world deployment on a Unitree Go1 quadruped.

Significance. If the quantitative results and generalization claims hold under rigorous evaluation, this work would represent a meaningful step toward scalable world models that handle many distinct robotic dynamics without per-system retraining. The combination of learnable system embeddings with explicit structural knowledge injection is a concrete technical contribution, and the scale of pretraining (89 environments) plus public code release are positive aspects that could support reproducibility and follow-on research in robotics and model-based RL.

major comments (3)

[§3.2] Sys-MoE architecture: The zero-shot generalization claim for unseen morphologies depends on the learnable system embedding reliably selecting and combining experts without interference. The manuscript provides no ablation on expert count, routing loss formulation, or explicit OOD morphology splits (e.g., training on 70 environments and testing on 19 held-out morphologies), so it is unclear whether the routing mechanism actually supports the no-retraining scalability assertion or collapses for novel systems.
[Table 2, §4.3] Zero-shot prediction results: The reported improvements over baselines lack error bars, statistical significance tests, and details on baseline implementations or hyperparameter matching. Without these, it is impossible to determine whether the gains are robust or sensitive to post-hoc choices, which directly affects the strength of the central empirical claim.
[§5] Real-world deployment: The Unitree Go1 locomotion results are presented without quantitative metrics (e.g., tracking error, success rate, or comparison to a non-pretrained baseline) or discussion of sim-to-real gaps in the structural embedding, weakening the claim that the pretrained model transfers stably to hardware.

minor comments (3)

[Abstract] The abstract states performance gains but supplies no numerical values or baseline names; moving at least one key quantitative result (with error bars) into the abstract would improve readability.
[§2.3] Notation for the structural embedding (e.g., how morphological features are encoded and fused with trajectory tokens) is introduced without a clear equation or diagram in §2.3, making the alignment mechanism harder to follow.
[§4] The paper mentions 'competitive baselines' in §4 but does not list them explicitly in a table or appendix; adding this would aid comparison.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. We address each major comment below and have revised the manuscript accordingly to strengthen the presentation of our results and claims.

Referee: [§3.2] Sys-MoE architecture: The zero-shot generalization claim for unseen morphologies depends on the learnable system embedding reliably selecting and combining experts without interference. The manuscript provides no ablation on expert count, routing loss formulation, or explicit OOD morphology splits (e.g., training on 70 environments and testing on 19 held-out morphologies), so it is unclear whether the routing mechanism actually supports the no-retraining scalability assertion or collapses for novel systems.

Authors: We appreciate the referee's emphasis on rigorous validation of the Sys-MoE routing for zero-shot generalization. While the current experiments already evaluate on diverse held-out morphologies within the 89-environment pretraining corpus, we agree that explicit ablations and OOD splits would provide clearer evidence. In the revised manuscript we will add ablations varying the number of experts and the routing loss formulation. We will also report results on an explicit 70/19 train/test morphology split to directly demonstrate that the system embedding enables reliable expert selection without retraining on novel systems.

revision: yes

Referee: [Table 2, §4.3] Zero-shot prediction results: The reported improvements over baselines lack error bars, statistical significance tests, and details on baseline implementations or hyperparameter matching. Without these, it is impossible to determine whether the gains are robust or sensitive to post-hoc choices, which directly affects the strength of the central empirical claim.

Authors: We agree that the absence of error bars and statistical tests limits the interpretability of the reported gains. In the revision we will augment Table 2 with standard error bars across multiple random seeds, include paired statistical significance tests, and expand §4.3 with explicit descriptions of baseline implementations together with the hyperparameter search ranges used to ensure fair and reproducible comparisons.

revision: yes

Referee: [§5] Real-world deployment: The Unitree Go1 locomotion results are presented without quantitative metrics (e.g., tracking error, success rate, or comparison to a non-pretrained baseline) or discussion of sim-to-real gaps in the structural embedding, weakening the claim that the pretrained model transfers stably to hardware.

Authors: The referee correctly identifies that the current real-world section relies primarily on qualitative description. We will revise §5 to report quantitative metrics including tracking error and success rate for the Unitree Go1 experiments, add a comparison against a non-pretrained baseline, and include a dedicated paragraph discussing observed sim-to-real gaps in the structural embedding along with the mechanisms (e.g., morphology alignment) that support stable transfer.

revision: yes
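
The error bars and paired tests discussed for Table 2 could be computed along the following lines. This is a generic sketch using only the Python standard library; the function names are hypothetical, and the choice of a paired t-statistic is an assumption, since the rebuttal does not say which paired test would be used.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return the sample mean and its standard error (needs >= 2 values)."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def paired_t_statistic(errors_a, errors_b):
    """Paired t-statistic and degrees of freedom for per-seed errors.

    Both lists must be aligned so that index i refers to the same seed
    (or the same test environment) for both methods. The per-pair
    differences must not all be identical, otherwise the standard error
    is zero and the statistic is undefined.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    mean_diff, sem_diff = mean_and_standard_error(diffs)
    return mean_diff / sem_diff, len(diffs) - 1
```

Pairing by seed or environment removes variance shared by both methods, which is why a paired test is usually more sensitive than comparing two independent means when the same evaluation set is reused.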

Circularity Check

0 steps flagged

No significant circularity in derivation or claims

The paper introduces a new model architecture (Sys-MoE with learnable system embedding and structural embedding) that is pretrained on external data from 89 environments and then evaluated on held-out zero-shot and few-shot trajectory prediction tasks plus downstream control. No equations, derivations, or first-principles results are shown that reduce by construction to the inputs, fitted parameters renamed as predictions, or self-citation chains. The central claims rest on empirical performance against baselines rather than tautological re-derivation, so the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 2 invented entities

The abstract introduces two new components (Sys-MoE and structural embedding) whose effectiveness is asserted without detailing the underlying assumptions about embedding alignment or expert specialization.

free parameters (1)

learnable system embedding
A vector representation per robotic system that routes the MoE; its dimensionality and initialization are not specified.

invented entities (2)

Sys-MoE (no independent evidence)
purpose: Dynamically route specialized experts using system embeddings for different robot dynamics.
New routing mechanism proposed to address scalability across many robotic systems.

structural embedding (no independent evidence)
purpose: Align trajectory representations with morphological information from robot physical structures.
Introduced to incorporate domain knowledge for better zero-shot generalization.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "we propose a novel system-aware Mixture-of-Experts (Sys-MoE) that dynamically combines and routes specialized experts for different robotic systems via a learnable system embedding... structural embedding that aligns trajectory representations with morphological information"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "we first model each articulated object as a rooted kinematic tree and convert it to a binary tree using the left-child-right-sibling (LCRS) transformation... embed these indices to obtain a structure embedding"
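
The left-child-right-sibling (LCRS) transformation named in the second passage is a standard way to turn a tree with arbitrary branching into a binary tree. A minimal sketch follows; the joint names are hypothetical, and the sketch stops at the transformation itself because the quoted passage elides which indices the paper goes on to embed.

```python
def to_lcrs(children, root):
    """Convert a rooted tree into left-child-right-sibling form.

    `children` maps each node to its ordered list of children.
    Returns two dicts: `left[node]` is the node's first child and
    `right[node]` is its next sibling (None when absent).
    """
    left, right = {}, {}
    stack = [root]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        # The first child becomes the left pointer in the binary tree.
        left[node] = kids[0] if kids else None
        # Keep a sibling link set earlier by the parent; default to None.
        right.setdefault(node, None)
        # Each child points right to the sibling that follows it.
        for kid, sibling in zip(kids, kids[1:]):
            right[kid] = sibling
        stack.extend(kids)
    return left, right


# Toy kinematic tree for a two-legged fragment (hypothetical joint names).
toy_tree = {
    "trunk": ["hip_left", "hip_right"],
    "hip_left": ["knee_left"],
    "hip_right": ["knee_right"],
}
left_child, right_sibling = to_lcrs(toy_tree, "trunk")
```

In the result, `left_child["trunk"]` is `"hip_left"` and `right_sibling["hip_left"]` is `"hip_right"`, so every node has at most two outgoing links regardless of how many children it had in the original kinematic tree.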

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 7 internal anchors

[1]

Acosta, W

B. Acosta, W. Yang, and M. Posa. Validating robotics simulators on real-world impacts.IEEE Robotics and Automation Letters, 7(3):6471–6478, 2022

work page 2022
[2]

Cosmos World Foundation Model Platform for Physical AI

N. Agarwal, A. Ali, M. Bala, Y . Balaji, E. Barker, T. Cai, P. Chattopadhyay, Y . Chen, Y . Cui, Y . Ding, et al. Cosmos world foundation model platform for physical ai.arXiv preprint arXiv:2501.03575, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[3]

Alvarez-Padilla, J

J. Alvarez-Padilla, J. Z. Zhang, S. Kwok, J. M. Dolan, and Z. Manchester. Real-time whole-body control of legged robots with model-predictive path integral control. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 14721–14727. IEEE, 2025

work page 2025
[4]

Belkhale, Y

S. Belkhale, Y . Cui, and D. Sadigh. Hydra: Hybrid robot actions for imitation learning. InConference on Robot Learning, pages 2113–2133. PMLR, 2023

work page 2023
[5]

L. Chen, S. Bahl, and D. Pathak. Playfusion: Skill acquisition via diffusion from language-annotated play. In Conference on Robot Learning, pages 2012–2029. PMLR, 2023

work page 2012
[6]

Z. Chen, V . Badrinarayanan, C.-Y . Lee, and A. Rabinovich. Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. InInternational conference on machine learning, pages 794–803. PMLR, 2018. 10 APREPRINT- MARCH17, 2026

work page 2018
[7]

X. Chi, P. Jia, C.-K. Fan, X. Ju, W. Mi, K. Zhang, Z. Qin, W. Tian, K. Ge, H. Li, et al. Wow: Towards a world omniscient world model through embodied interaction.arXiv preprint arXiv:2509.22642, 2025

work page arXiv 2025
[8]

K. Chua, R. Calandra, R. McAllister, and S. Levine. Deep reinforcement learning in a handful of trials using probabilistic dynamics models.Advances in neural information processing systems, 31, 2018

work page 2018
[9]

Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control

F. Ebert, C. Finn, S. Dasari, A. Xie, A. Lee, and S. Levine. Visual foresight: Model-based deep reinforcement learning for vision-based robotic control.arXiv preprint arXiv:1812.00568, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[10]

J. Fu, A. Kumar, O. Nachum, G. Tucker, and S. Levine. D4rl: Datasets for deep data-driven reinforcement learning.arXiv preprint arXiv:2004.07219, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2004
[11]

Gallouédec, E

Q. Gallouédec, E. Beeching, C. Romac, and E. Dellandréa. Jack of all trades, master of some, a multi-purpose transformer agent.arXiv preprint arXiv:2402.09844, 2024

work page arXiv 2024
[12]

Gu and T

A. Gu and T. Dao. Mamba: Linear-time sequence modeling with selective state spaces. InFirst conference on language modeling, 2024

work page 2024
[13]

Gulcehre, Z

C. Gulcehre, Z. Wang, A. Novikov, T. Paine, S. Gómez, K. Zolna, R. Agarwal, J. S. Merel, D. J. Mankowitz, C. Paduraru, et al. Rl unplugged: A suite of benchmarks for offline reinforcement learning.Advances in neural information processing systems, 33:7248–7259, 2020

work page 2020
[14]

Y . Guo, L. X. Shi, J. Chen, and C. Finn. Ctrl-world: A controllable generative world model for robot manipulation. arXiv preprint arXiv:2510.10125, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[15]

Gupta, S

A. Gupta, S. Tian, Y . Zhang, J. Wu, R. Martín-Martín, and L. Fei-Fei. Maskvit: Masked visual pre-training for video prediction.arXiv preprint arXiv:2206.11894, 2022

work page arXiv 2022
[16]

Ha and J

D. Ha and J. Schmidhuber. World models

work page
[17]

Ha and J

D. Ha and J. Schmidhuber. Recurrent world models facilitate policy evolution.Advances in neural information processing systems, 31, 2018

work page 2018
[18]

Hansen, H

N. Hansen, H. Su, and X. Wang. Td-mpc2: Scalable, robust world models for continuous control. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[19]

Hansen, X

N. Hansen, X. Wang, and H. Su. Temporal difference learning for model predictive control. InInternational Conference on Machine Learning, PMLR, 2022

work page 2022
[20]

M. Heo, Y . Lee, D. Lee, and J. J. Lim. Furniturebench: Reproducible real-world benchmark for long-horizon complex manipulation.The International Journal of Robotics Research, 44(10-11):1863–1891, 2025

work page 2025
[21]

Distilling the Knowledge in a Neural Network

G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network.arXiv preprint arXiv:1503.02531, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[22]

S. Hong, D. Yoon, and K.-E. Kim. Structure-aware transformer policy for inhomogeneous multi-task reinforcement learning. InInternational Conference on Learning Representations, 2021

work page 2021
[23]

Huang, I

W. Huang, I. Mordatch, and D. Pathak. One policy to control them all: Shared modular policies for agent-agnostic control. InInternational Conference on Machine Learning, pages 4455–4464. PMLR, 2020

work page 2020
[24]

M. A. Lee, Y . Zhu, K. Srinivasan, P. Shah, S. Savarese, L. Fei-Fei, A. Garg, and J. Bohg. Making sense of vision and touch: Self-supervised learning of multimodal representations for contact-rich tasks. In2019 International conference on robotics and automation (ICRA), pages 8943–8950. IEEE, 2019

work page 2019
[25]

H. Liu, S. Nasiriany, L. Zhang, Z. Bao, and Y . Zhu. Robot learning on the job: Human-in-the-loop autonomy and learning during deployment.The International Journal of Robotics Research, 44(10-11):1727–1742, 2025

work page 2025
[26]

X. Long, Q. Zhao, K. Zhang, Z. Zhang, D. Wang, Y . Liu, Z. Shu, Y . Lu, S. Wang, X. Wei, et al. A survey: Learning embodied intelligence from physical simulators and world models.arXiv preprint arXiv:2507.00917, 2025

work page arXiv 2025
[27]

Parthasarathy, N

A. Parthasarathy, N. Kalra, R. Agrawal, Y . LeCun, O. Bounou, P. Izmailov, and M. Goldblum. Closing the train-test gap in world models for gradient-based planning.arXiv preprint arXiv:2512.09929, 2025

work page arXiv 2025
[28]

Paszke, S

A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. Pytorch: An imperative style, high-performance deep learning library.Advances in neural information processing systems, 32, 2019

work page 2019
[29]

M. H. Raibert, H. B. Brown Jr, M. Chepponis, J. Koechling, and J. K. Hodgins. Dynamically stable legged locomotion. Technical report, 1989

work page 1989
[30]

Sawhney, S

A. Sawhney, S. Lee, K. Zhang, M. Veloso, and O. Kroemer. Playing with food: Learning food item representations through interactive exploration. InInternational Symposium on Experimental Robotics, pages 309–322. Springer, 2020. 11 APREPRINT- MARCH17, 2026

work page 2020
[31]

Saxena, M

S. Saxena, M. Sharma, and O. Kroemer. Multi-resolution sensing for real-time control with vision-language models. InConference on Robot Learning, pages 2210–2228. PMLR, 2023

work page 2023
[32]

Schiavi, P

G. Schiavi, P. Wulkop, G. Rizzi, L. Ott, R. Siegwart, and J. J. Chung. Learning agent-aware affordances for closed- loop interaction with articulated objects. In2023 IEEE International Conference on Robotics and Automation (ICRA), pages 5916–5922. IEEE, 2023

work page 2023
[33]

A generalist dynamics model for control

I. Schubert, J. Zhang, J. Bruce, S. Bechtle, E. Parisotto, M. Riedmiller, J. T. Springenberg, A. Byravan, L. Hasen- clever, and N. Heess. A generalist dynamics model for control.arXiv preprint arXiv:2305.10912, 2023

work page arXiv 2023
[34]

W. J. Schwind.Spring loaded inverted pendulum running: A plant model. University of Michigan, 1998

work page 1998
[35]

Sekar, O

R. Sekar, O. Rybkin, K. Daniilidis, P. Abbeel, D. Hafner, and D. Pathak. Planning to explore via self-supervised world models. InInternational conference on machine learning, pages 8583–8592. PMLR, 2020

work page 2020
[36]

Y . Tang, W. Yu, J. Tan, H. Zen, A. Faust, and T. Harada. Saytap: Language to quadrupedal locomotion. In7th Annual Conference on Robot Learning

work page
[37]

Gymnasium: A Standard Interface for Reinforcement Learning Environments

M. Towers, A. Kwiatkowski, J. Terry, J. U. Balis, G. De Cola, T. Deleu, M. Goulão, A. Kallinteris, M. Krimmel, A. KG, et al. Gymnasium: A standard interface for reinforcement learning environments.arXiv preprint arXiv:2407.17032, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[38]

Vuong, S

Q. Vuong, S. Levine, H. R. Walke, K. Pertsch, A. Singh, R. Doshi, C. Xu, J. Luo, L. Tan, D. Shah, et al. Open x-embodiment: Robotic learning datasets and rt-x models. InTowards Generalist Robots: Learning Paradigms for Scalable Skill Acquisition@ CoRL2023, 2023

work page 2023
[39]

Y . Wang, H. Zhao, H. Lin, E. Xu, L. He, and H. Shao. A generalizable physics-enhanced state space model for long-term dynamics forecasting in complex environments. InF orty-second International Conference on Machine Learning, 2025

work page 2025
[40]

S. Wei, X. Chen, F. Xie, G. E. Katz, Z. Gan, and L. Gan. Ms-ppo: Morphological-symmetry-equivariant policy for legged robot locomotion.arXiv preprint arXiv:2512.00727, 2025

work page arXiv 2025
[41]

Y . Wen, Z. Wan, M. Zhou, S. Hou, Z. Cao, C. Le, J. Chen, Z. Tian, W. Zhang, and J. Wang. On realization of intelligent decision-making in the real world: A foundation decision model perspective.arXiv preprint arXiv:2212.12669, 2022

work page arXiv 2022
[42]

Model Predictive Path Integral Control using Covariance Variable Importance Sampling

G. Williams, A. Aldrich, and E. Theodorou. Model predictive path integral control using covariance variable importance sampling.arXiv preprint arXiv:1509.01149, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[43]

P. Wu, A. Escontrela, D. Hafner, P. Abbeel, and K. Goldberg. Daydreamer: World models for physical robot learning. InConference on robot learning, pages 2226–2240. PMLR, 2023

work page 2023
[44]

F. Xie, S. Wei, Y . Song, Y . Yue, and L. Gan. Morphological-symmetry-equivariant heterogeneous graph neural network for robotic dynamics learning. In7th Annual Learning for Dynamics \& Control Conference, pages 1392–1405. PMLR, 2025

work page 2025
[45]

Yarats, D

D. Yarats, D. Brandfonbrener, H. Liu, M. Laskin, P. Abbeel, A. Lazaric, and L. Pinto. Don’t change the algorithm, change the data: Exploratory data for offline reinforcement learning.arXiv preprint arXiv:2201.13425, 2022

work page arXiv 2022
[46]

S. Yin, J. Wu, S. Huang, X. Su, X. He, J. HAO, and M. Long. Trajectory world models for heterogeneous environments. InF orty-second International Conference on Machine Learning, 2025

work page 2025
[47]

Y . Zhou, S. Sonawani, M. Phielipp, H. Ben Amor, and S. Stepputtis. Learning modular language-conditioned robot policies through attention.Autonomous Robots, 47(8):1013–1033, 2023

work page 2023
[48]

F. Zhu, H. Wu, S. Guo, Y . Liu, C. Cheang, and T. Kong. Irasim: A fine-grained world model for robot manipulation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 9834–9844, 2025

work page 2025
[49]

X. Zhu, R. Tian, C. Xu, M. Huo, W. Zhan, M. Tomizuka, and M. Ding. Fanuc manipulation: A dataset for learning-based manipulation with fanuc mate 200id robot, 2023. 12 APREPRINT- MARCH17, 2026 A Notations The table below summarizes the notation used in this paper. Lowercase letters (e.g., x) denote scalars, bold lowercase letters (e.g.,x) represent vectors...

work page 2023

[1] [1]

Acosta, W

B. Acosta, W. Yang, and M. Posa. Validating robotics simulators on real-world impacts.IEEE Robotics and Automation Letters, 7(3):6471–6478, 2022

work page 2022

[2] [2]

Cosmos World Foundation Model Platform for Physical AI

N. Agarwal, A. Ali, M. Bala, Y . Balaji, E. Barker, T. Cai, P. Chattopadhyay, Y . Chen, Y . Cui, Y . Ding, et al. Cosmos world foundation model platform for physical ai.arXiv preprint arXiv:2501.03575, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[3] [3]

Alvarez-Padilla, J

J. Alvarez-Padilla, J. Z. Zhang, S. Kwok, J. M. Dolan, and Z. Manchester. Real-time whole-body control of legged robots with model-predictive path integral control. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 14721–14727. IEEE, 2025

work page 2025

[4] [4]

Belkhale, Y

S. Belkhale, Y . Cui, and D. Sadigh. Hydra: Hybrid robot actions for imitation learning. InConference on Robot Learning, pages 2113–2133. PMLR, 2023

work page 2023

[5] [5]

L. Chen, S. Bahl, and D. Pathak. Playfusion: Skill acquisition via diffusion from language-annotated play. In Conference on Robot Learning, pages 2012–2029. PMLR, 2023

work page 2012

[6] [6]

Z. Chen, V . Badrinarayanan, C.-Y . Lee, and A. Rabinovich. Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. InInternational conference on machine learning, pages 794–803. PMLR, 2018. 10 APREPRINT- MARCH17, 2026

work page 2018

[7] [7]

X. Chi, P. Jia, C.-K. Fan, X. Ju, W. Mi, K. Zhang, Z. Qin, W. Tian, K. Ge, H. Li, et al. Wow: Towards a world omniscient world model through embodied interaction.arXiv preprint arXiv:2509.22642, 2025

work page arXiv 2025

[8] [8]

K. Chua, R. Calandra, R. McAllister, and S. Levine. Deep reinforcement learning in a handful of trials using probabilistic dynamics models.Advances in neural information processing systems, 31, 2018

work page 2018

[9] [9]

Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control

F. Ebert, C. Finn, S. Dasari, A. Xie, A. Lee, and S. Levine. Visual foresight: Model-based deep reinforcement learning for vision-based robotic control.arXiv preprint arXiv:1812.00568, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[10] [10]

J. Fu, A. Kumar, O. Nachum, G. Tucker, and S. Levine. D4rl: Datasets for deep data-driven reinforcement learning.arXiv preprint arXiv:2004.07219, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2004

[11] [11]

Gallouédec, E

Q. Gallouédec, E. Beeching, C. Romac, and E. Dellandréa. Jack of all trades, master of some, a multi-purpose transformer agent.arXiv preprint arXiv:2402.09844, 2024

work page arXiv 2024

[12] [12]

Gu and T

A. Gu and T. Dao. Mamba: Linear-time sequence modeling with selective state spaces. InFirst conference on language modeling, 2024

work page 2024

[13] [13]

Gulcehre, Z

C. Gulcehre, Z. Wang, A. Novikov, T. Paine, S. Gómez, K. Zolna, R. Agarwal, J. S. Merel, D. J. Mankowitz, C. Paduraru, et al. Rl unplugged: A suite of benchmarks for offline reinforcement learning.Advances in neural information processing systems, 33:7248–7259, 2020

work page 2020

[14] [14]

Y . Guo, L. X. Shi, J. Chen, and C. Finn. Ctrl-world: A controllable generative world model for robot manipulation. arXiv preprint arXiv:2510.10125, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[15] [15]

Gupta, S

A. Gupta, S. Tian, Y . Zhang, J. Wu, R. Martín-Martín, and L. Fei-Fei. Maskvit: Masked visual pre-training for video prediction.arXiv preprint arXiv:2206.11894, 2022

work page arXiv 2022

[16] [16]

Ha and J

D. Ha and J. Schmidhuber. World models

work page

[17] D. Ha and J. Schmidhuber. Recurrent world models facilitate policy evolution. Advances in neural information processing systems, 31, 2018.

[18] N. Hansen, H. Su, and X. Wang. Td-mpc2: Scalable, robust world models for continuous control. In The Twelfth International Conference on Learning Representations, 2024.

[19] N. Hansen, X. Wang, and H. Su. Temporal difference learning for model predictive control. In International Conference on Machine Learning, PMLR, 2022.

[20] M. Heo, Y. Lee, D. Lee, and J. J. Lim. Furniturebench: Reproducible real-world benchmark for long-horizon complex manipulation. The International Journal of Robotics Research, 44(10-11):1863–1891, 2025.

[21] G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.

[22] S. Hong, D. Yoon, and K.-E. Kim. Structure-aware transformer policy for inhomogeneous multi-task reinforcement learning. In International Conference on Learning Representations, 2021.

[23] W. Huang, I. Mordatch, and D. Pathak. One policy to control them all: Shared modular policies for agent-agnostic control. In International Conference on Machine Learning, pages 4455–4464. PMLR, 2020.

[24] M. A. Lee, Y. Zhu, K. Srinivasan, P. Shah, S. Savarese, L. Fei-Fei, A. Garg, and J. Bohg. Making sense of vision and touch: Self-supervised learning of multimodal representations for contact-rich tasks. In 2019 International conference on robotics and automation (ICRA), pages 8943–8950. IEEE, 2019.

[25] H. Liu, S. Nasiriany, L. Zhang, Z. Bao, and Y. Zhu. Robot learning on the job: Human-in-the-loop autonomy and learning during deployment. The International Journal of Robotics Research, 44(10-11):1727–1742, 2025.

[26] X. Long, Q. Zhao, K. Zhang, Z. Zhang, D. Wang, Y. Liu, Z. Shu, Y. Lu, S. Wang, X. Wei, et al. A survey: Learning embodied intelligence from physical simulators and world models. arXiv preprint arXiv:2507.00917, 2025.

[27] A. Parthasarathy, N. Kalra, R. Agrawal, Y. LeCun, O. Bounou, P. Izmailov, and M. Goldblum. Closing the train-test gap in world models for gradient-based planning. arXiv preprint arXiv:2512.09929, 2025.

[28] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.

[29] M. H. Raibert, H. B. Brown Jr, M. Chepponis, J. Koechling, and J. K. Hodgins. Dynamically stable legged locomotion. Technical report, 1989.

[30] A. Sawhney, S. Lee, K. Zhang, M. Veloso, and O. Kroemer. Playing with food: Learning food item representations through interactive exploration. In International Symposium on Experimental Robotics, pages 309–322. Springer, 2020.

[31] S. Saxena, M. Sharma, and O. Kroemer. Multi-resolution sensing for real-time control with vision-language models. In Conference on Robot Learning, pages 2210–2228. PMLR, 2023.

[32] G. Schiavi, P. Wulkop, G. Rizzi, L. Ott, R. Siegwart, and J. J. Chung. Learning agent-aware affordances for closed-loop interaction with articulated objects. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 5916–5922. IEEE, 2023.

[33] I. Schubert, J. Zhang, J. Bruce, S. Bechtle, E. Parisotto, M. Riedmiller, J. T. Springenberg, A. Byravan, L. Hasenclever, and N. Heess. A generalist dynamics model for control. arXiv preprint arXiv:2305.10912, 2023.

[34] W. J. Schwind. Spring loaded inverted pendulum running: A plant model. University of Michigan, 1998.

[35] R. Sekar, O. Rybkin, K. Daniilidis, P. Abbeel, D. Hafner, and D. Pathak. Planning to explore via self-supervised world models. In International conference on machine learning, pages 8583–8592. PMLR, 2020.

[36] Y. Tang, W. Yu, J. Tan, H. Zen, A. Faust, and T. Harada. Saytap: Language to quadrupedal locomotion. In 7th Annual Conference on Robot Learning.

[37] M. Towers, A. Kwiatkowski, J. Terry, J. U. Balis, G. De Cola, T. Deleu, M. Goulão, A. Kallinteris, M. Krimmel, A. KG, et al. Gymnasium: A standard interface for reinforcement learning environments. arXiv preprint arXiv:2407.17032, 2024.

[38] Q. Vuong, S. Levine, H. R. Walke, K. Pertsch, A. Singh, R. Doshi, C. Xu, J. Luo, L. Tan, D. Shah, et al. Open x-embodiment: Robotic learning datasets and rt-x models. In Towards Generalist Robots: Learning Paradigms for Scalable Skill Acquisition @ CoRL2023, 2023.

[39] Y. Wang, H. Zhao, H. Lin, E. Xu, L. He, and H. Shao. A generalizable physics-enhanced state space model for long-term dynamics forecasting in complex environments. In Forty-second International Conference on Machine Learning, 2025.

[40] S. Wei, X. Chen, F. Xie, G. E. Katz, Z. Gan, and L. Gan. Ms-ppo: Morphological-symmetry-equivariant policy for legged robot locomotion. arXiv preprint arXiv:2512.00727, 2025.

[41] Y. Wen, Z. Wan, M. Zhou, S. Hou, Z. Cao, C. Le, J. Chen, Z. Tian, W. Zhang, and J. Wang. On realization of intelligent decision-making in the real world: A foundation decision model perspective. arXiv preprint arXiv:2212.12669, 2022.

[42] G. Williams, A. Aldrich, and E. Theodorou. Model predictive path integral control using covariance variable importance sampling. arXiv preprint arXiv:1509.01149, 2015.

[43] P. Wu, A. Escontrela, D. Hafner, P. Abbeel, and K. Goldberg. Daydreamer: World models for physical robot learning. In Conference on robot learning, pages 2226–2240. PMLR, 2023.

[44] F. Xie, S. Wei, Y. Song, Y. Yue, and L. Gan. Morphological-symmetry-equivariant heterogeneous graph neural network for robotic dynamics learning. In 7th Annual Learning for Dynamics & Control Conference, pages 1392–1405. PMLR, 2025.

[45] D. Yarats, D. Brandfonbrener, H. Liu, M. Laskin, P. Abbeel, A. Lazaric, and L. Pinto. Don’t change the algorithm, change the data: Exploratory data for offline reinforcement learning. arXiv preprint arXiv:2201.13425, 2022.

[46] S. Yin, J. Wu, S. Huang, X. Su, X. He, J. HAO, and M. Long. Trajectory world models for heterogeneous environments. In Forty-second International Conference on Machine Learning, 2025.

[47] Y. Zhou, S. Sonawani, M. Phielipp, H. Ben Amor, and S. Stepputtis. Learning modular language-conditioned robot policies through attention. Autonomous Robots, 47(8):1013–1033, 2023.

[48] F. Zhu, H. Wu, S. Guo, Y. Liu, C. Cheang, and T. Kong. Irasim: A fine-grained world model for robot manipulation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9834–9844, 2025.

[49] X. Zhu, R. Tian, C. Xu, M. Huo, W. Zhan, M. Tomizuka, and M. Ding. Fanuc manipulation: A dataset for learning-based manipulation with fanuc mate 200id robot, 2023.

A Notations

The table below summarizes the notation used in this paper. Lowercase letters (e.g., x) denote scalars, bold lowercase letters (e.g., x) represent vectors...