pith. sign in

arxiv: 2606.05015 · v1 · pith:M3NSPDQQnew · submitted 2026-06-03 · 💻 cs.RO

Generalization of World Models under Environmental Variability for Vision-based Quadrotor Navigation

Pith reviewed 2026-06-28 05:49 UTC · model grok-4.3

classification 💻 cs.RO
keywords world modelssim-to-real transferquadrotor navigationreinforcement learningself-supervised learningDreamerV3environmental variability
0
0 comments X

The pith

World model robustness in self-supervised pretraining predicts successful sim-to-real quadrotor navigation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether world models trained with DreamerV3 generalize across different levels of simulated environmental randomness for vision-based quadrotor navigation. It measures this generalization through cross-environment validation during both SSL pretraining and RL fine-tuning, then deploys the models on a physical quadrotor in unseen real settings, including an open-loop mode that relies on imagination after only 2.5 seconds of real sensor input. Results indicate that strong SSL generalization across simulated environments reliably forecasts real-world success, such as traversing narrow gaps, while simulation-only performance does not. The study identifies latent size and training sequence length as key factors controlling model quality under variability.

Core claim

World model robustness during SSL pretraining is a strong predictor of sim-to-real transfer: every model that generalized well in cross-environment SSL validation deployed successfully in the real world, passing through gaps as narrow as 0.67m, whereas the model that dominated simulation policy evaluation failed on the real platform. The discrete latent size and the training-sequence length are the dominant factors governing world model quality.

What carries the argument

Cross-environment validation of DreamerV3 world models, spanning SSL pretraining and RL fine-tuning under discrete levels of environmental randomness.

If this is right

  • SSL pretraining robustness can serve as an early filter for selecting world models likely to transfer to hardware.
  • Increasing discrete latent size or training-sequence length improves world model quality under environmental variability.
  • Open-loop navigation over 12m is feasible when the world model receives limited real sensory input before relying on internal predictions.
  • Policy performance in simulation alone does not guarantee real-world deployment success for these navigation tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Prioritizing SSL cross-environment checks during training could shorten the iteration cycle for sim-to-real robotics projects that use generative world models.
  • Extending the approach to continuous rather than discrete randomness schedules might produce a more graded predictor of transfer success.
  • The same SSL robustness signal could be tested as a predictor for other robot platforms or sensing modalities beyond quadrotors.

Load-bearing premise

The discrete levels of environmental randomness used in simulation are representative of the variability that will be encountered during real-world quadrotor deployments.

What would settle it

A world model that performs poorly on cross-environment SSL validation yet still succeeds in real-world deployment, including the 12m open-loop traverse, would falsify the claimed predictive relationship.

Figures

Figures reproduced from arXiv: 2606.05015 by Grzegorz Malczyk, Kostas Alexis, Luca Zanatta.

Figure 1
Figure 1. Figure 1: From simulation to real-world closed- and open-loop deployment. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Method overview. First, the data is collected per environmental randomness level. Next, the DreamerV3 world model is trained by self-supervised reconstruction. Finally, an actor and critic are trained in imagination with world-model fine-tuning: after a real-observation context (green), the model rolls out in pure imagination (orange), hallucinating future observations. 2 Problem Formulation This work inve… view at source ↗
Figure 3
Figure 3. Figure 3: Hyperparameter sweep results: each cell shows [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Cross-environment validation heatmap (MSE [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: RL imagination cross-environment validation (MSE [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Real-world deployment: closed-loop (a) and pure-imagination open-loop (b) navigation. [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Hyperparameter sweep with full learning curves. Each cell shows all runs for a given [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Qualitative illustration of MSE and SSIM as complementary reconstruction metrics. Each [PITH_FULL_IMAGE:figures/full_fig_p013_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Temporal evolution of imagined depth observations during open-loop rollout. The last [PITH_FULL_IMAGE:figures/full_fig_p013_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Closed-loop real-world deployment with 5 cuboid obstacles. Top-down trajectories of all [PITH_FULL_IMAGE:figures/full_fig_p014_11.png] view at source ↗
read the original abstract

World models, learned generative models that predict how an environment evolves, have become a promising tool for sample-efficient robot learning. Yet how robust they are to environmental variability remains poorly understood. To address this, we conduct a systematic study using vision-based quadrotor navigation as a testbed problem, training DreamerV3-based world models under varying levels of environmental randomness and evaluating them across all levels through cross-environment validation, spanning both Self-Supervised Learning (SSL) pretraining and Reinforcement Learning (RL) fine-tuning. We then deploy all world models and associated navigation policies on a real quadrotor in unseen environments, including an open-loop run where the model receives just 2.5s of real sensory input before all sensors are cut off, leaving the system to navigate entirely in imagination over a 12m traverse. Our results show that world model robustness during SSL pretraining is a strong predictor of sim-to-real transfer: every model that generalized well in cross-environment SSL validation deployed successfully in the real world, passing through gaps as narrow as 0.67m, whereas the model that dominated simulation policy evaluation failed on the real platform. We further identify (a) the discrete latent size and (b) the training-sequence length as the dominant factors governing world model quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that robustness of DreamerV3 world models during SSL pretraining, assessed via cross-environment validation across discrete levels of environmental randomness, is a strong predictor of sim-to-real transfer success in vision-based quadrotor navigation. Models showing good SSL generalization succeed on real hardware (including 0.67m gap traversal and 12m open-loop imagination navigation after 2.5s real input), while a model dominant in simulation policy evaluation fails in reality. Discrete latent size and training-sequence length are identified as the dominant factors governing model quality.

Significance. If the result holds, the work provides an empirical selection criterion for world models that is likely to transfer, reducing the need for exhaustive real-world testing in sample-efficient robot learning. The inclusion of hardware deployment with challenging narrow-gap and open-loop scenarios adds practical value for vision-based navigation tasks.

major comments (2)
  1. [Abstract] Abstract: The central claim that SSL cross-environment generalization is a 'strong predictor' of sim-to-real success is load-bearing for the contribution, yet the abstract (and by extension the reported results) provides no quantitative metrics, trial counts, success rates, error bars, or statistical tests to support the correlation or the statement that 'every model that generalized well... deployed successfully.'
  2. [Abstract] Abstract: The predictor's validity rests on the assumption that the discrete levels of environmental randomness used in simulation are representative of real-world variability. No quantitative matching is described between simulated factors and actual deployment conditions (sensor noise distributions, lighting, wind, texture), so the successful narrow-gap traversals could be explained by test-specific conditions rather than the claimed SSL robustness factor.
minor comments (1)
  1. [Abstract] The abstract would be strengthened by briefly stating the number of models evaluated and the range of environmental randomness levels tested.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight opportunities to strengthen the presentation of our central claims. We respond to each major comment below and indicate where revisions will be made.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that SSL cross-environment generalization is a 'strong predictor' of sim-to-real success is load-bearing for the contribution, yet the abstract (and by extension the reported results) provides no quantitative metrics, trial counts, success rates, error bars, or statistical tests to support the correlation or the statement that 'every model that generalized well... deployed successfully.'

    Authors: We agree that the abstract would benefit from explicit quantitative support for the predictor claim. The manuscript body details multiple world models evaluated via cross-environment SSL validation and reports their real-world deployment outcomes, including the narrow-gap and open-loop scenarios. We will revise the abstract to reference the number of models tested, the observed success rates in hardware, and the trial counts underlying the statement that models with strong SSL generalization deployed successfully. revision: yes

  2. Referee: [Abstract] Abstract: The predictor's validity rests on the assumption that the discrete levels of environmental randomness used in simulation are representative of real-world variability. No quantitative matching is described between simulated factors and actual deployment conditions (sensor noise distributions, lighting, wind, texture), so the successful narrow-gap traversals could be explained by test-specific conditions rather than the claimed SSL robustness factor.

    Authors: The discrete randomness levels were selected to induce a controlled spectrum of variability during pretraining rather than to replicate exact real-world distributions. The primary evidence for the predictor is the empirical outcome of real-world deployments in previously unseen conditions, where SSL-robust models succeeded on tasks such as 0.67 m gap traversal and 12 m open-loop navigation. We acknowledge that no explicit quantitative distribution matching (e.g., sensor noise or lighting statistics) is provided and will add a clarifying paragraph in the discussion section explaining the design rationale and the role of real-world validation. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical cross-validation and hardware results are independent of inputs

full rationale

The paper conducts an empirical study by training DreamerV3 world models at discrete environmental randomness levels, running cross-environment SSL and RL validation, and performing real-world quadrotor deployments including open-loop imagination-based navigation. The claim that SSL generalization predicts sim-to-real success is grounded in observed experimental outcomes (successful 0.67m gap traversals for robust models, failure of the sim-dominant model) rather than any equation, fitted parameter, or self-citation that reduces the predictor to a definition or forces it by construction. No load-bearing steps match the enumerated circularity patterns; the derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, ad-hoc axioms, or new entities; the study inherits standard assumptions of DreamerV3 and RL training.

axioms (1)
  • domain assumption Standard assumptions of DreamerV3 world-model training and reinforcement-learning algorithms hold without modification
    The study deploys DreamerV3 as the base model and reports results under its training regime.

pith-pipeline@v0.9.1-grok · 5762 in / 1132 out tokens · 32953 ms · 2026-06-28T05:49:04.959112+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

22 extracted references · 2 linked inside Pith

  1. [1]

    Ha and J

    D. Ha and J. Schmidhuber. Recurrent world models facilitate policy evolution.Advances in neural information processing systems, 31, 2018

  2. [2]

    Janner, J

    M. Janner, J. Fu, M. Zhang, and S. Levine. When to trust your model: Model-based policy optimization.Advances in neural information processing systems, 32, 2019

  3. [3]

    Hafner, T

    D. Hafner, T. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee, and J. Davidson. Learning latent dynamics for planning from pixels. InInternational conference on machine learning, pages 2555–2565. PMLR, 2019

  4. [4]

    Hafner, T

    D. Hafner, T. Lillicrap, J. Ba, and M. Norouzi. Dream to control: Learning behaviors by latent imagination. InInternational Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=S1lOTC4tDS

  5. [5]

    Hafner, T

    D. Hafner, T. P. Lillicrap, M. Norouzi, and J. Ba. Mastering atari with discrete world models. In International Conference on Learning Representations, 2021. URLhttps://openreview. net/forum?id=0oabwyZbOu

  6. [6]

    Hafner, J

    D. Hafner, J. Pasukonis, J. Ba, and T. Lillicrap. Mastering diverse control tasks through world models.Nature, 640(8059):647–653, 2025

  7. [7]

    Aljalbout, M

    E. Aljalbout, M. Krinner, A. Romero, and D. Scaramuzza. Accelerating model-based rein- forcement learning with state-space world models. InICLR 2025 Workshop on World Models: Understanding, Modelling and Scaling, 2025

  8. [8]

    L. Maes, Q. L. Lidec, D. Scieur, Y . LeCun, and R. Balestriero. Leworldmodel: Stable end- to-end joint-embedding predictive architecture from pixels.arXiv preprint arXiv:2603.19312, 2026

  9. [9]

    Schrittwieser, I

    J. Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lock- hart, D. Hassabis, T. Graepel, et al. Mastering atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020

  10. [10]

    P. Wu, A. Escontrela, D. Hafner, P. Abbeel, and K. Goldberg. Daydreamer: World models for physical robot learning. InConference on robot learning, pages 2226–2240. PMLR, 2023

  11. [11]

    Hansen, H

    N. Hansen, H. Su, and X. Wang. Td-mpc2: Scalable, robust world models for continuous control. InInternational Conference on Learning Representations, volume 2024, pages 47376– 47405, 2024

  12. [12]

    A. Bar, G. Zhou, D. Tran, T. Darrell, and Y . LeCun. Navigation world models. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 15791–15801, 2025

  13. [13]

    C. Li, A. Krause, and M. Hutter. Robotic world model: A neural network simulator for robust policy optimization in robotics. InNeurIPS 2025 Workshop on Embodied World Models for Decision Making, 2025. URLhttps://openreview.net/forum?id=u76d3gBWCX

  14. [14]

    Romero, A

    A. Romero, A. Shenai, I. Geles, E. Aljalbout, and D. Scaramuzza. Dream to fly: Model-based reinforcement learning for vision-based drone flight.arXiv preprint arXiv:2501.14377, 2025. 9

  15. [15]

    Verraest, S

    A. Verraest, S. Bahnam, R. Ferede, G. de Croon, and C. De Wagter. Skydreamer: Interpretable end-to-end vision-based drone racing with model-based reinforcement learning.arXiv preprint arXiv:2510.14783, 2025

  16. [16]

    Geles, L

    I. Geles, L. Bauersfeld, A. Romero, J. Xing, and D. Scaramuzza. Demonstrating agile flight from pixels without state estimation. InRobotics: Science and Systems (RSS), 2024

  17. [17]

    Kulkarni, W

    M. Kulkarni, W. Rehberg, and K. Alexis. Aerial gym simulator: A framework for highly parallelized simulation of aerial robots.IEEE Robotics and Automation Letters, 2025

  18. [18]

    I. M. Sobol. Distribution of points in a cube and approximate evaluation of integrals.USSR Computational mathematics and mathematical physics, 7:86–112, 1967

  19. [19]

    G. E. Uhlenbeck and L. S. Ornstein. On the theory of the brownian motion.Physical review, 36(5):823, 1930

  20. [20]

    T. Lee, M. Leok, and N. H. McClamroch. Geometric tracking control of a quadrotor uav on se (3). In49th IEEE conference on decision and control (CDC), pages 5420–5425. IEEE, 2010

  21. [21]

    Tobin, R

    J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 23–30. IEEE, 2017

  22. [22]

    more random

    X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In2018 IEEE international conference on robotics and automation (ICRA), pages 3803–3810. IEEE, 2018. 10 Appendix A Design of the Environmental Randomness Levels The choice of environmental randomness levels in Section 3.1 is importa...