Generalization of World Models under Environmental Variability for Vision-based Quadrotor Navigation

Grzegorz Malczyk; Kostas Alexis; Luca Zanatta

arxiv: 2606.05015 · v1 · pith:M3NSPDQQnew · submitted 2026-06-03 · 💻 cs.RO

Pith reviewed 2026-06-28 05:49 UTC · model grok-4.3

classification 💻 cs.RO

keywords: world models, sim-to-real transfer, quadrotor navigation, reinforcement learning, self-supervised learning, DreamerV3, environmental variability

The pith

World model robustness in self-supervised pretraining predicts successful sim-to-real quadrotor navigation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether world models trained with DreamerV3 generalize across different levels of simulated environmental randomness for vision-based quadrotor navigation. It measures this generalization through cross-environment validation during both SSL pretraining and RL fine-tuning, then deploys the models on a physical quadrotor in unseen real settings, including an open-loop mode that relies on imagination after only 2.5 seconds of real sensor input. Results indicate that strong SSL generalization across simulated environments reliably forecasts real-world success, such as traversing narrow gaps, while simulation-only performance does not. The study identifies latent size and training sequence length as key factors controlling model quality under variability.

Core claim

World model robustness during SSL pretraining is a strong predictor of sim-to-real transfer: every model that generalized well in cross-environment SSL validation deployed successfully in the real world, passing through gaps as narrow as 0.67m, whereas the model that dominated simulation policy evaluation failed on the real platform. The discrete latent size and the training-sequence length are the dominant factors governing world model quality.

What carries the argument

Cross-environment validation of DreamerV3 world models, spanning SSL pretraining and RL fine-tuning under discrete levels of environmental randomness.
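A minimal sketch of what such a cross-environment validation grid could look like, assuming one model per training randomness level and one MSE score per (train level, eval level) pair. The level names, the toy gain "models", and the data are hypothetical stand-ins chosen for illustration; they are not the paper's DreamerV3 pipeline or its results.

```python
from statistics import fmean


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return fmean((p - t) ** 2 for p, t in zip(pred, target))


def cross_env_matrix(models, datasets, score=mse):
    """Evaluate every model on every environment level.

    models:   {train_level: callable(inputs) -> predictions}
    datasets: {eval_level: (inputs, targets)}
    Returns {train_level: {eval_level: score}}.
    """
    return {
        train: {ev: score(model(x), y) for ev, (x, y) in datasets.items()}
        for train, model in models.items()
    }


def off_diagonal_mean(matrix, train_level):
    """Average score on the levels the model was NOT trained on."""
    row = matrix[train_level]
    return fmean(v for ev, v in row.items() if ev != train_level)


# Hypothetical toy setup: each "environment" scales the signal by a gain,
# and each "model" is the fixed gain fitted on one level.
LEVELS = {"low": 1.0, "mid": 1.5, "high": 2.0}
inputs = [0.0, 1.0, 2.0, 3.0]
datasets = {name: (inputs, [g * x for x in inputs]) for name, g in LEVELS.items()}
models = {name: (lambda xs, g=g: [g * x for x in xs]) for name, g in LEVELS.items()}

matrix = cross_env_matrix(models, datasets)
# Lower off-diagonal error means better generalization to unseen levels.
best = min(matrix, key=lambda lvl: off_diagonal_mean(matrix, lvl))
```

The diagonal of `matrix` is in-distribution error; the off-diagonal entries are the generalization signal the summary above refers to. Ranking by `off_diagonal_mean` is one possible aggregation, not necessarily the one the authors use.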

If this is right

SSL pretraining robustness can serve as an early filter for selecting world models likely to transfer to hardware.
Increasing discrete latent size or training-sequence length improves world model quality under environmental variability.
Open-loop navigation over 12 m is feasible when the world model receives a short window of real sensory input (2.5 s in the paper) before relying on internal predictions.
Policy performance in simulation alone does not guarantee real-world deployment success for these navigation tasks.
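The open-loop item in the list above can be sketched as a rollout that corrects its state with real observations for a fixed context window and then continues on predictions alone. This is a toy illustration under stated assumptions: the scalar state, the `step_prior` and `step_posterior` functions, and the 0.1 s control period are hypothetical, and the real system uses a learned DreamerV3 latent model rather than these hand-written dynamics.

```python
def rollout(observations, actions, context_steps, step_prior, step_posterior,
            init_state=0.0):
    """Run closed-loop for `context_steps`, then open-loop in imagination.

    step_prior(state, action)        -> predicted next state (no observation)
    step_posterior(predicted, obs)   -> state corrected by a real observation
    Returns the visited states and a parallel list of 'real'/'imagined' tags.
    """
    state, states, modes = init_state, [], []
    for t, action in enumerate(actions):
        state = step_prior(state, action)
        if t < context_steps:
            # Real sensor data is still available: correct the prediction.
            state = step_posterior(state, observations[t])
            modes.append("real")
        else:
            # Sensors cut off: keep the model's own prediction.
            modes.append("imagined")
        states.append(state)
    return states, modes


DT = 0.1  # assumed control period in seconds, for illustration only


def step_prior(state, action):
    """Toy dynamics: position integrates a velocity command."""
    return state + action * DT


def step_posterior(predicted, observation):
    """Toy correction: blend the prediction with the observation."""
    return 0.5 * predicted + 0.5 * observation


actions = [1.0] * 50                               # constant forward command
true_positions = [(t + 1) * DT for t in range(50)]  # what the sensors would report
context_steps = round(2.5 / DT)                     # 2.5 s of real input

states, modes = rollout(true_positions, actions, context_steps,
                        step_prior, step_posterior)
```

In this toy case the prior is exact, so the imagined segment stays on track; with a learned model, prediction error accumulates over the imagined steps, which is why the length of the open-loop traverse is a meaningful stress test.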

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Prioritizing SSL cross-environment checks during training could shorten the iteration cycle for sim-to-real robotics projects that use generative world models.
Extending the approach to continuous rather than discrete randomness schedules might produce a more graded predictor of transfer success.
The same SSL robustness signal could be tested as a predictor for other robot platforms or sensing modalities beyond quadrotors.

Load-bearing premise

The discrete levels of environmental randomness used in simulation are representative of the variability that will be encountered during real-world quadrotor deployments.

What would settle it

A world model that performs poorly on cross-environment SSL validation yet still succeeds in real-world deployment, including the 12 m open-loop traverse, would falsify the claimed predictive relationship.

Figures

Figures reproduced from arXiv: 2606.05015 by Grzegorz Malczyk, Kostas Alexis, Luca Zanatta.

**Figure 2.** Method overview. First, the data is collected per environmental randomness level. Next, the DreamerV3 world model is trained by self-supervised reconstruction. Finally, an actor and critic are trained in imagination with world-model fine-tuning: after a real-observation context (green), the model rolls out in pure imagination (orange), hallucinating future observations.

**Figure 3.** Hyperparameter sweep results: each cell shows …

**Figure 5.** Cross-environment validation heatmap (MSE …)

**Figure 6.** RL imagination cross-environment validation (MSE …)

**Figure 7.** Real-world deployment: closed-loop (a) and pure-imagination open-loop (b) navigation.

**Figure 8.** Hyperparameter sweep with full learning curves. Each cell shows all runs for a given …

**Figure 9.** Qualitative illustration of MSE and SSIM as complementary reconstruction metrics. Each …

**Figure 10.** Temporal evolution of imagined depth observations during open-loop rollout. The last …

**Figure 11.** Closed-loop real-world deployment with 5 cuboid obstacles. Top-down trajectories of all …
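The Figure 9 caption describes MSE and SSIM as complementary. A small sketch of why that can be the case, assuming a single-window (global) SSIM with the commonly used constants K1 = 0.01 and K2 = 0.03 and a data range of 1. The SSIM settings used in the paper are not stated here, and the toy "depth rows" below are hypothetical.

```python
from statistics import fmean


def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return fmean((a - b) ** 2 for a, b in zip(x, y))


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM over the whole signal treated as one window (no sliding window)."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    luminance = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    contrast_structure = (2 * cov + c2) / (vx + vy + c2)
    return luminance * contrast_structure


# Hypothetical flattened depth row with values in [0, 1].
reference = [0.2, 0.8, 0.2, 0.8, 0.2, 0.8, 0.2, 0.8]
# Distortion 1: uniform offset, edges keep their full contrast.
shifted = [v + 0.1 for v in reference]
# Distortion 2: contrast reduced around the mean, edges are washed out.
low_contrast = [0.5 + (v - 0.5) * 2 / 3 for v in reference]

mse_shifted = mse(reference, shifted)
mse_low_contrast = mse(reference, low_contrast)
ssim_shifted = global_ssim(reference, shifted)
ssim_low_contrast = global_ssim(reference, low_contrast)
```

Both distortions are built to have the same per-element error magnitude, so MSE cannot tell them apart, while SSIM scores the contrast-reduced row lower than the offset one. Practical SSIM implementations use local sliding windows rather than a single global window.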

Original abstract

World models, learned generative models that predict how an environment evolves, have become a promising tool for sample-efficient robot learning. Yet how robust they are to environmental variability remains poorly understood. To address this, we conduct a systematic study using vision-based quadrotor navigation as a testbed problem, training DreamerV3-based world models under varying levels of environmental randomness and evaluating them across all levels through cross-environment validation, spanning both Self-Supervised Learning (SSL) pretraining and Reinforcement Learning (RL) fine-tuning. We then deploy all world models and associated navigation policies on a real quadrotor in unseen environments, including an open-loop run where the model receives just 2.5s of real sensory input before all sensors are cut off, leaving the system to navigate entirely in imagination over a 12m traverse. Our results show that world model robustness during SSL pretraining is a strong predictor of sim-to-real transfer: every model that generalized well in cross-environment SSL validation deployed successfully in the real world, passing through gaps as narrow as 0.67m, whereas the model that dominated simulation policy evaluation failed on the real platform. We further identify (a) the discrete latent size and (b) the training-sequence length as the dominant factors governing world model quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

SSL cross-environment robustness in DreamerV3 predicts real quadrotor success here, but the abstract gives no numbers or stats so the strength of that link is still unclear.

The core observation is that models whose world-model pretraining generalized across simulated environment levels also succeeded on the real quadrotor, including a 12 m open-loop traverse after only 2.5 s of real input, while the best sim-policy model did not. Latent size and training-sequence length are flagged as the main drivers of that pretraining robustness.

What the work actually does is run a controlled cross-validation of DreamerV3 variants on vision-based gap navigation, then move the same models and policies to hardware in unseen settings. The real-robot test with sensor blackout is a concrete step beyond pure simulation, and the fact that every SSL-robust model cleared 0.67 m gaps while one sim leader failed is a useful data point for anyone choosing world models for deployment.

The soft spot is the missing quantitative backbone. The abstract states the correlation and names the two factors but supplies no success rates, trial counts, variance, or statistical tests, so it is impossible to judge how tight the predictor actually is or whether the discrete randomness levels used in sim are close enough to real sensor noise, lighting, and wind to make the predictor reliable. The stress-test concern about variability mismatch therefore lands; without that matching or additional controls the claim rests on a small set of conditions.
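To make the point about trial counts concrete, here is a hedged sketch of a Wilson score interval for a success rate. The counts in the loop are hypothetical and chosen only to show how interval width depends on the number of trials; they are not the paper's numbers, which the abstract does not report.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts for illustration only: a perfect record at three sizes.
for n in (3, 5, 20):
    lo, hi = wilson_interval(n, n)
    print(f"{n}/{n} successes -> 95% interval [{lo:.2f}, {hi:.2f}]")
```

A perfect record over a handful of flights still leaves a wide interval on the underlying success rate, which is why "every model deployed successfully" is hard to weigh without the number of trials behind it.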

This is the kind of incremental but practical study that matters to groups already running Dreamer-style pipelines on mobile robots. A reader who needs a quick filter before hardware trials will get value from the reported pattern, even if the paper needs tighter metrics and more environments to stand on its own.

I would send it to review once the full text supplies the missing counts and any ablation on the randomness schedule; the experimental skeleton is there and the real-robot result is worth referee scrutiny.

Referee Report

2 major / 1 minor

Summary. The paper claims that robustness of DreamerV3 world models during SSL pretraining, assessed via cross-environment validation across discrete levels of environmental randomness, is a strong predictor of sim-to-real transfer success in vision-based quadrotor navigation. Models showing good SSL generalization succeed on real hardware (including 0.67 m gap traversal and 12 m open-loop imagination navigation after 2.5 s of real input), while a model dominant in simulation policy evaluation fails in reality. Discrete latent size and training-sequence length are identified as the dominant factors governing model quality.

Significance. If the result holds, the work provides an empirical selection criterion for world models that is likely to transfer, reducing the need for exhaustive real-world testing in sample-efficient robot learning. The inclusion of hardware deployment with challenging narrow-gap and open-loop scenarios adds practical value for vision-based navigation tasks.

major comments (2)

[Abstract] The central claim that SSL cross-environment generalization is a 'strong predictor' of sim-to-real success is load-bearing for the contribution, yet the abstract (and by extension the reported results) provides no quantitative metrics, trial counts, success rates, error bars, or statistical tests to support the correlation or the statement that 'every model that generalized well... deployed successfully.'
[Abstract] The predictor's validity rests on the assumption that the discrete levels of environmental randomness used in simulation are representative of real-world variability. No quantitative matching is described between simulated factors and actual deployment conditions (sensor noise distributions, lighting, wind, texture), so the successful narrow-gap traversals could be explained by test-specific conditions rather than the claimed SSL robustness factor.

minor comments (1)

[Abstract] The abstract would be strengthened by briefly stating the number of models evaluated and the range of environmental randomness levels tested.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight opportunities to strengthen the presentation of our central claims. We respond to each major comment below and indicate where revisions will be made.

Referee: [Abstract] The central claim that SSL cross-environment generalization is a 'strong predictor' of sim-to-real success is load-bearing for the contribution, yet the abstract (and by extension the reported results) provides no quantitative metrics, trial counts, success rates, error bars, or statistical tests to support the correlation or the statement that 'every model that generalized well... deployed successfully.'

Authors: We agree that the abstract would benefit from explicit quantitative support for the predictor claim. The manuscript body details multiple world models evaluated via cross-environment SSL validation and reports their real-world deployment outcomes, including the narrow-gap and open-loop scenarios. We will revise the abstract to reference the number of models tested, the observed success rates in hardware, and the trial counts underlying the statement that models with strong SSL generalization deployed successfully.
revision: yes

Referee: [Abstract] The predictor's validity rests on the assumption that the discrete levels of environmental randomness used in simulation are representative of real-world variability. No quantitative matching is described between simulated factors and actual deployment conditions (sensor noise distributions, lighting, wind, texture), so the successful narrow-gap traversals could be explained by test-specific conditions rather than the claimed SSL robustness factor.

Authors: The discrete randomness levels were selected to induce a controlled spectrum of variability during pretraining rather than to replicate exact real-world distributions. The primary evidence for the predictor is the empirical outcome of real-world deployments in previously unseen conditions, where SSL-robust models succeeded on tasks such as 0.67 m gap traversal and 12 m open-loop navigation. We acknowledge that no explicit quantitative distribution matching (e.g., sensor noise or lighting statistics) is provided and will add a clarifying paragraph in the discussion section explaining the design rationale and the role of real-world validation.
revision: partial

Circularity Check

0 steps flagged

No circularity; empirical cross-validation and hardware results are independent of inputs

The paper conducts an empirical study by training DreamerV3 world models at discrete environmental randomness levels, running cross-environment SSL and RL validation, and performing real-world quadrotor deployments including open-loop imagination-based navigation. The claim that SSL generalization predicts sim-to-real success is grounded in observed experimental outcomes (successful 0.67m gap traversals for robust models, failure of the sim-dominant model) rather than any equation, fitted parameter, or self-citation that reduces the predictor to a definition or forces it by construction. No load-bearing steps match the enumerated circularity patterns; the derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, ad-hoc axioms, or new entities; the study inherits standard assumptions of DreamerV3 and RL training.

axioms (1)

domain assumption: Standard assumptions of DreamerV3 world-model training and reinforcement-learning algorithms hold without modification.
The study deploys DreamerV3 as the base model and reports results under its training regime.

Reference graph

Works this paper leans on

22 extracted references · 2 linked inside Pith

[1] D. Ha and J. Schmidhuber. Recurrent world models facilitate policy evolution. Advances in neural information processing systems, 31, 2018.

[2] M. Janner, J. Fu, M. Zhang, and S. Levine. When to trust your model: Model-based policy optimization. Advances in neural information processing systems, 32, 2019.

[3] D. Hafner, T. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee, and J. Davidson. Learning latent dynamics for planning from pixels. In International conference on machine learning, pages 2555–2565. PMLR, 2019.

[4] D. Hafner, T. Lillicrap, J. Ba, and M. Norouzi. Dream to control: Learning behaviors by latent imagination. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=S1lOTC4tDS

[5] D. Hafner, T. P. Lillicrap, M. Norouzi, and J. Ba. Mastering atari with discrete world models. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=0oabwyZbOu

[6] D. Hafner, J. Pasukonis, J. Ba, and T. Lillicrap. Mastering diverse control tasks through world models. Nature, 640(8059):647–653, 2025.

[7] E. Aljalbout, M. Krinner, A. Romero, and D. Scaramuzza. Accelerating model-based reinforcement learning with state-space world models. In ICLR 2025 Workshop on World Models: Understanding, Modelling and Scaling, 2025.

[8] L. Maes, Q. L. Lidec, D. Scieur, Y. LeCun, and R. Balestriero. Leworldmodel: Stable end-to-end joint-embedding predictive architecture from pixels. arXiv preprint arXiv:2603.19312, 2026.

[9] J. Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lockhart, D. Hassabis, T. Graepel, et al. Mastering atari, go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609, 2020.

[10] P. Wu, A. Escontrela, D. Hafner, P. Abbeel, and K. Goldberg. Daydreamer: World models for physical robot learning. In Conference on robot learning, pages 2226–2240. PMLR, 2023.

[11] N. Hansen, H. Su, and X. Wang. Td-mpc2: Scalable, robust world models for continuous control. In International Conference on Learning Representations, volume 2024, pages 47376–47405, 2024.

[12] A. Bar, G. Zhou, D. Tran, T. Darrell, and Y. LeCun. Navigation world models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 15791–15801, 2025.

[13] C. Li, A. Krause, and M. Hutter. Robotic world model: A neural network simulator for robust policy optimization in robotics. In NeurIPS 2025 Workshop on Embodied World Models for Decision Making, 2025. URL https://openreview.net/forum?id=u76d3gBWCX

[14] A. Romero, A. Shenai, I. Geles, E. Aljalbout, and D. Scaramuzza. Dream to fly: Model-based reinforcement learning for vision-based drone flight. arXiv preprint arXiv:2501.14377, 2025.

[15] A. Verraest, S. Bahnam, R. Ferede, G. de Croon, and C. De Wagter. Skydreamer: Interpretable end-to-end vision-based drone racing with model-based reinforcement learning. arXiv preprint arXiv:2510.14783, 2025.

[16] I. Geles, L. Bauersfeld, A. Romero, J. Xing, and D. Scaramuzza. Demonstrating agile flight from pixels without state estimation. In Robotics: Science and Systems (RSS), 2024.

[17] M. Kulkarni, W. Rehberg, and K. Alexis. Aerial gym simulator: A framework for highly parallelized simulation of aerial robots. IEEE Robotics and Automation Letters, 2025.

[18] I. M. Sobol. Distribution of points in a cube and approximate evaluation of integrals. USSR Computational mathematics and mathematical physics, 7:86–112, 1967.

[19] G. E. Uhlenbeck and L. S. Ornstein. On the theory of the brownian motion. Physical review, 36(5):823, 1930.

[20] T. Lee, M. Leok, and N. H. McClamroch. Geometric tracking control of a quadrotor UAV on SE(3). In 49th IEEE conference on decision and control (CDC), pages 5420–5425. IEEE, 2010.

[21] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 23–30. IEEE, 2017.

[22] X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In 2018 IEEE international conference on robotics and automation (ICRA), pages 3803–3810. IEEE, 2018.
