Recurrent World Models Facilitate Policy Evolution

David Ha, Jürgen Schmidhuber

classification cs.LG stat.ML

keywords model, world, environment, environments, evolution, policy, recurrent, trained

A generative recurrent neural network is quickly trained in an unsupervised manner to model popular reinforcement learning environments through compressed spatio-temporal representations. The world model's extracted features are fed into compact and simple policies trained by evolution, achieving state-of-the-art results in various environments. We also train our agent entirely inside an environment generated by its own internal world model, and transfer this policy back into the actual environment. An interactive version of the paper is available at https://worldmodels.github.io
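The abstract's second step, a compact policy trained by evolution on features extracted by the world model, can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: the policy is a single linear layer, the optimizer is a simple elite-averaging evolution strategy, and `make_toy_fitness` is a hypothetical stand-in that replaces real episode returns and world-model features with a fixed batch of random 2-D feature vectors and a hidden linear target.

```python
import random


def linear_policy(params, features):
    """Map a feature vector to one action in [-1, 1] with a linear layer."""
    *weights, bias = params
    total = sum(w * f for w, f in zip(weights, features)) + bias
    return max(-1.0, min(1.0, total))


def make_toy_fitness(seed=0, steps=32):
    """Hypothetical stand-in for an episode return (assumption, not from the paper).

    Scores a policy by negative squared error against a hidden linear target
    over a fixed batch of feature vectors, so the score is deterministic.
    """
    rng = random.Random(seed)
    batch = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(steps)]

    def fitness(params):
        error = 0.0
        for f in batch:
            target = 0.5 * f[0] - 0.3 * f[1]
            error += (linear_policy(params, f) - target) ** 2
        return -error

    return fitness


def evolve(fitness, n_params, generations=30, pop_size=16,
           n_elite=4, sigma=0.3, seed=1):
    """Simple evolution strategy: sample around a mean, average the elites."""
    rng = random.Random(seed)
    mean = [0.0] * n_params
    best, best_fit = list(mean), fitness(mean)
    for _ in range(generations):
        # Sample candidate parameter vectors around the current mean.
        population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(pop_size)]
        scored = sorted(((fitness(p), p) for p in population),
                        key=lambda pair: pair[0], reverse=True)
        # Remember the best candidate seen in any generation.
        if scored[0][0] > best_fit:
            best_fit, best = scored[0][0], scored[0][1]
        # Move the mean to the average of the top-scoring candidates.
        elites = [p for _, p in scored[:n_elite]]
        mean = [sum(column) / n_elite for column in zip(*elites)]
    return best, best_fit


fitness = make_toy_fitness()
initial_fit = fitness([0.0, 0.0, 0.0])
best_params, best_fit = evolve(fitness, n_params=3)
print("initial fitness:", initial_fit)
print("best fitness:", best_fit)
```

Because the policy has only three parameters, a gradient-free search over the whole parameter vector stays cheap; this is the kind of trade-off that a compact policy makes possible, although the optimizer and policy sizes actually used in the paper may differ from this sketch.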

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Simulating clinical interventions with a generative multimodal model of human physiology
cs.AI 2026-04 unverdicted novelty 7.0

HealthFormer is a generative multimodal transformer that forecasts individual physiological trajectories and simulates clinical interventions, outperforming clinical risk scores on disease prediction and matching tria...

Grounded World Model for Semantically Generalizable Planning
cs.RO 2026-04 conditional novelty 6.0

A vision-language-aligned world model turns visuomotor MPC into a language-following planner that reaches 87% success on 288 unseen semantic tasks where standard VLAs drop to 22%.

Safety, Security, and Cognitive Risks in World Models
cs.CR 2026-04 unverdicted novelty 6.0

World models enable efficient AI planning but create risks from adversarial corruption, goal misgeneralization, and human bias, demonstrated via attacks that amplify errors and reduce rewards on models like RSSM and D...