pith. machine review for the scientific record.

arxiv: 1809.01999 · v1 · submitted 2018-09-04 · 💻 cs.LG · stat.ML

Recognition: unknown

Recurrent World Models Facilitate Policy Evolution

Authors on Pith: no claims yet
classification 💻 cs.LG stat.ML
keywords model, world, environment, environments, evolution, policy, recurrent, trained
Abstract

A generative recurrent neural network is quickly trained in an unsupervised manner to model popular reinforcement learning environments through compressed spatio-temporal representations. The world model's extracted features are fed into compact and simple policies trained by evolution, achieving state-of-the-art results in various environments. We also train our agent entirely inside an environment generated by its own internal world model, and transfer this policy back into the actual environment. An interactive version of the paper is available at https://worldmodels.github.io
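The second step of the abstract (compact policies trained by evolution on the world model's features) can be illustrated with a minimal sketch. Assumptions, not stated on this page: the policy is a single linear layer over the world model's latent vector z and recurrent hidden state h, following the controller described in the paper, and is squashed with tanh here purely for illustration; the sizes Z_DIM, H_DIM, A_DIM and the names controller, toy_fitness and evolve are hypothetical; and in place of the CMA-ES optimizer used in the paper, a simpler truncation-selection evolution strategy is swapped in. toy_fitness is only a placeholder for a rollout in the real environment or in the environment generated by the world model.

```python
import math
import random

# Hypothetical toy sizes: latent z, recurrent state h, action a.
Z_DIM, H_DIM, A_DIM = 4, 8, 3


def n_params():
    # Weight matrix W (A_DIM x (Z_DIM + H_DIM)) plus bias b (A_DIM), flattened.
    return A_DIM * (Z_DIM + H_DIM) + A_DIM


def controller(params, z, h):
    """Linear policy a = tanh(W [z; h] + b); tanh is an illustrative choice."""
    x = list(z) + list(h)
    n_in = len(x)
    action = []
    for i in range(A_DIM):
        row = params[i * n_in:(i + 1) * n_in]
        bias = params[A_DIM * n_in + i]
        action.append(math.tanh(sum(w * xi for w, xi in zip(row, x)) + bias))
    return action


def toy_fitness(params, rng_seed=0):
    # Hypothetical stand-in for an episode return: reward the controller for
    # producing a fixed target action on a few fixed (z, h) feature vectors.
    rng = random.Random(rng_seed)
    target = [0.5, -0.5, 0.0]
    total = 0.0
    for _ in range(5):
        z = [rng.gauss(0, 1) for _ in range(Z_DIM)]
        h = [rng.gauss(0, 1) for _ in range(H_DIM)]
        a = controller(params, z, h)
        total -= sum((ai - ti) ** 2 for ai, ti in zip(a, target))
    return total


def evolve(fitness, generations=50, pop_size=32, n_elite=8, sigma=0.1, seed=0):
    """Truncation-selection ES (a simplification, not CMA-ES)."""
    rng = random.Random(seed)
    mean = [0.0] * n_params()
    for _ in range(generations):
        # Sample candidate parameter vectors around the current mean.
        pop = [[m + sigma * rng.gauss(0, 1) for m in mean] for _ in range(pop_size)]
        # Keep the best n_elite candidates and move the mean to their average.
        elites = sorted(pop, key=fitness, reverse=True)[:n_elite]
        mean = [sum(col) / n_elite for col in zip(*elites)]
    return mean


if __name__ == "__main__":
    params = evolve(toy_fitness)
    print("untrained:", toy_fitness([0.0] * n_params()))
    print("evolved:  ", toy_fitness(params))
```

Because the policy has only a few dozen parameters in this toy setup, a gradient-free search over the flat parameter vector is cheap; each fitness call would correspond to one or more rollouts in practice.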

This paper has not been read by Pith yet.


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Simulating clinical interventions with a generative multimodal model of human physiology

    cs.AI 2026-04 unverdicted novelty 7.0

    HealthFormer is a generative multimodal transformer that forecasts individual physiological trajectories and simulates clinical interventions, outperforming clinical risk scores on disease prediction and matching tria...

  2. Grounded World Model for Semantically Generalizable Planning

    cs.RO 2026-04 conditional novelty 6.0

    A vision-language-aligned world model turns visuomotor MPC into a language-following planner that reaches 87% success on 288 unseen semantic tasks where standard VLAs drop to 22%.

  3. Safety, Security, and Cognitive Risks in World Models

    cs.CR 2026-04 unverdicted novelty 6.0

    World models enable efficient AI planning but create risks from adversarial corruption, goal misgeneralization, and human bias, demonstrated via attacks that amplify errors and reduce rewards on models like RSSM and D...