Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation

Ahmed Khalifa; Julian Togelius; Niels Justesen; Philip Bontrager; Ruben Rodriguez Torrado; Sebastian Risi

arxiv: 1806.10729 · v5 · pith:7BIMGNMHnew · submitted 2018-06-28 · 💻 cs.LG · cs.AI · stat.ML

Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, Sebastian Risi

classification 💻 cs.LG · cs.AI · stat.ML

keywords levels · level · learning · performance · agent · deep · environment · generality

Deep reinforcement learning (RL) has shown impressive results in a variety of domains, learning directly from high-dimensional sensory streams. However, when neural networks are trained in a fixed environment, such as a single level in a video game, they will usually overfit and fail to generalize to new levels. When RL models overfit, even slight modifications to the environment can result in poor agent performance. This paper explores how procedurally generated levels during training can increase generality. We show that for some games procedural level generation enables generalization to new levels within the same distribution. Additionally, it is possible to achieve better performance with less data by manipulating the difficulty of the levels in response to the performance of the agent. The generality of the learned behaviors is also evaluated on a set of human-designed levels. The results suggest that the ability to generalize to human-designed levels highly depends on the design of the level generators. We apply dimensionality reduction and clustering techniques to visualize the generators' distributions of levels and analyze to what degree they can produce levels similar to those designed by a human.
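
The abstract's idea of adjusting level difficulty in response to agent performance can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the step size of 0.01, the [0, 1] difficulty range, and the toy win model are all assumptions made for illustration.

```python
import random


def update_difficulty(difficulty, won, step=0.01):
    """Raise difficulty after a win, lower it after a loss, clamped to [0, 1].

    The step size and the [0, 1] range are illustrative assumptions.
    """
    delta = step if won else -step
    return min(1.0, max(0.0, difficulty + delta))


def simulate_training(episodes=1000, seed=0):
    """Toy loop: a hypothetical agent wins less often as levels get harder."""
    rng = random.Random(seed)
    difficulty = 0.0
    for _ in range(episodes):
        # Hypothetical stand-in for "generate a level at this difficulty
        # and play one episode": win probability falls linearly with difficulty.
        p_win = 1.0 - difficulty
        won = rng.random() < p_win
        difficulty = update_difficulty(difficulty, won)
    return difficulty


if __name__ == "__main__":
    print(simulate_training())
```

In this toy setup the difficulty drifts toward the point where wins and losses balance, which is the intuition behind matching level difficulty to the agent's current ability.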

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation
cs.LG 2026-05 unverdicted novelty 7.0

RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.

On the Measure of Intelligence
cs.AI 2019-11 unverdicted novelty 7.0

Intelligence is skill-acquisition efficiency, and the ARC benchmark measures human-like general fluid intelligence by testing abstraction and reasoning with minimal, innate-like priors.

Procedural Generation of Initial States of Sokoban
cs.AI 2019-07 unverdicted novelty 6.0

Beta generates Sokoban initial states harder for a specialized solver than human-designed ones by using pattern database heuristics and novelty.

The Rise and Potential of Large Language Model Based Agents: A Survey
cs.AI 2023-09 accept novelty 4.0

The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.