Assessing Generalization in Deep Reinforcement Learning

Charles Packer, Katelyn Gao, Jernej Kos, Philipp Krähenbühl, Vladlen Koltun, Dawn Song

classification cs.LG cs.AI stat.ML

keywords generalization deep algorithms evaluation experimental generalize learning reinforcement

Deep reinforcement learning (RL) has achieved breakthrough results on many tasks, but agents often fail to generalize beyond the environment they were trained in. As a result, deep RL algorithms that promote generalization are receiving increasing attention. However, works in this area use a wide variety of tasks and experimental setups for evaluation. The literature lacks a controlled assessment of the merits of different generalization schemes. Our aim is to catalyze community-wide progress on generalization in deep RL. To this end, we present a benchmark and experimental protocol, and conduct a systematic empirical study. Our framework contains a diverse set of environments, our methodology covers both in-distribution and out-of-distribution generalization, and our evaluation includes deep RL algorithms that specifically tackle generalization. Our key finding is that "vanilla" deep RL algorithms generalize better than specialized schemes that were proposed specifically to tackle generalization.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Why Does Agentic Safety Fail to Generalize Across Tasks?
cs.LG 2026-05 conditional novelty 6.0

Agentic safety fails to generalize across tasks because the task-to-safe-controller mapping has a higher Lipschitz constant than the task-to-controller mapping alone, as proven in linear-quadratic control and demonstr...

Visualizing Critic Match Loss Landscapes for Interpretation of Online Reinforcement Learning Control Algorithms
cs.LG 2026-03 unverdicted novelty 6.0

A projection-based visualization of critic match loss landscapes that reveals optimization paths and stability characteristics in online actor-critic reinforcement learning.

Enhancing RL Generalizability in Robotics through SHAP Analysis of Algorithms and Hyperparameters
cs.LG 2026-05 unverdicted novelty 4.0

SHAP analysis of RL algorithms and hyperparameters reveals consistent impact patterns that enable guided configuration selection for improved generalization in robotic environments.