arxiv: 1811.06032 · v1 · pith:FTAU7XYSnew · submitted 2018-11-14 · 💻 cs.LG · cs.AI · stat.ML

Natural Environment Benchmarks for Reinforcement Learning

classification 💻 cs.LG · cs.AI · stat.ML
keywords: learning, algorithms, benchmark, data, domains, natural, reinforcement, while

While current benchmark reinforcement learning (RL) tasks have been useful to drive progress in the field, they are in many ways poor substitutes for learning with real-world data. By testing increasingly complex RL algorithms on low-complexity simulation environments, we often end up with brittle RL policies that generalize poorly beyond the very specific domain. To combat this, we propose three new families of benchmark RL domains that contain some of the complexity of the natural world, while still supporting fast and extensive data acquisition. The proposed domains also permit a characterization of generalization through fair train/test separation, and easy comparison and replication of results. Through this work, we challenge the RL research community to develop more robust algorithms that meet high standards of evaluation.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation

    cs.LG 2026-05 unverdicted novelty 5.0

    The paper presents stable-worldmodel (swm), a platform with high-performance data layer, modern world model baselines, planning solvers, and extended environments for reproducible research and generalization evaluation.

  2. Optimal Control with Natural Images: Efficient Reinforcement Learning using Overcomplete Sparse Codes

    cs.LG 2024-12 unverdicted novelty 5.0

    Overcomplete sparse coding of natural images enables reinforcement learning to solve optimal control tasks orders of magnitude larger than with complete codes, via a new scalable benchmark and theoretical justification.