Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains

Akshay Krishnamurthy; Alekh Agarwal; David Abel; Fernando Diaz; Robert E. Schapire

arXiv: 1603.04119 · v1 · pith:HPVOUD3Qnew · submitted 2016-03-14 · cs.AI · cs.LG · stat.ML

classification: cs.AI · cs.LG · stat.ML

keywords: function, tasks, exploration, high-dimensional, learning, techniques, approximator, benchmarks

High-dimensional observations and complex real-world dynamics present major challenges in reinforcement learning for both function approximation and exploration. We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals. And second, we propose an exploration strategy inspired by the principles of state abstraction and information acquisition under uncertainty. We demonstrate the empirical effectiveness of these techniques, first, as a preliminary check, on two standard tasks (Blackjack and $n$-Chain), and then on two much larger and more realistic tasks with high-dimensional observation spaces. Specifically, we introduce two benchmarks built within the game Minecraft where the observations are pixel arrays of the agent's visual field. A combination of our two algorithmic techniques performs competitively on the standard reinforcement-learning tasks while consistently and substantially outperforming baselines on the two tasks with high-dimensional observation spaces. The new function approximator, exploration strategy, and evaluation benchmarks are each of independent interest in the pursuit of reinforcement-learning methods that scale to real-world domains.
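
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of boosting on $Q$-function residuals, not the authors' method. It assumes a deterministic 5-state variant of $n$-Chain (action 0 moves forward and pays 10 at the end of the chain, action 1 returns to the start and pays 2), a discount of 0.9, and a step size of 0.5. The helper `fit_weak_learner` is hypothetical: because the state space is tiny it is a lookup table of mean residuals, standing in for the shallow regression trees one would use on high-dimensional observations.

```python
GAMMA = 0.9        # discount factor (assumed for illustration)
ETA = 0.5          # boosting step size (assumed for illustration)
N_STATES = 5       # chain length (assumed for illustration)
ACTIONS = (0, 1)   # 0 = move forward, 1 = return to the start


def step(state, action):
    """Deterministic n-Chain variant (assumption): returns (next_state, reward)."""
    if action == 1:
        return 0, 2.0
    if state == N_STATES - 1:
        return state, 10.0
    return state + 1, 0.0


def fit_weak_learner(inputs, residuals):
    """Hypothetical weak learner: mean residual per (state, action) key."""
    sums, counts = {}, {}
    for key, res in zip(inputs, residuals):
        sums[key] = sums.get(key, 0.0) + res
        counts[key] = counts.get(key, 0) + 1
    table = {key: sums[key] / counts[key] for key in sums}
    return lambda key: table.get(key, 0.0)


def q_value(ensemble, state, action):
    """Q is the additive ensemble: the sum of step-size-scaled weak learners."""
    return sum(ETA * h((state, action)) for h in ensemble)


def boost_q(rounds=300):
    """Each round fits one weak learner to the current Bellman residuals."""
    transitions = []
    for s in range(N_STATES):
        for a in ACTIONS:
            s_next, reward = step(s, a)
            transitions.append((s, a, s_next, reward))
    ensemble = []
    for _ in range(rounds):
        inputs, residuals = [], []
        for s, a, s_next, reward in transitions:
            target = reward + GAMMA * max(
                q_value(ensemble, s_next, b) for b in ACTIONS
            )
            inputs.append((s, a))
            residuals.append(target - q_value(ensemble, s, a))
        ensemble.append(fit_weak_learner(inputs, residuals))
    return ensemble


ensemble = boost_q()
greedy = [
    max(ACTIONS, key=lambda a: q_value(ensemble, s, a)) for s in range(N_STATES)
]
print(greedy)  # greedy action per state
```

Because the lookup-table learner fits the residuals exactly, each round here is a damped value-iteration step, so the ensemble converges and the greedy policy moves forward in every state. With a genuinely weak learner the fit is only approximate and no such guarantee is implied.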

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
cs.AI · 2024-08 · conditional · novelty 6.0

Empirical analysis shows scaling inference compute via strategies like tree search can be more efficient than scaling model parameters, with 7B models plus novel search outperforming 34B models.

ORRB -- OpenAI Remote Rendering Backend
cs.GR · 2019-06 · unverdicted · novelty 4.0

ORRB is an open-source remote rendering backend that pairs Unity3d with MuJoCo for high-throughput, customizable visual domain randomization in robotics environments.