Mnih, V., Kavukcuoglu, K., Silver, D., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.
5 Pith papers cite this work.

Citing papers
- On the Importance of Multistability for Horizon Generalization in Reinforcement Learning
  Multistability is necessary for temporal horizon generalization in POMDPs; it is sufficient on its own in simple tasks and, combined with transient dynamics, in complex ones. Monostable parallelizable RNNs such as SSMs and gated linear RNNs fail by construction (a minimal fixed-point sketch follows this list).
- Towards Model-Free Learning in Dynamic Population Games: An Application to Karma Economies
  Model-free DQN learning in Karma dynamic population games (DPGs) achieves a suboptimality bound of O(1/sqrt(N_s)) + O(1/N) at equilibrium, and deep RL combined with fictitious play empirically reaches a near-stationary Nash equilibrium from scratch (a bare-bones fictitious-play sketch follows this list).
- Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners
  Frontier LRMs match human game-learning behavior and predict fMRI signals an order of magnitude better than RL or Bayesian agents, an advantage attributed to their in-context game-state representations.
- Integrating Causal DAGs in Deep RL: Activating Minimal Markovian States with Multi-Order Exposure
  A procedure builds provably minimal Markovian states from a longitudinal causal graph, but deep RL needs multi-order historical state exposure (MOSE) to realize gains over minimal-state or fixed-window baselines (a simplified state-extraction sketch follows this list).
- Quantile Geometry Regularization for Distributional Reinforcement Learning
  RQIQN introduces a Wasserstein-DRO-based correction to Bellman quantile targets that enlarges the distributional spread without altering the risk-neutral mean (a numeric sketch follows this list).
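
A minimal fixed-point sketch of the multistability contrast in the first paper above. Nothing here is taken from the paper: the scalar recurrences, coefficients, and iteration budget are illustrative assumptions chosen to make the dichotomy visible.

```python
import numpy as np

def attractors(step, inits, iters=500):
    """Iterate a scalar recurrence from many starting states and
    return the distinct endpoints (rounded to 6 decimal places)."""
    finals = set()
    for h in inits:
        for _ in range(iters):
            h = step(h)
        finals.add(round(float(h), 6))
    return sorted(finals)

# 24 starting points, spaced so none sits exactly on an unstable fixed point.
inits = np.linspace(-3.0, 3.0, 24)

# Monostable: a stable linear recurrence h <- a*h + b with |a| < 1 (the
# scalar skeleton of SSM / gated-linear-RNN state updates) contracts every
# trajectory to the single fixed point b / (1 - a).
print(attractors(lambda h: 0.9 * h + 0.1, inits))     # [1.0]

# Multistable: h <- tanh(w*h) with w > 1 has two stable fixed points, so
# the sign of the initial condition is remembered indefinitely.
print(attractors(lambda h: np.tanh(2.0 * h), inits))  # two points near ±0.96
```

The linear recurrence forgets its initial condition at a geometric rate, which illustrates why monostable architectures struggle to carry state across horizons longer than those seen in training; the saturating recurrence retains one bit of memory over arbitrarily long horizons.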
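A bare-bones fictitious-play sketch for the Karma paper's empirical claim. This is not the paper's DQN-in-a-dynamic-population-game pipeline: the game (matching pennies), the pseudo-count initialization, and the iteration budget are all assumptions, and the point is only the mechanism by which best responses to empirical averages approach a Nash equilibrium.

```python
import numpy as np

# Matching pennies: row player's payoffs; the column player receives -A.
# Its unique Nash equilibrium mixes both actions at (0.5, 0.5).
A = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])

counts = [np.ones(2), np.ones(2)]   # pseudo-counts of each player's actions

for _ in range(100_000):
    freq = [c / c.sum() for c in counts]     # empirical mixed strategies
    a_row = int(np.argmax(A @ freq[1]))      # best response to column's mix
    a_col = int(np.argmax(-(freq[0] @ A)))   # best response to row's mix
    counts[0][a_row] += 1
    counts[1][a_col] += 1

print("row:", counts[0] / counts[0].sum())   # ~[0.5, 0.5]
print("col:", counts[1] / counts[1].sum())   # ~[0.5, 0.5]
```

In the paper's setting the best-response step would presumably come from a DQN evaluated against the population's empirical distribution rather than an argmax over a known payoff matrix.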
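A hedged sketch for the causal-DAG paper. The frontier construction below (keep every past variable with a direct edge into the future) is a simplifying stand-in for the paper's provably minimal procedure, and the example graph and node names are hypothetical.

```python
import networkx as nx

def frontier_state(dag: nx.DiGraph, now: int) -> set:
    """Variables at time <= now with an edge into a later variable.
    In a temporally ordered DAG, conditioning on this frontier blocks
    every directed path from the earlier past into the future."""
    return {
        u for u, v in dag.edges
        if dag.nodes[u]["t"] <= now < dag.nodes[v]["t"]
    }

g = nx.DiGraph()
# (name, time): x carries long-range effects, n is transient noise.
for name, t in [("x", 0), ("n", 0), ("x", 1), ("n", 1), ("r", 2)]:
    g.add_node(f"{name}{t}", t=t)
g.add_edges_from([("x0", "x1"), ("n0", "x1"), ("x1", "r2"), ("n1", "r2")])

print(frontier_state(g, now=1))   # x1 and n1; x0 and n0 can be dropped
```

The multi-order exposure finding in the summary suggests that even when such a compact state is information-theoretically sufficient, a deep RL learner may still need several past orders of the state presented explicitly before it can exploit it.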
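A numeric sketch of the kind of correction the last summary describes: a mean-preserving dilation of equally weighted quantile targets. The dilation factor `eps` is a stand-in assumption; in RQIQN it would be derived from a Wasserstein DRO radius rather than fixed by hand.

```python
import numpy as np

def dilate_quantiles(q: np.ndarray, eps: float) -> np.ndarray:
    """Mean-preserving spread: q' = m + (1 + eps) * (q - m), m = mean(q).
    With equally weighted quantiles (as in QR-DQN/IQN-style critics),
    the risk-neutral value is the quantile mean, which stays fixed."""
    m = q.mean()
    return m + (1.0 + eps) * (q - m)

q = np.array([-1.0, 0.0, 0.5, 1.5, 4.0])   # Bellman quantile targets
q2 = dilate_quantiles(q, eps=0.2)

print(q.mean(), q2.mean())   # 1.0 1.0 -> risk-neutral average unchanged
print(q.std(), q2.std())     # spread scales by exactly 1 + eps
```

Because the map is affine and increasing for eps > -1, quantile ordering is preserved, so the dilated targets remain a valid non-crossing quantile function.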