Advances in Neural Information Processing Systems , year=

Deep Reinforcement Learning at the Edge of the Statistical Precipice , author=

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

representative citing papers

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

Behavior-Consistent Deep Reinforcement Learning

cs.LG · 2026-05-20 · unverdicted · novelty 6.0 · 2 refs

QED bounds cross-run KL divergence in Boltzmann policies by setting temperature proportional to Q-disagreement and reduces return variance by two orders of magnitude on 18 continuous-control tasks without performance loss.

TD-MPC2: Scalable, Robust World Models for Continuous Control

cs.LG · 2023-10-25 · conditional · novelty 6.0

TD-MPC2 scales an implicit world-model RL method to a 317M-parameter agent that masters 80 tasks across four domains with a single hyperparameter configuration.

For How Long Should We Be Punching? Learning Action Duration in Fighting Games

cs.AI · 2026-05-20 · unverdicted · novelty 5.0

RL agents in fighting games learn to jointly predict actions and their durations, matching fixed frame-skip performance while favoring repeatable exploitative patterns against scripted bots.

citing papers explorer

Showing 4 of 4 citing papers.

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling cs.LG · 2026-05-14 · unverdicted · none · ref 92
DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
Behavior-Consistent Deep Reinforcement Learning cs.LG · 2026-05-20 · unverdicted · none · ref 219 · 2 links
QED bounds cross-run KL divergence in Boltzmann policies by setting temperature proportional to Q-disagreement and reduces return variance by two orders of magnitude on 18 continuous-control tasks without performance loss.
TD-MPC2: Scalable, Robust World Models for Continuous Control cs.LG · 2023-10-25 · conditional · none · ref 147
TD-MPC2 scales an implicit world-model RL method to a 317M-parameter agent that masters 80 tasks across four domains with a single hyperparameter configuration.
For How Long Should We Be Punching? Learning Action Duration in Fighting Games cs.AI · 2026-05-20 · unverdicted · none · ref 14
RL agents in fighting games learn to jointly predict actions and their durations, matching fixed frame-skip performance while favoring repeatable exploitative patterns against scripted bots.

Advances in Neural Information Processing Systems , year=

fields

years

verdicts

representative citing papers

citing papers explorer