super hub Mixed citations

Human-level control through deep reinforcement learning

Alex Graves, Andreas K. Fidjeland, Andrei A. Rusu, Charles Beattie, David Silver, Georg Ostrovski + 2 more · 2015 · Nature · DOI 10.1038/nature14236

Mixed citation behavior. Most common role is background (43%).

56 Pith papers citing it

22.6k external citations · Crossref

Background 43% of classified citations

citation-role summary

background 4 · method 2 · baseline 1

citation-polarity summary

background 3 · use method 2 · baseline 1 · unclear 1

authors

Alex Graves, Andreas K. Fidjeland, Andrei A. Rusu, Charles Beattie, David Silver, Georg Ostrovski, Joel Veness, Koray Kavukcuoglu, Marc G. Bellemare, Martin Riedmiller, Stig Petersen, Volodymyr Mnih

representative citing papers

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

Heavy-Ball Q-Learning with Residual Weighting Correction

cs.LG · 2026-06-25 · unverdicted · novelty 7.0

Corrected heavy-ball Q-learning with convergence and acceleration guarantees is derived via switched linear system and joint spectral radius analysis, extended to linear function approximation.

CHORUS: Decentralized Multi-Embodiment Collaboration with One VLA Policy

cs.RO · 2026-06-10 · unverdicted · novelty 7.0

CHORUS adapts a single VLA backbone for decentralized control of diverse robot teams, achieving 64-point gains over from-scratch decentralized baselines and outperforming centralized methods in real-world tasks using only local observations.

Expected Free Energy-based Planning as Variational Inference

cs.AI · 2026-06-09 · unverdicted · novelty 7.0

EFE-based planning is formulated as variational free energy minimization with epistemic priors, decomposing into expected plan costs plus a complexity term.

What Type of Inference is Active Inference?

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

EFE-based active inference planning is characterized as VFE on an augmented model plus entropy and planning corrections, with a derived message-passing implementation and grid-world validation.

From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments

cs.LG · 2026-06-02 · unverdicted · novelty 7.0

Derives an SDE describing the infinitesimal change in state distribution at each gradient step for neural actor-critic RL in continuous environments under vanishing learning rate in the infinite width limit.

Coordination Graphs for Constrained Multi-Agent Reinforcement Learning

cs.AI · 2026-06-01 · conditional · novelty 7.0

CG-CMARL decomposes constrained multi-agent RL into pairwise coordination graphs with shared Q-functions, using Max-Sum message passing and a Lagrangian multiplier to coordinate actions and trace Pareto fronts scalably.

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.

Inline Critic Steers Image Editing

cs.CV · 2026-05-12 · conditional · novelty 7.0

Inline Critic uses a learnable token to critique and steer a frozen image-editing model's intermediate layers during generation, delivering state-of-the-art results on GEdit-Bench, RISEBench, and KRIS-Bench.

Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum

cs.LG · 2026-02-02 · unverdicted · novelty 7.0

Single-timescale actor-critic with STORM momentum and a recent-sample buffer achieves optimal O(ε^{-2}) sample complexity for ε-optimal policies in finite discounted MDPs.

Variational Sequential Optimal Experimental Design using Reinforcement Learning

stat.ML · 2023-06-17 · unverdicted · novelty 7.0

vsOED uses a variational one-point reward and RL policy optimization to provide a lower bound on expected information gain for sequential experimental design, supporting nuisance parameters, implicit likelihoods, and multiple design goals.

Parametric Open Source Games

cs.GT · 2026-06-25 · unverdicted · novelty 6.0

Introduces parametric open-source games as continuous analogues of program equilibria, proves equilibrium existence, and derives an exact coupling threshold for cooperation in symmetric 2x2 games under gradient ascent.

SMR: Scheduler with Multi-Channel Map-Encoded Reinforcement Learning for Radio Telescopes

astro-ph.IM · 2026-06-25 · unverdicted · novelty 6.0

SMR uses multi-channel map-encoded reinforcement learning to achieve roughly 10% better time utilization than greedy baselines for single-dish radio telescope scheduling.

Identifying structural design principles shaping the computational abilities of recurrent neural networks

q-bio.NC · 2026-06-22 · unverdicted · novelty 6.0

Local 2- and 3-cycles enhance RNN computational capacity for Boolean functions, predicted by structural statistics, while adding interneurons boosts large networks.

NASDAQ: Normalized Observation Space Dynamics-Augmented Q-Learning

cs.LG · 2026-06-19 · unverdicted · novelty 6.0

NASDAQ normalizes observations in an online RL setting so that dynamics prediction losses are balanced across dimensions, yielding competitive performance with lower wall-time than prior model-based and self-predictive methods.

Formalizing Task-Space Complexity for Zero-Shot Generalization

cs.LG · 2026-06-18 · unverdicted · novelty 6.0

Introduces signed divergence to bound generalization gaps and defines task-space complexity as the minimum source contexts needed for ε-coverage under local smoothness, with set-cover reduction and empirical validation on LQR and DRL systems.

Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization

cs.LG · 2026-06-10 · unverdicted · novelty 6.0

RL training disrupts gradient-based adversarial attacks by inducing unstable low-magnitude gradients that limit the effectiveness of methods like PGD within practical budgets.

Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation

math.NA · 2026-06-09 · unverdicted · novelty 6.0

Dmsh is a new multi-agent RL framework that formulates mesh generation as an MDP and uses three coordinated agents plus curriculum learning to produce globally conforming all-quad meshes without post-processing.

Rollout-Level Advantage-Prioritized Experience Replay for GRPO

cs.LG · 2026-06-03 · conditional · novelty 6.0

Rollout-level advantage-prioritized experience replay for GRPO recycles high-advantage individual rollouts with age eviction and fresh-anchored batches to outperform standard GRPO on math benchmarks, with gains increasing with model size.

ReviewGuard: Aligning LLM-Assisted Peer Review with Long-Term Scientific Impact

cs.DL · 2026-05-29 · unverdicted · novelty 6.0

ReviewGuard aligns LLM peer reviews with future citations via impact-aligned RL, achieving Spearman ρ=0.776 on rejected-then-published AI/ML papers versus 0.492 for human reviewers and flagging 5.6× more high-impact cases.

When Does Deep RL Beat Calibrated Baselines? A Benchmark Study on Adaptive Resource Control

cs.LG · 2026-05-26 · unverdicted · novelty 6.0

Benchmark study finds calibrated rule-based controller outperforms six DRL algorithms on cost for adaptive resource control across workloads, with action-space mismatch explaining large differences in constraint violations.

Learning in Low-Dimensional Subspaces: Orthogonal Bottlenecks for Reinforcement Learning

cs.LG · 2026-05-25 · unverdicted · novelty 6.0

Orthogonal bottlenecks constrain RL encoder features to low-dimensional subspaces while preserving expressivity and gradient dynamics under linear realizability when dimension exceeds the value function's intrinsic rank.

DemoEvolve: Overcoming Sparse Feedback in Agentic Harness Evolution with Demonstrations

cs.AI · 2026-05-23 · unverdicted · novelty 6.0

DemoEvolve bootstraps harness evolution with demonstrations to achieve more stable and effective edits than self-rollout search in sparse-feedback environments like Balatro.

Understanding Goal Generalisation in Sequential Reinforcement Learning

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

Empirical analysis of over 100 sequential RL training pipelines across 250+ OOD environments finds salient features drive generalization and early goals persist, with latent policy gradients simulating latent variable evolution to predict OOD behavior from training history.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

cs.LG · 2019-10-01 · conditional · none · ref 10

AWR learns policies via advantage-weighted supervised regression on actions, achieving competitive off-policy performance on Gym tasks and strong results from static data alone.

Attentive Multi-Task Deep Reinforcement Learning

cs.LG · 2019-07-05 · unverdicted · none · ref 21

Attention mechanism dynamically groups task knowledge at state granularity in multi-task DRL to enable positive transfer and avoid negative transfer, matching or exceeding prior methods with fewer parameters.

Optimal Use of Experience in First Person Shooter Environments

cs.LG · 2019-06-24 · unverdicted · none · ref 1

Empirical tests in VizDoom show multiple DQN updates per step do not improve performance after learning rate adjustment, with a 4:1 update-to-step ratio optimal before significant degradation.
