A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419):1140–1144

Silver, D · 2018 · DOI 10.1126/science.aar6404

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

PMCTS: Particle Monte Carlo Tree Search for Principled Parallelized Inference Time Scaling

cs.LG · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

PMCTS is a new parallel MCTS variant that preserves formal policy improvement guarantees and scales with parallel compute, outperforming heuristic baselines in tested domains.

Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search

cs.LG · 2025-12-25 · unverdicted · novelty 7.0

Inverse-RPO derives two variance-aware prior-based UCT policies from UCB-V that outperform PUCT on benchmarks with no extra cost.

ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders

cs.RO · 2026-05-19 · accept · novelty 6.0 · 2 refs

ARC-RL is a new suite of four MuJoCo continuous-control environments featuring game-inspired hexapod and quadruped morphologies, a single closed-form multi-component reward function, CPG demonstrators, and empirical comparisons of online and offline-to-online RL algorithms.

Towards Real-time Control of a CartPole System on a Quantum Computer

quant-ph · 2026-05-03 · unverdicted · novelty 6.0

A single-qubit quantum reinforcement learning agent solves CartPole faster than classical networks and quantifies shot-count versus control-frequency requirements for real-time closed-loop control on NISQ hardware, including direct electronics programming to reduce latency.

Evaluation-driven Scaling for Scientific Discovery

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines, with examples such as a 2x faster LASSO and new Erdős constructions.

(1D) Ordered Tokens Enable Efficient Test-Time Search

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

Coarse-to-fine 1D token sequences in autoregressive models enable stronger test-time search and even training-free text-to-image generation guided by verifiers, outperforming traditional 2D grid tokenization.

NashPG: A Policy Gradient Method with Iteratively Refined Regularization for Finding Nash Equilibria

cs.LG · 2025-10-21 · unverdicted · novelty 6.0

NashPG is a policy-gradient method with iteratively refined regularization that guarantees monotonic convergence to Nash equilibria in two-player zero-sum extensive-form games and scales to large benchmarks.

Aristotle: IMO-level Automated Theorem Proving

cs.AI · 2025-10-01 · unverdicted · novelty 6.0

Aristotle reaches gold-medal-equivalent performance on 2025 IMO problems via integrated Lean proof search, informal lemma formalization, and a dedicated geometry solver.

GIFT: Global stabilisation via Intrinsic Fine Tuning

cs.LG · 2026-04-25 · unverdicted · novelty 5.0

GIFT fine-tunes deep RL policies with a stability-focused reward to improve global stability while preserving task performance.

What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?

cs.AI · 2025-12-30 · unverdicted · novelty 5.0

An empirical study of JEPA world models identifies architecture, training objective, and planning choices that yield a model outperforming DINO-WM and V-JEPA-2-AC on navigation and manipulation tasks.

Scheduling Discovery in the 2020s

astro-ph.IM · 2019-07-17 · unverdicted · novelty 2.0

Advocates developing high-quality open-source scheduling software and linking observation planning with data analysis for future astronomical surveys.

citing papers explorer

Showing 11 of 11 citing papers.

PMCTS: Particle Monte Carlo Tree Search for Principled Parallelized Inference Time Scaling
cs.LG · 2026-05-09 · unverdicted · none · ref 3 · 2 links

Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search
cs.LG · 2025-12-25 · unverdicted · none · ref 2

ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders
cs.RO · 2026-05-19 · accept · none · ref 31 · 2 links

Towards Real-time Control of a CartPole System on a Quantum Computer
quant-ph · 2026-05-03 · unverdicted · none · ref 16

Evaluation-driven Scaling for Scientific Discovery
cs.LG · 2026-04-21 · unverdicted · none · ref 120

(1D) Ordered Tokens Enable Efficient Test-Time Search
cs.CV · 2026-04-16 · unverdicted · none · ref 2

NashPG: A Policy Gradient Method with Iteratively Refined Regularization for Finding Nash Equilibria
cs.LG · 2025-10-21 · unverdicted · none · ref 34

Aristotle: IMO-level Automated Theorem Proving
cs.AI · 2025-10-01 · unverdicted · none · ref 40

GIFT: Global stabilisation via Intrinsic Fine Tuning
cs.LG · 2026-04-25 · unverdicted · none · ref 13

What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?
cs.AI · 2025-12-30 · unverdicted · none · ref 62

Scheduling Discovery in the 2020s
astro-ph.IM · 2019-07-17 · unverdicted · none · ref 33
