Mastering the game of Go without human knowledge. Nature, 550(7676):354–359

David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. · 2017

8 Pith papers cite this work.

Citation-role summary: background (2)

Citation-polarity summary: background (2)

representative citing papers

Classical State Preparation for Variational Quantum Algorithms via Reinforcement Learning

quant-ph · 2026-05-22 · unverdicted · novelty 7.0

CRiSP uses neural-guided MCTS and curriculum learning to insert Clifford prefixes before parameterized rotations in VQAs, yielding a mean 3.17x and a maximum 45x improvement in energy accuracy on 22-qubit QAOA benchmarks compared with prior Clifford initializers.

State-Centric Decision Process

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

SDP constructs a task-induced state space from raw text by having agents commit to and certify natural-language predicates as states, enabling structured planning and analysis in unstructured language environments.

IRIS: Interpolative Rényi Iterative Self-play for Large Language Model Fine-Tuning

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

IRIS unifies self-play fine-tuning under an interpolative Rényi objective with adaptive alpha scheduling and reports better benchmark scores than baselines while surpassing full supervised fine-tuning with only 13% of the annotated data.

Visualizing Critic Match Loss Landscapes for Interpretation of Online Reinforcement Learning Control Algorithms

cs.LG · 2026-03-15 · unverdicted · novelty 6.0

A projection-based visualization of critic match loss landscapes that reveals optimization paths and stability characteristics in online actor-critic reinforcement learning.

RIDER: 3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion

cs.LG · 2026-02-18 · unverdicted · novelty 6.0

RIDER improves RNA 3D structural similarity by over 100% using RL-guided diffusion and discovers non-native sequence designs.

InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling

cs.CL · 2025-08-12 · unverdicted · novelty 6.0

InternBootcamp supplies 1000+ verifiable, auto-generated task environments across domains that enable task scaling to improve LLM reasoning, producing a 32B model with state-of-the-art results on the new Bootcamp-EVAL benchmark.

When Does Non-Uniform Replay Matter in Reinforcement Learning?

cs.LG · 2026-05-11 · unverdicted · novelty 5.0 · 3 refs

Non-uniform replay helps most when replay volume is low; high-entropy sampling remains important, and a truncated geometric distribution delivers better sample efficiency with negligible overhead.

Playing Dice with the Universe: Programming Quantum Computers to Play Traditional Games

cs.ET · 2026-04-26 · unverdicted · novelty 5.0

A quantum annealer can play tic-tac-toe by encoding only the game rules and sampling from paths leading to wins or losses.
