Mastering the game of Go without human knowledge. Nature, 550(7676):354–359
8 Pith papers cite this work. Polarity classification is still indexing.
Citation summary: verdicts, 8 unverdicted; roles, 2 background; polarities, 2 background.
citing papers explorer
- Classical State Preparation for Variational Quantum Algorithms via Reinforcement Learning
  CRiSP uses neural-guided MCTS and curriculum learning to insert Clifford prefixes before parameterized rotations in VQAs, yielding mean 3.17x and max 45x gains in energy accuracy on 22-qubit QAOA benchmarks versus prior Clifford initializers.
- State-Centric Decision Process
  SDP constructs a task-induced state space from raw text by having agents commit to and certify natural-language predicates as states, enabling structured planning and analysis in unstructured language environments.
- IRIS: Interpolative Rényi Iterative Self-play for Large Language Model Fine-Tuning
  IRIS unifies self-play fine-tuning under an interpolative Rényi objective with adaptive alpha scheduling and reports better benchmark scores than baselines while surpassing full supervised fine-tuning with only 13% of the annotated data.
- Visualizing Critic Match Loss Landscapes for Interpretation of Online Reinforcement Learning Control Algorithms
  A projection-based visualization of critic match loss landscapes that reveals optimization paths and stability characteristics in online actor-critic reinforcement learning.
- RIDER: 3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion
  RIDER improves RNA 3D structural similarity by over 100% using RL-guided diffusion and discovers non-native sequence designs.
- InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling
  InternBootcamp supplies 1000+ verifiable, auto-generated task environments across domains that enable task scaling to improve LLM reasoning, producing a 32B model with state-of-the-art results on the new Bootcamp-EVAL benchmark.
- When Does Non-Uniform Replay Matter in Reinforcement Learning?
  Non-uniform replay helps most when replay volume is low; high-entropy sampling remains important, and a truncated geometric distribution delivers better sample efficiency with negligible overhead.
- Playing Dice with the Universe: Programming Quantum Computers to Play Traditional Games
  A quantum annealer can play tic-tac-toe by encoding only the game rules and sampling from paths leading to wins or losses.