Nature, 2017

David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez et al. · 2017 · DOI 10.1038/nature24270

22 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · method: 1

citation-polarity summary

background: 1 · unclear: 1 · use method: 1

representative citing papers

Less Effort, Shorter Proofs: Reinforcement Learning for Security Protocol Analysis in Tamarin

cs.CR · 2026-05-22 · unverdicted · novelty 7.0

An RL-guided MCTS proof search for Tamarin finds more and shorter proofs than standard search across 16 protocol models.

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.

LLM-Guided Monte Carlo Tree Search over Knowledge Graphs: Composing Mechanistic Explanations for Drug-Disease Pairs

cs.AI · 2026-05-10 · unverdicted · novelty 7.0

TESSERA combines LLMs as local policy and evaluator with MCTS on knowledge graphs to compose mechanistic drug-disease explanations.

On-line Learning in Tree MDPs by Treating Policies as Bandit Arms

cs.AI · 2026-05-06 · unverdicted · novelty 7.0

Bandit algorithms can be adapted to Tree MDPs by treating policies as arms with shared-data confidence bounds, achieving polynomial memory and instance-dependent bounds on sample complexity and regret that depend on terminal-state gaps rather than all policies.

Towards AI-assisted Neutrino Flavor Theory Design

hep-ph · 2025-06-09 · unverdicted · novelty 7.0

AMBer applies reinforcement learning with physics feedback to automate construction of neutrino flavor models that minimize free parameters, validated on known cases and extended to a new symmetry group.

Parametric Open Source Games

cs.GT · 2026-06-25 · unverdicted · novelty 6.0

Introduces parametric open-source games as continuous analogues of program equilibria, proves equilibrium existence, and derives an exact coupling threshold for cooperation in symmetric 2x2 games under gradient ascent.

Improved bounds for the double cap conjecture

math.CO · 2026-05-27 · unverdicted · novelty 6.0

Improved upper bound α_3 ≤ 0.2953 for Witsenhausen's problem in dimension 3 via harmonic analysis, geometric fractional chromatic number, and a computer-searched 33-point set.

ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders

cs.RO · 2026-05-19 · accept · novelty 6.0 · 2 refs

ARC-RL is a new suite of four MuJoCo continuous-control environments featuring game-inspired hexapod and quadruped morphologies, a single closed-form multi-component reward function, CPG demonstrators, and empirical comparisons of online and offline-to-online RL algorithms.

Evaluation-driven Scaling for Scientific Discovery

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines, with examples such as a 2x faster LASSO and new Erdős constructions.

Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

cs.CL · 2026-04-03 · unverdicted · novelty 6.0

Vocabulary dropout prevents diversity collapse in LLM co-evolution by masking proposer logits, yielding average +4.4 point solver gains on mathematical reasoning benchmarks at 8B scale.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

A General Language Assistant as a Laboratory for Alignment

cs.CL · 2021-12-01 · conditional · novelty 6.0

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

Scaling Laws for Transfer

cs.LG · 2021-02-02 · unverdicted · novelty 6.0

Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.

The FIL Hypothesis: Inductive Biases Help with Kernel Engineering

cs.AI · 2026-06-29 · unverdicted · novelty 5.0

The FIL Hypothesis claims that inductive biases outperform purely data-driven methods on GPU programming tasks with non-trivial feedback loops.

Scientific discovery as meta-optimization: a combinatorial optimization case study

cs.AI · 2026-06-25 · unverdicted · novelty 5.0

Introduces consensus objective aggregation for meta-optimization of scientific discovery and reports improved scaling and speedup for 3-SAT algorithm discovery using digital MemComputing machines.

DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer

cs.CL · 2026-06-03 · unverdicted · novelty 5.0

DuDi is a dual-signal distillation method with cross-lingual verbalizer that improves multilingual SLM performance on SEA languages and outperforms baselines on SEA-HELM.

VET: A Framework for Analyzing AI Discourse

cs.AI · 2026-06-01 · unverdicted · novelty 5.0

Introduces the VET framework to categorize and critique polarized AI narratives including hype, doom, denial, and normalcy.

GIFT: Global stabilisation via Intrinsic Fine Tuning

cs.LG · 2026-04-25 · unverdicted · novelty 5.0

GIFT fine-tunes deep RL policies with a stability-focused reward to improve global stability while preserving task performance.

From Image to Music Language: A Two-Stage Structure Decoding Approach for Complex Polyphonic OMR

cs.SD · 2026-04-22 · unverdicted · novelty 5.0

A two-stage OMR pipeline decodes symbol candidates into polyphonic score structures via topology recognition with probability-guided search.

A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions

cs.AI · 2025-01-27 · unverdicted · novelty 5.0

A survey of 87 agents for computer use and 33 datasets that introduces a three-dimensional taxonomy across domain, interaction, and agent perspectives and identifies six research gaps.

AI-Enabled Serious Games: Integrating Intelligence and Adaptivity in Training Systems

cs.AI · 2026-05-21 · unverdicted · novelty 3.0

The chapter synthesizes the history of adaptive learning systems and examines how AI can provide instructional intelligence and real-time adaptivity in serious games while highlighting challenges such as explainability and limited long-term outcome data.

Scheduling Discovery in the 2020s

astro-ph.IM · 2019-07-17 · unverdicted · novelty 2.0

Advocates developing high-quality open-source scheduling software and linking observation planning with data analysis for future astronomical surveys.

