Gerald Tesauro
8 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: method (1), use method (1). Citation-polarity summary: still indexing.
Citing papers explorer
-
LLM-Guided Monte Carlo Tree Search over Knowledge Graphs: Composing Mechanistic Explanations for Drug-Disease Pairs
TESSERA combines LLMs as local policy and evaluator with MCTS on knowledge graphs to compose mechanistic drug-disease explanations.
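A minimal sketch of the recipe this summary describes, assuming a toy knowledge graph and stubbed LLM calls: `llm_edge_prior` stands in for the local policy that scores candidate edges, and `llm_path_score` stands in for the evaluator that replaces random rollouts. All names and the graph are illustrative, not TESSERA's actual interface.

```python
import math, random

# Toy knowledge graph: node -> [(relation, neighbor), ...] (illustrative only)
KG = {
    "aspirin": [("inhibits", "COX2")],
    "COX2": [("produces", "prostaglandin")],
    "prostaglandin": [("mediates", "inflammation")],
    "inflammation": [],
}

def llm_edge_prior(path, edge):
    """Stand-in for an LLM policy scoring how plausibly an edge extends the path."""
    return random.random()

def llm_path_score(path, target):
    """Stand-in for an LLM evaluator judging the assembled mechanistic chain."""
    return 1.0 if path[-1] == target else 0.2 * random.random()

class Node:
    def __init__(self, path, parent=None):
        self.path, self.parent = path, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct_child(node, c=1.4):
    # Standard UCT: exploit mean value, explore rarely visited children.
    return max(node.children, key=lambda ch: ch.value / (ch.visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))

def mcts(start, target, iters=200):
    root, best = Node([start]), ([start], 0.0)
    for _ in range(iters):
        node = root
        while node.children:                                # selection
            node = uct_child(node)
        for rel, nb in KG.get(node.path[-1], []):           # expansion, gated by the policy prior
            if nb not in node.path and llm_edge_prior(node.path, (rel, nb)) > 0.1:
                node.children.append(Node(node.path + [nb], parent=node))
        leaf = random.choice(node.children) if node.children else node
        reward = llm_path_score(leaf.path, target)          # evaluation instead of a random rollout
        if reward > best[1]:
            best = (leaf.path, reward)
        while leaf:                                         # backpropagation
            leaf.visits += 1; leaf.value += reward; leaf = leaf.parent
    return best

print(mcts("aspirin", "inflammation"))
```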
-
On-line Learning in Tree MDPs by Treating Policies as Bandit Arms
Bandit algorithms can be adapted to Tree MDPs by treating policies as arms with shared-data confidence bounds, achieving polynomial memory and instance-dependent sample-complexity and regret bounds that depend on terminal-state gaps rather than on the full set of policies.
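A sketch of the core idea under simplifying assumptions: vanilla UCB1 with each arm a policy whose pull returns a sampled episode return. The paper's shared-data confidence bounds (reusing trajectories across policies that agree on visited states) are omitted here; the two toy arms differ only in mean terminal reward, i.e. in their gap.

```python
import math, random

def ucb1(policy_returns, horizon=500, c=2.0):
    """policy_returns: list of callables, each sampling a return for one policy-arm."""
    n = len(policy_returns)
    counts, sums = [0] * n, [0.0] * n
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1                       # pull each arm once to initialize
        else:
            arm = max(range(n), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(c * math.log(t) / counts[i]))
        counts[arm] += 1
        sums[arm] += policy_returns[arm]()
    return counts

# Two toy "policies"; the instance-dependent bounds scale with the gap
# between their mean terminal rewards (0.7 - 0.5 here).
arms = [lambda: random.gauss(0.5, 0.1), lambda: random.gauss(0.7, 0.1)]
print(ucb1(arms))  # the better arm should accumulate most of the pulls
```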
-
Evaluation-driven Scaling for Scientific Discovery
SimpleTES scales test-time evaluation of LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines, with examples including a 2x-faster LASSO solver and new Erdős constructions.
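A hedged sketch of an evaluation-driven test-time loop in this spirit, not SimpleTES itself: `propose` is a stub for an LLM call, `evaluate` is a toy programmatic scorer, and scaling amounts to raising the sample budget.

```python
import random

def propose(task, rng):
    """Stand-in for an LLM proposing a candidate solution to the task."""
    return [rng.uniform(-10, 10) for _ in range(3)]

def evaluate(candidate):
    """Toy programmatic evaluator: higher is better (peak at (1, 1, 1))."""
    return -sum((x - 1.0) ** 2 for x in candidate)

def test_time_search(task, budget=2000, seed=0):
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(budget):                  # scaling evaluation = raising this budget
        cand = propose(task, rng)
        score = evaluate(cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

print(test_time_search("toy problem"))
```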
-
Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution
Vocabulary dropout prevents diversity collapse in LLM co-evolution by masking proposer logits, yielding an average +4.4-point solver gain on mathematical reasoning benchmarks at the 8B scale.
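A minimal sketch of the masking step as summarized, with no framework dependencies; the drop rate and the always-keep-one guard are illustrative assumptions.

```python
import math, random

def vocabulary_dropout(logits, drop_rate=0.2, rng=random):
    """Mask a random drop_rate fraction of logits to -inf before sampling."""
    keep = rng.randrange(len(logits))  # guard: always leave one token unmasked
    return [l if (i == keep or rng.random() >= drop_rate) else float("-inf")
            for i, l in enumerate(logits)]

def sample(logits, rng=random):
    """Softmax-sample an index; masked (-inf) entries get zero probability."""
    mx = max(logits)
    probs = [math.exp(l - mx) for l in logits]
    z = sum(probs)
    r, acc = rng.random() * z, 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

proposer_logits = [2.0, 1.0, 0.5, 0.0, -1.0]
print(sample(vocabulary_dropout(proposer_logits, drop_rate=0.4)))
```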
-
Language Models (Mostly) Know What They Know
Language models are well calibrated when asked to estimate the probability that their own answers are correct, and calibration improves as models get larger.
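A short worked example of how such calibration is typically measured (expected calibration error over binned self-reported probabilities); the binning scheme and toy data are assumptions, not the paper's evaluation code.

```python
def expected_calibration_error(confidences, corrects, n_bins=10):
    """Bin self-reported P(correct) and compare mean confidence to accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, corrects):
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    n, ece = len(confidences), 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece

# Toy data: self-reported probabilities and whether each answer was right.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.3], [True, True, False, False]))
```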
-
A General Language Assistant as a Laboratory for Alignment
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
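A sketch of the pairwise objective usually behind ranked preference modeling (a Bradley-Terry style logistic loss); treating this as the paper's exact loss is an assumption, and the scores here are stand-ins for reward-model outputs.

```python
import math

def preference_loss(score_chosen, score_rejected):
    """-log sigmoid(score_chosen - score_rejected): low when the preferred
    response already outranks the rejected one."""
    diff = score_chosen - score_rejected
    return math.log(1.0 + math.exp(-diff))

print(preference_loss(2.0, 0.5))   # small loss: ranking is already correct
print(preference_loss(0.5, 2.0))   # large loss: ranking is inverted
```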
-
GIFT: Global stabilisation via Intrinsic Fine Tuning
GIFT fine-tunes deep RL policies with a stability-focused reward to improve global stability while preserving task performance.
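The summary does not specify GIFT's intrinsic reward, so the following is only a generic reward-shaping sketch: task reward plus a stability term, with the quadratic state-deviation penalty and the weight `lam` as illustrative assumptions.

```python
def shaped_reward(task_reward, state, equilibrium, lam=0.1):
    """Task reward minus a penalty for drifting from a nominal equilibrium.
    The quadratic form and lam are assumptions, not GIFT's actual reward."""
    stability_penalty = sum((s - e) ** 2 for s, e in zip(state, equilibrium))
    return task_reward - lam * stability_penalty

print(shaped_reward(1.0, state=[0.2, -0.1], equilibrium=[0.0, 0.0]))
```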
-
From Image to Music Language: A Two-Stage Structure Decoding Approach for Complex Polyphonic OMR
A two-stage optical music recognition (OMR) pipeline decodes symbol candidates into polyphonic score structures via topology recognition with probability-guided search.
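A minimal sketch of probability-guided search over per-step symbol candidates, implemented here as a beam search; the candidate format and toy scores are assumptions, not the paper's decoder.

```python
import math, heapq

def beam_decode(candidates_per_step, beam_width=3):
    """candidates_per_step: list of [(symbol, prob), ...] per time step.
    Keeps the beam_width most probable symbol sequences."""
    beam = [(0.0, [])]                                  # (neg log prob, symbols so far)
    for candidates in candidates_per_step:
        expanded = [(nlp - math.log(p), seq + [sym])
                    for nlp, seq in beam
                    for sym, p in candidates if p > 0]
        beam = heapq.nsmallest(beam_width, expanded)    # prune to the best hypotheses
    return beam[0][1], math.exp(-beam[0][0])

steps = [[("quarter_note_C4", 0.7), ("eighth_note_C4", 0.3)],
         [("quarter_note_E4", 0.6), ("quarter_rest", 0.4)]]
print(beam_decode(steps))  # most probable two-symbol structure and its probability
```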