pith. machine review for the scientific record.

Scaling test-time compute for LLM agents

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 baseline 1

citation-polarity summary

years

2026 5

verdicts

UNVERDICTED 5

representative citing papers

Evaluation-driven Scaling for Scientific Discovery

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines. Examples include a 2x faster LASSO and new Erdős constructions.
