SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2representative citing papers
ThetaEvolve enables small open-source LLMs to achieve new best-known bounds on open problems such as circle packing by combining test-time RL with a large program database and lazy penalties.
citing papers explorer
-
Evaluation-driven Scaling for Scientific Discovery
SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.
-
ThetaEvolve: Test-time Learning on Open Problems
ThetaEvolve enables small open-source LLMs to achieve new best-known bounds on open problems such as circle packing by combining test-time RL with a large program database and lazy penalties.