pith. machine review for the scientific record.

arxiv: 2511.02864 · v3 · submitted 2025-11-03 · 💻 cs.NE · cs.AI · math.CA · math.CO · math.MG

Recognition: unknown

Mathematical exploration and discovery at scale

classification 💻 cs.NE cs.AI math.CA math.CO math.MG
keywords mathematical alphaevolve problems evolutionary results solutions able automated

AlphaEvolve (Novikov et al., 2025) is a generic evolutionary coding agent that combines the generative capabilities of LLMs with automated evaluation in an iterative evolutionary framework that proposes, tests, and refines algorithmic solutions to challenging scientific and practical problems. In this paper we showcase AlphaEvolve as a tool for autonomously discovering novel mathematical constructions and advancing our understanding of long-standing open problems. To demonstrate its breadth, we considered a list of 67 problems spanning mathematical analysis, combinatorics, geometry, and number theory. The system rediscovered the best known solutions in most of the cases and discovered improved solutions in several. In some instances, AlphaEvolve is also able to generalize results for a finite number of input values into a formula valid for all input values. Furthermore, we are able to combine this methodology with Deep Think and AlphaProof in a broader framework where the additional proof-assistants and reasoning systems provide automated proof generation and further mathematical insights. These results demonstrate that large language model-guided evolutionary search can autonomously discover mathematical constructions that complement human intuition, at times matching or even improving the best known results, highlighting the potential for significant new ways of interaction between mathematicians and AI systems. We present AlphaEvolve as a powerful new tool for mathematical discovery, capable of exploring vast search spaces to solve complex optimization problems at scale, often with significantly reduced requirements on preparation and computation time.
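The propose, test, and refine cycle described in the abstract can be illustrated with a toy evolutionary loop. This is only a minimal sketch under stated assumptions, not AlphaEvolve's implementation: the names `evaluate`, `propose`, and `evolve` are hypothetical, the candidates are plain numbers rather than programs, and a random perturbation stands in for the LLM proposal step.

```python
import random


def evaluate(candidate):
    """Automated evaluator (assumed toy objective): higher is better, maximum at 3."""
    return -(candidate - 3.0) ** 2


def propose(parent, rng):
    """Hypothetical stand-in for the LLM proposal step: perturb the parent at random."""
    return parent + rng.gauss(0.0, 0.5)


def evolve(generations=200, population_size=8, seed=0):
    """Run a simple elitist evolutionary search and return the best candidate found."""
    rng = random.Random(seed)
    population = [rng.uniform(-10.0, 10.0) for _ in range(population_size)]
    for _ in range(generations):
        # Propose: every current candidate produces one child.
        children = [propose(parent, rng) for parent in population]
        # Test and refine: score parents and children, keep the top scorers.
        ranked = sorted(population + children, key=evaluate, reverse=True)
        population = ranked[:population_size]
    return population[0]


best = evolve()
print(f"best candidate: {best:.3f}, score: {evaluate(best):.6f}")
```

Because parents compete with their children for survival, the best score never decreases between generations. In a system of the kind the abstract describes, the candidates would be programs and the evaluator a problem-specific scoring function.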

This paper has not been read by Pith yet.


Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AI co-mathematician: Accelerating mathematicians with agentic AI

    cs.AI 2026-05 unverdicted novelty 7.0

    An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.

  2. Out-of-the-Box Global Optimization for Packing Problems: New Models and Improved Solutions

    math.OC 2026-05 unverdicted novelty 7.0

    New nonlinear formulations for geometric packing problems, solved with FICO Xpress and SCIP, produce improved and first-known solutions for several variants.

  3. $k$-server-bench: Automating Potential Discovery for the $k$-Server Conjecture

    cs.MS 2026-04 accept novelty 7.0

    k-server-bench formulates potential-function discovery for the k-server conjecture as a code-based inequality-satisfaction task; current agents fully solve the resolved k=3 case and reduce violations on the open k=4 case.

  4. Learning to Discover at Test Time

    cs.LG 2026-01 unverdicted novelty 7.0

    TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.

  5. GRAFT-ATHENA: Self-Improving Agentic Teams for Autonomous Discovery and Evolutionary Numerical Algorithms

    cs.LG 2026-05 unverdicted novelty 6.0

    GRAFT-ATHENA projects combinatorial method choices into factored trees that embed as fingerprints in a metric space, enabling an agentic system to accumulate experience across domains and autonomously discover new num...

  6. AI co-mathematician: Accelerating mathematicians with agentic AI

    cs.AI 2026-05 unverdicted novelty 6.0

    An interactive AI workbench called the AI co-mathematician supports open-ended mathematical research and achieves a new high score of 48% on FrontierMath Tier 4.

  7. Intentmaking and Sensemaking: Human Interaction with AI-Guided Mathematical Discovery

    cs.AI 2026-05 unverdicted novelty 6.0

    Expert mathematicians using an AI coding agent for discovery engage in repeated cycles of intentmaking to define goals and sensemaking to interpret outputs.

  8. Using Large Language Models as a Co-Author in Undergraduate Quantum Group Research

    math.HO 2026-05 unverdicted novelty 6.0

    An AI model produced a new formula for a central element of U_q(so_12) at the quality level of advanced undergraduate research, along with faster computation via SageMath, prompting changes in mentorship practices.

  9. Evaluation-driven Scaling for Scientific Discovery

    cs.LG 2026-04 unverdicted novelty 6.0

    SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster ...

  10. SPS: Steering Probability Squeezing for Better Exploration in Reinforcement Learning for Large Language Models

    cs.CL 2026-04 unverdicted novelty 6.0

    SPS interleaves RL and IRL to counteract probability squeezing in LLM reasoning trajectories, improving Pass@k on five benchmarks while identifying an empirical upper bound on multi-sample performance.

  11. Computer Architecture's AlphaZero Moment: Automated Discovery in an Encircled World

    cs.AR 2026-03 conditional novelty 6.0

    Automated architectural discovery engines can outperform human design teams by exploring massive design spaces and compressing development cycles from months to weeks.

  12. Benchmarking Compound AI Applications for Hardware-Software Co-Design

    cs.DC 2026-03 unverdicted novelty 6.0

    Introduces a benchmarking suite for compound AI applications to support cross-stack performance, cost, and resource analysis for hardware-software co-design.

  13. How to Use Deep Learning to Identify Sufficient Conditions: A Case Study on Stanley's $e$-Positivity

    math.CO 2025-11 unverdicted novelty 6.0

    Deep learning identifies co-triangle-free graphs as e-positive and proves e-positivity for claw-free claw-contractible-free graphs on 10 and 11 vertices, resolving an open conjecture.

  14. Grokability in five inequalities

    math.PR 2026-05 unverdicted novelty 5.0

    Five improved inequalities were found with AI help: better Gaussian perimeter bounds for convex sets, sharper L2-L1 moments on the Hamming cube, a strengthened autoconvolution inequality, improved g-Sidon set bounds, ...

  15. Heterogeneous Scientific Foundation Model Collaboration

    cs.AI 2026-04 unverdicted novelty 5.0

    Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.

  16. pAI/MSc: ML Theory Research with Humans on the Loop

    cs.AI 2026-04 unverdicted novelty 5.0

    pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript dra...

  17. ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms

    cs.LG 2025-12 unverdicted novelty 5.0

    ATHENA introduces an agentic team framework that autonomously manages the end-to-end computational research lifecycle via a knowledge-driven HENA loop to achieve validation errors of 10^{-14} in scientific computing a...

  18. AI for Mathematics: Progress, Challenges, and Prospects

    math.HO 2026-01 unverdicted novelty 4.0

    AI for math combines task-specific architectures and general foundation models to support research and advance AI reasoning capabilities.