SMCEvolve: Principled Scientific Discovery via Sequential Monte Carlo Evolution

Huminhao Zhu; Jiachen Jiang; Zhihui Zhu

arxiv: 2605.15308 · v1 · pith:ATDEY3YCnew · submitted 2026-05-14 · 💻 cs.AI · cs.LG· cs.MA

SMCEvolve: Principled Scientific Discovery via Sequential Monte Carlo Evolution

Jiachen Jiang , Huminhao Zhu , Zhihui Zhu This is my paper

Pith reviewed 2026-05-19 16:17 UTC · model grok-4.3

classification 💻 cs.AI cs.LGcs.MA

keywords program evolutionsequential monte carloLLM-driven discoveryscientific discoveryprogram searchevolutionary algorithmsconvergence analysis

0 comments

The pith

SMCEvolve recasts program search as sampling from a reward-tilted target distribution and approximates it with a Sequential Monte Carlo sampler.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to give LLM-driven program evolution a principled foundation instead of ad-hoc heuristics. It treats the task of finding high-reward programs as drawing samples from a distribution that favors better solutions, then uses Sequential Monte Carlo to generate those samples step by step. This perspective directly yields three concrete mechanisms: adaptive resampling of parent programs, a mixture of mutation steps with acceptance rules, and built-in rules for deciding when the search has converged. The work also supplies a finite-sample analysis that limits how many LLM queries are needed to reach a chosen level of approximation accuracy. Experiments across mathematical problems, algorithm design, symbolic regression, and full ML research pipelines show the method exceeds prior evolving systems while stopping itself after fewer calls.

Core claim

SMCEvolve recasts program search as sampling from a reward-tilted target distribution and approximates it with a Sequential Monte Carlo (SMC) sampler. From this view, three core mechanisms emerge as principled components: adaptive parent resampling, mixture of mutation with acceptance, and automatic convergence control. We further provide a finite-sample complexity analysis that bounds the LLM-call budget required to reach a target approximation error.

What carries the argument

Sequential Monte Carlo sampler that draws from a reward-tilted distribution over programs, supplying adaptive parent resampling, mutation-plus-acceptance steps, and automatic stopping rules.

If this is right

The search reaches a target approximation error after a bounded number of LLM queries.
The process terminates automatically once the convergence criterion is met.
Performance improves over prior LLM evolution systems on math, algorithm efficiency, symbolic regression, and end-to-end ML tasks while using fewer calls.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sampling view could be applied to other LLM-based generation tasks that currently rely on heuristic selection.
The complexity bound offers a way to compare different mutation operators by how efficiently they reduce approximation error.
Automatic convergence control may reduce the need for manual hyper-parameter tuning when deploying these systems in new domains.

Load-bearing premise

The chosen mutation operators and acceptance probabilities inside the SMC framework produce unbiased samples from the intended reward-tilted distribution without extra corrections.

What would settle it

A direct check that the distribution of final programs deviates from the reward-tilted target or that the observed error after the predicted number of LLM calls exceeds the bound given by the complexity analysis.

Figures

Figures reproduced from arXiv: 2605.15308 by Huminhao Zhu, Jiachen Jiang, Zhihui Zhu.

**Figure 1.** Figure 1: LLM-driven program evolution targeting the reward-tilted distribution via Sequential Monte Carlo (SMC). The contour depicts the distribution over the program space; each white dot is a particle (program) in the evolving population, and the dashed circle marks the high-reward region. Starting from particles drawn from the LLM prior p0 (left), where the high-reward region is rarely covered, the population is… view at source ↗

**Figure 2.** Figure 2: Overview of SMCEVOLVE. Left: the main loop. Starting from N initial programs, each of T iterations (i) reweights and resamples particles, (ii) mutates each parent through a K-step MH chain of LLM proposals and accept/reject updates, and (iii) schedules the next λt via ESS, terminating when λt = 1. Right, top to bottom: the evolving process on circle packing (best-so-far reward across iterations); the mixtu… view at source ↗

**Figure 3.** Figure 3: Case study of SMCEVOLVE on circle packing, illustrating the three mechanisms. (a) Resampling probability of each particle (ranked by reward) across iterations. (b) Per-iteration selection counts and cumulative MH acceptance rates of the four mutation kernels M(1), . . . , M(4) . (c) The ESS curves ESS(λ) at each iteration (left) and the resulting schedule λt (right); each λt is set where ESS(λ) crosses the… view at source ↗

**Figure 4.** Figure 4: AutoResearch domain [21]. The SMCEvolve terminated automatically under ESS-control. 4.2 Ablation Study We ablate the key designs—adaptive parent resampling and the mixture of mutation kernels in Section 2—as well as the hyperparameters N (particle count) and K (MH chain length). Each variant changes a single choice while holding the LLM-call budget fixed ( [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

**Figure 5.** Figure 5: Per-iteration SMC flow on Circle Packing N=21 (I=2 islands, N=8 particles per island, K=2 MH proposals). Rows of each panel correspond to reweight, resample, and mutate; node fill encodes reward (red–yellow–green); thick rings in the reweight row mark inter-island migrants; resample arrows show the multinomial replication map; on mutate nodes, stroke color distinguishes the edit operator (blue = local diff… view at source ↗

read the original abstract

LLM-driven program evolution has emerged as a powerful tool for automated scientific discovery, yet existing frameworks offer no principled guide for designing their individual components and provide no guarantee that the search converges. We introduce SMCEvolve, which recasts program search as sampling from a reward-tilted target distribution and approximates it with a Sequential Monte Carlo (SMC) sampler. From this view, three core mechanisms emerge as principled components: adaptive parent resampling, mixture of mutation with acceptance, and automatic convergence control. We further provide a finite-sample complexity analysis that bounds the LLM-call budget required to reach a target approximation error. Across math, algorithm efficiency, symbolic regression, and end-to-end ML research benchmarks, SMCEvolve surpasses state-of-the-art evolving systems while using fewer LLM calls under self-determined termination. The code is available at https://github.com/kongwanbianjinyu/SMCEvolve.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SMCEvolve frames LLM program evolution as SMC sampling from a reward-tilted target and adds a finite-sample complexity bound, but the bound rests on kernel assumptions that are not verified.

read the letter

The main point is that SMCEvolve recasts LLM program evolution as sampling from a reward-tilted distribution using Sequential Monte Carlo, along with a finite-sample complexity bound on the number of LLM calls. The paper reports better results than existing systems across math, algorithm efficiency, symbolic regression, and ML research tasks, with fewer calls and automatic stopping. It does well by organizing the components into principled parts: adaptive parent resampling, mixture mutations with acceptance, and convergence control. This gives a clearer methodology than prior ad-hoc evolutionary approaches. The finite-sample analysis is an attempt to quantify the budget needed for a target error, which could be useful for planning experiments. Releasing the code allows others to inspect and build on it. The soft spots are in how well the theory supports the claims. The bound assumes the LLM mutation kernel is irreducible and satisfies the conditions for unbiased SMC approximation, like detailed balance with the acceptance step. But LLM prompts are unlikely to guarantee transitions between all possible programs or exact reversibility, and the paper does not show verification of these properties or any correction terms. The benchmarks show empirical gains but lack details on error metrics, baseline comparisons, or checks for sampling bias, so the performance claims are hard to fully assess. This paper targets researchers working on automated scientific discovery who want more than black-box evolution. Readers interested in applying sampling algorithms to generative models might find the mapping interesting and get practical value from the benchmarks and code. It deserves a serious referee because the core idea of using SMC for structured search is worth examining, even with the current gaps. I recommend sending it to peer review so the authors can address the kernel assumptions and provide more supporting analysis for the bound.

Referee Report

1 major / 2 minor

Summary. The paper introduces SMCEvolve, which recasts LLM-driven program evolution as sampling from a reward-tilted target distribution π(x) ∝ r(x)·p0(x) and approximates it via a Sequential Monte Carlo (SMC) sampler. It identifies three principled components—adaptive parent resampling, mixture of mutation with acceptance, and automatic convergence control—supplies a finite-sample complexity analysis bounding the LLM-call budget for a target approximation error, and reports superior performance over state-of-the-art evolving systems on math, algorithm efficiency, symbolic regression, and end-to-end ML research benchmarks while using fewer calls under self-determined termination. Code is released at the provided GitHub link.

Significance. If the finite-sample analysis is valid and the empirical gains hold under the stated conditions, the work would offer a theoretically grounded alternative to ad-hoc LLM evolution frameworks, with potential to improve reliability and efficiency in automated scientific discovery. The explicit release of code supports reproducibility, which strengthens the contribution.

major comments (1)

[§4] §4 (finite-sample complexity analysis): The claimed bound on LLM calls to reach a target total-variation or KL error presupposes that the mutation kernel K (LLM prompt plus acceptance step) is irreducible on the program space and satisfies detailed balance with the acceptance probability so that the SMC particle system remains unbiased for π. No verification, proof sketch, or empirical diagnostic of these kernel properties is supplied, yet they are load-bearing for both the unbiasedness claim and the complexity guarantee; without them the analysis does not apply to the implemented operators.

minor comments (2)

[§5] The abstract and §5 experiments mention “fewer LLM calls” and “self-determined termination” but do not tabulate the exact call counts or termination criteria against each baseline, making direct comparison harder to assess.
Notation for the reward-tilted target π(x) and the base distribution p0(x) is introduced without an explicit equation reference in the early sections; adding a numbered display equation would improve clarity.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the thorough review and for identifying a key point in our finite-sample analysis. We address the concern directly below and are prepared to strengthen the manuscript accordingly.

read point-by-point responses

Referee: [§4] §4 (finite-sample complexity analysis): The claimed bound on LLM calls to reach a target total-variation or KL error presupposes that the mutation kernel K (LLM prompt plus acceptance step) is irreducible on the program space and satisfies detailed balance with the acceptance probability so that the SMC particle system remains unbiased for π. No verification, proof sketch, or empirical diagnostic of these kernel properties is supplied, yet they are load-bearing for both the unbiasedness claim and the complexity guarantee; without them the analysis does not apply to the implemented operators.

Authors: We appreciate this observation. The mutation kernel in Section 4 is defined as an LLM-generated proposal followed by an acceptance step whose probability is set to enforce detailed balance with respect to the target π(x) ∝ r(x)·p0(x), exactly as in the Metropolis-Hastings construction. This guarantees reversibility and unbiasedness of the SMC estimator by design. Irreducibility follows from the assumption that the LLM proposal can emit any syntactically valid program with positive probability, which is standard for expressive generative models and sufficient for the finite-sample bounds to hold. We acknowledge that a fully rigorous verification for a black-box LLM is non-trivial; however, the analysis is presented under these kernel assumptions, which are explicitly stated in the current text. In the revision we will add a short proof sketch of detailed balance, a paragraph clarifying the irreducibility assumption, and an empirical diagnostic (e.g., acceptance-rate statistics across runs) to support the claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation applies standard SMC theory to new domain

full rationale

The paper recasts program search as sampling from the reward-tilted target π(x) ∝ r(x) p0(x) and invokes standard SMC resampling, mutation, and acceptance steps together with known finite-sample complexity bounds on total-variation or KL error. These steps follow directly from the SMC literature without redefining any target quantity in terms of fitted outputs, without presenting a fitted parameter as a prediction, and without load-bearing self-citations or uniqueness theorems imported from the same authors. The mutation kernel is treated as an external modeling choice whose validity is assumed rather than derived from the paper's own results; the complexity bound is therefore not tautological with the experimental data. The derivation chain remains self-contained against external SMC benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the standard assumptions of Sequential Monte Carlo theory and on the premise that LLM-generated mutations can be treated as valid proposal distributions; no free parameters, new entities, or ad-hoc axioms are enumerated in the abstract.

axioms (1)

domain assumption LLM mutation operators can serve as proposal distributions whose acceptance probabilities remain well-defined under the reward-tilted target
Invoked when the paper states that mixture of mutation with acceptance forms a core principled component.

pith-pipeline@v0.9.0 · 5689 in / 1144 out tokens · 54386 ms · 2026-05-19T16:17:05.087504+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Lemma 2.1 (Reward-tilted target distribution). The unique maximizer of (1) is p*(x|q) ≜ 1/Z(q) p0(x|q) e^{β R(x)}
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean costAlphaLog_high_calibrated_iff unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

MH accept/reject for pt-invariance ... αt(x, x′) = min{1, e^{βt(R(x′)−R(x))}}

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 5 internal anchors

[1]

Reevo: Large language models as hyper-heuristics with reflective evolution,

H. Ye, J. Wang, Z. Cao, F. Berto, C. Hua, H. Kim, J. Park, and G. Song, “Reevo: Large language models as hyper-heuristics with reflective evolution,”Advances in neural information processing systems, vol. 37, pp. 43571–43608, 2024

work page 2024
[2]

arXiv preprint arXiv:2401.02051 , year=

F. Liu, X. Tong, M. Yuan, X. Lin, F. Luo, Z. Wang, Z. Lu, and Q. Zhang, “Evolution of heuristics: Towards efficient automatic algorithm design using large language model,”arXiv preprint arXiv:2401.02051, 2024

work page arXiv 2024
[3]

Mathematical discoveries from program search with large language models,

B. Romera-Paredes, M. Barekatain, A. Novikov, M. Balog, M. P. Kumar, E. Dupont, F. J. Ruiz, J. S. Ellenberg, P. Wang, O. Fawzi,et al., “Mathematical discoveries from program search with large language models,”Nature, vol. 625, no. 7995, pp. 468–475, 2024

work page 2024
[4]

AlphaEvolve: A coding agent for scientific and algorithmic discovery

A. Novikov, N. V˜u, M. Eisenberger, E. Dupont, P.-S. Huang, A. Z. Wagner, S. Shirobokov, B. Kozlovskii, F. J. Ruiz, A. Mehrabian,et al., “Alphaevolve: A coding agent for scientific and algorithmic discovery,”arXiv preprint arXiv:2506.13131, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[5]

ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

R. T. Lange, Y . Imajuku, and E. Cetin, “Shinkaevolve: Towards open-ended and sample-efficient program evolution,”arXiv preprint arXiv:2509.19349, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[6]

Mathematical exploration and discovery at scale

B. Georgiev, J. Gómez-Serrano, T. Tao, and A. Z. Wagner, “Mathematical exploration and discovery at scale,”arXiv preprint arXiv:2511.02864, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[7]

Llm-sr: Scientific equation discovery via programming with large language models,

P. Shojaee, K. Meidani, S. Gupta, A. B. Farimani, and C. K. Reddy, “Llm-sr: Scientific equation discovery via programming with large language models,”arXiv preprint arXiv:2404.18400, 2024

work page arXiv 2024
[8]

Llema: Evolutionary search with llms for multi-objective materials discovery,

N. Abhyankar, S. Kabra, S. Desai, and C. K. Reddy, “Llema: Evolutionary search with llms for multi-objective materials discovery,”

work page
[9]

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

J. S. Chan, N. Chowdhury, O. Jaffe, J. Aung, D. Sherburn, E. Mays, G. Starace, K. Liu, L. Maksin, T. Patwardhan,et al., “Mle-bench: Evaluating machine learning agents on machine learning engineering,”arXiv preprint arXiv:2410.07095, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[10]

ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering, October 2025.https://arxiv.org/abs/2506.09050

Y . Imajuku, K. Horie, Y . Iwata, K. Aoki, N. Takahashi, and T. Akiba, “Ale-bench: A benchmark for long-horizon objective-driven algorithm engineering,”arXiv preprint arXiv:2506.09050, 2025

work page arXiv 2025
[11]

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

Y . Yamada, R. T. Lange, C. Lu, S. Hu, C. Lu, J. Foerster, J. Clune, and D. Ha, “The ai scientist- v2: Workshop-level automated scientific discovery via agentic tree search,”arXiv preprint arXiv:2504.08066, 2025. 11

work page internal anchor Pith review Pith/arXiv arXiv 2025
[12]

Deltaevolve: Accelerating scientific discovery through momentum-driven evolution,

J. Jiang, T. Ding, and Z. Zhu, “Deltaevolve: Accelerating scientific discovery through momentum-driven evolution,”arXiv preprint arXiv:2602.02919, 2026

work page arXiv 2026
[13]

AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization, February 2026

M. Cemri, S. Agrawal, A. Gupta, S. Liu, A. Cheng, Q. Mang, A. Naren, L. E. Erdogan, K. Sen, M. Zaharia,et al., “Adaevolve: Adaptive llm driven zeroth-order optimization,”arXiv preprint arXiv:2602.20133, 2026

work page arXiv 2026
[14]

Chi, Fernando Pereira, Wang-Cheng Kang, Derek Zhiyuan Cheng, and Beidou Wang

M. Yan, B. Peng, B. Coleman, Z. Chen, Z. Xie, S. Chen, Z. He, N. Sachdeva, I. Ye, W. Wang, et al., “Pacevolve: Enabling long-horizon progress-aware consistent evolution,”arXiv preprint arXiv:2601.10657, 2026

work page arXiv 2026
[15]

Pan, Alexander Du, Kurt Keutzer, Alvin Cheung, Alexandros G

S. Liu, S. Agarwal, M. Maheswaran, M. Cemri, Z. Li, Q. Mang, A. Naren, E. Boneh, A. Cheng, M. Z. Pan,et al., “Evox: Meta-evolution for automated discovery,”arXiv preprint arXiv:2602.23413, 2026

work page arXiv 2026
[16]

Sequential monte carlo samplers,

P. Del Moral, A. Doucet, and A. Jasra, “Sequential monte carlo samplers,”Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 68, no. 3, pp. 411–436, 2006

work page 2006
[17]

An invitation to sequential monte carlo samplers,

C. Dai, J. Heng, P. E. Jacob, and N. Whiteley, “An invitation to sequential monte carlo samplers,” Journal of the American Statistical Association, vol. 117, no. 539, pp. 1587–1600, 2022

work page 2022
[18]

A tutorial on thompson sampling,

D. J. Russo, B. Van Roy, A. Kazerouni, I. Osband, and Z. Wen, “A tutorial on thompson sampling,”Foundations and Trends in Machine Learning, vol. 11, no. 1, pp. 1–96, 2018

work page 2018
[19]

K., Krupke, D., Kidger, P., Sajed, T., Stellato, B., Park, J., et al

O. Press, B. Amos, H. Zhao, Y . Wu, S. K. Ainsworth, D. Krupke, P. Kidger, T. Sajed, B. Stellato, J. Park, N. Bosch, E. Meril, A. Steppi, A. Zharmagambetov, F. Zhang, D. Perez-Pineiro, A. Mercurio, N. Zhan, T. Abramovich, K. Lieret, H. Zhang, S. Huang, M. Bethge, and O. Press, “Algotune: Can language models speed up general-purpose numerical programs?,”ar...

work page arXiv 2025
[20]

Llm- srbench: A new benchmark for scientific equation discovery with large language models,

P. Shojaee, N.-H. Nguyen, K. Meidani, A. B. Farimani, K. D. Doan, and C. K. Reddy, “Llm- srbench: A new benchmark for scientific equation discovery with large language models,”arXiv preprint arXiv:2504.10415, 2025

work page arXiv 2025
[21]

autoresearch: Ai agents running research experiments

A. Karpathy, “autoresearch: Ai agents running research experiments.”https://github.com/ karpathy/autoresearch, 2026. GitHub repository

work page 2026
[22]

S. P. Meyn and R. L. Tweedie,Markov chains and stochastic stability. Springer Science & Business Media, 2012

work page 2012
[23]

Finite-sample complexity of sequential monte carlo estimators,

J. Marion, J. Mathews, and S. C. Schmidler, “Finite-sample complexity of sequential monte carlo estimators,”The Annals of Statistics, vol. 51, no. 3, pp. 1357–1375, 2023

work page 2023
[24]

Sequential monte carlo steering of large language models using probabilistic programs,

A. K. Lew, Z.-X. Tan, G. Grand, and V . K. Mansinghka, “Sequential monte carlo steering of large language models using probabilistic programs,”arXiv preprint arXiv:2306.03081, 2023

work page arXiv 2023
[25]

Probabilistic inference in language models via twisted sequential Monte Carlo,

S. Zhao, R. Brekelmans, A. Makhzani, and R. B. Grosse, “Probabilistic inference in language models via twisted sequential Monte Carlo,” inProceedings of the 41st International Conference on Machine Learning, vol. 235 ofProceedings of Machine Learning Research, pp. 60704–60748, PMLR, 2024

work page 2024
[26]

Step-by-step reasoning for math problems via twisted sequential Monte Carlo,

S. Feng, X. Kong, S. Ma, A. Zhang, D. Yin, C. Wang, R. Pang, and Y . Yang, “Step-by-step reasoning for math problems via twisted sequential Monte Carlo,” inInternational Conference on Learning Representations, 2025

work page 2025
[27]

Del Moral,Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications

P. Del Moral,Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Probability and Its Applications, New York: Springer, 2004

work page 2004
[28]

Chopin and O

N. Chopin and O. Papaspiliopoulos,An Introduction to Sequential Monte Carlo. Springer Series in Statistics, Cham: Springer, 2020

work page 2020
[29]

Annealed importance sampling,

R. M. Neal, “Annealed importance sampling,”Statistics and Computing, vol. 11, no. 2, pp. 125– 139, 2001. 12

work page 2001
[30]

Optimization by simulated annealing,

S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,”Science, vol. 220, no. 4598, pp. 671–680, 1983

work page 1983
[31]

The cross-entropy method for combinatorial and continuous optimization,

R. Y . Rubinstein, “The cross-entropy method for combinatorial and continuous optimization,” Methodology and Computing in Applied Probability, vol. 1, no. 2, pp. 127–190, 1999

work page 1999
[32]

Improved particle filter for nonlinear problems,

J. Carpenter, P. Clifford, and P. Fearnhead, “Improved particle filter for nonlinear problems,” in IEE Proceedings – Radar, Sonar and Navigation, vol. 146, pp. 2–7, IET, 1999

work page 1999
[33]

Comparison of resampling schemes for particle filtering,

R. Douc, O. Cappé, and E. Moulines, “Comparison of resampling schemes for particle filtering,” Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis (ISPA 2005), pp. 64–69, 2005

work page 2005
[34]

Monte carlo filter and smoother for non-gaussian nonlinear state space models,

G. Kitagawa, “Monte carlo filter and smoother for non-gaussian nonlinear state space models,” Journal of Computational and Graphical Statistics, vol. 5, no. 1, pp. 1–25, 1996. 13 This appendix provides supplementary material for SMCEVOLVE. We first position the method relative to prior work in Appendix A, then give the derivations and proofs supporting th...

work page 1996
[35]

the bridge weights satisfy supx wt(x)≤W and zt−1/zt ≤Z for every t, with γ≜W·Z (this is AS1 of 23)

work page
[36]

close-to-uniform

there exists τ <∞ such that τMt(η⋆, ω⋆)≤τ for every t, at the internal precision/warmness level chosen by the analysis (Theorem 1 of 23 usesω ⋆ = 2andη ⋆ = 1/(8N T)); 3.Γ≜sup x dπ/dµ0(x)<∞. Then for every ∥f∥ ∞ ≤1 and ϵ >0 , there exists a choice of (N, T, K) such that the SMC empirical measure ηN T satisfies P(|ηN T (f)−π(f)| ≤ϵ)≥3/4 , and the total kern...

work page 2024

[1] [1]

Reevo: Large language models as hyper-heuristics with reflective evolution,

H. Ye, J. Wang, Z. Cao, F. Berto, C. Hua, H. Kim, J. Park, and G. Song, “Reevo: Large language models as hyper-heuristics with reflective evolution,”Advances in neural information processing systems, vol. 37, pp. 43571–43608, 2024

work page 2024

[2] [2]

arXiv preprint arXiv:2401.02051 , year=

F. Liu, X. Tong, M. Yuan, X. Lin, F. Luo, Z. Wang, Z. Lu, and Q. Zhang, “Evolution of heuristics: Towards efficient automatic algorithm design using large language model,”arXiv preprint arXiv:2401.02051, 2024

work page arXiv 2024

[3] [3]

Mathematical discoveries from program search with large language models,

B. Romera-Paredes, M. Barekatain, A. Novikov, M. Balog, M. P. Kumar, E. Dupont, F. J. Ruiz, J. S. Ellenberg, P. Wang, O. Fawzi,et al., “Mathematical discoveries from program search with large language models,”Nature, vol. 625, no. 7995, pp. 468–475, 2024

work page 2024

[4] [4]

AlphaEvolve: A coding agent for scientific and algorithmic discovery

A. Novikov, N. V˜u, M. Eisenberger, E. Dupont, P.-S. Huang, A. Z. Wagner, S. Shirobokov, B. Kozlovskii, F. J. Ruiz, A. Mehrabian,et al., “Alphaevolve: A coding agent for scientific and algorithmic discovery,”arXiv preprint arXiv:2506.13131, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[5] [5]

ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

R. T. Lange, Y . Imajuku, and E. Cetin, “Shinkaevolve: Towards open-ended and sample-efficient program evolution,”arXiv preprint arXiv:2509.19349, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[6] [6]

Mathematical exploration and discovery at scale

B. Georgiev, J. Gómez-Serrano, T. Tao, and A. Z. Wagner, “Mathematical exploration and discovery at scale,”arXiv preprint arXiv:2511.02864, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[7] [7]

Llm-sr: Scientific equation discovery via programming with large language models,

P. Shojaee, K. Meidani, S. Gupta, A. B. Farimani, and C. K. Reddy, “Llm-sr: Scientific equation discovery via programming with large language models,”arXiv preprint arXiv:2404.18400, 2024

work page arXiv 2024

[8] [8]

Llema: Evolutionary search with llms for multi-objective materials discovery,

N. Abhyankar, S. Kabra, S. Desai, and C. K. Reddy, “Llema: Evolutionary search with llms for multi-objective materials discovery,”

work page

[9] [9]

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

J. S. Chan, N. Chowdhury, O. Jaffe, J. Aung, D. Sherburn, E. Mays, G. Starace, K. Liu, L. Maksin, T. Patwardhan,et al., “Mle-bench: Evaluating machine learning agents on machine learning engineering,”arXiv preprint arXiv:2410.07095, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[10] [10]

ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering, October 2025.https://arxiv.org/abs/2506.09050

Y . Imajuku, K. Horie, Y . Iwata, K. Aoki, N. Takahashi, and T. Akiba, “Ale-bench: A benchmark for long-horizon objective-driven algorithm engineering,”arXiv preprint arXiv:2506.09050, 2025

work page arXiv 2025

[11] [11]

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

Y . Yamada, R. T. Lange, C. Lu, S. Hu, C. Lu, J. Foerster, J. Clune, and D. Ha, “The ai scientist- v2: Workshop-level automated scientific discovery via agentic tree search,”arXiv preprint arXiv:2504.08066, 2025. 11

work page internal anchor Pith review Pith/arXiv arXiv 2025

[12] [12]

Deltaevolve: Accelerating scientific discovery through momentum-driven evolution,

J. Jiang, T. Ding, and Z. Zhu, “Deltaevolve: Accelerating scientific discovery through momentum-driven evolution,”arXiv preprint arXiv:2602.02919, 2026

work page arXiv 2026

[13] [13]

AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization, February 2026

M. Cemri, S. Agrawal, A. Gupta, S. Liu, A. Cheng, Q. Mang, A. Naren, L. E. Erdogan, K. Sen, M. Zaharia,et al., “Adaevolve: Adaptive llm driven zeroth-order optimization,”arXiv preprint arXiv:2602.20133, 2026

work page arXiv 2026

[14] [14]

Chi, Fernando Pereira, Wang-Cheng Kang, Derek Zhiyuan Cheng, and Beidou Wang

M. Yan, B. Peng, B. Coleman, Z. Chen, Z. Xie, S. Chen, Z. He, N. Sachdeva, I. Ye, W. Wang, et al., “Pacevolve: Enabling long-horizon progress-aware consistent evolution,”arXiv preprint arXiv:2601.10657, 2026

work page arXiv 2026

[15] [15]

Pan, Alexander Du, Kurt Keutzer, Alvin Cheung, Alexandros G

S. Liu, S. Agarwal, M. Maheswaran, M. Cemri, Z. Li, Q. Mang, A. Naren, E. Boneh, A. Cheng, M. Z. Pan,et al., “Evox: Meta-evolution for automated discovery,”arXiv preprint arXiv:2602.23413, 2026

work page arXiv 2026

[16] [16]

Sequential monte carlo samplers,

P. Del Moral, A. Doucet, and A. Jasra, “Sequential monte carlo samplers,”Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 68, no. 3, pp. 411–436, 2006

work page 2006

[17] [17]

An invitation to sequential monte carlo samplers,

C. Dai, J. Heng, P. E. Jacob, and N. Whiteley, “An invitation to sequential monte carlo samplers,” Journal of the American Statistical Association, vol. 117, no. 539, pp. 1587–1600, 2022

work page 2022

[18] [18]

A tutorial on thompson sampling,

D. J. Russo, B. Van Roy, A. Kazerouni, I. Osband, and Z. Wen, “A tutorial on thompson sampling,”Foundations and Trends in Machine Learning, vol. 11, no. 1, pp. 1–96, 2018

work page 2018

[19] [19]

K., Krupke, D., Kidger, P., Sajed, T., Stellato, B., Park, J., et al

O. Press, B. Amos, H. Zhao, Y . Wu, S. K. Ainsworth, D. Krupke, P. Kidger, T. Sajed, B. Stellato, J. Park, N. Bosch, E. Meril, A. Steppi, A. Zharmagambetov, F. Zhang, D. Perez-Pineiro, A. Mercurio, N. Zhan, T. Abramovich, K. Lieret, H. Zhang, S. Huang, M. Bethge, and O. Press, “Algotune: Can language models speed up general-purpose numerical programs?,”ar...

work page arXiv 2025

[20] [20]

Llm- srbench: A new benchmark for scientific equation discovery with large language models,

P. Shojaee, N.-H. Nguyen, K. Meidani, A. B. Farimani, K. D. Doan, and C. K. Reddy, “Llm- srbench: A new benchmark for scientific equation discovery with large language models,”arXiv preprint arXiv:2504.10415, 2025

work page arXiv 2025

[21] [21]

autoresearch: Ai agents running research experiments

A. Karpathy, “autoresearch: Ai agents running research experiments.”https://github.com/ karpathy/autoresearch, 2026. GitHub repository

work page 2026

[22] [22]

S. P. Meyn and R. L. Tweedie,Markov chains and stochastic stability. Springer Science & Business Media, 2012

work page 2012

[23] [23]

Finite-sample complexity of sequential monte carlo estimators,

J. Marion, J. Mathews, and S. C. Schmidler, “Finite-sample complexity of sequential monte carlo estimators,”The Annals of Statistics, vol. 51, no. 3, pp. 1357–1375, 2023

work page 2023

[24] [24]

Sequential monte carlo steering of large language models using probabilistic programs,

A. K. Lew, Z.-X. Tan, G. Grand, and V . K. Mansinghka, “Sequential monte carlo steering of large language models using probabilistic programs,”arXiv preprint arXiv:2306.03081, 2023

work page arXiv 2023

[25] [25]

Probabilistic inference in language models via twisted sequential Monte Carlo,

S. Zhao, R. Brekelmans, A. Makhzani, and R. B. Grosse, “Probabilistic inference in language models via twisted sequential Monte Carlo,” inProceedings of the 41st International Conference on Machine Learning, vol. 235 ofProceedings of Machine Learning Research, pp. 60704–60748, PMLR, 2024

work page 2024

[26] [26]

Step-by-step reasoning for math problems via twisted sequential Monte Carlo,

S. Feng, X. Kong, S. Ma, A. Zhang, D. Yin, C. Wang, R. Pang, and Y . Yang, “Step-by-step reasoning for math problems via twisted sequential Monte Carlo,” inInternational Conference on Learning Representations, 2025

work page 2025

[27] [27]

Del Moral,Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications

P. Del Moral,Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Probability and Its Applications, New York: Springer, 2004

work page 2004

[28] [28]

Chopin and O

N. Chopin and O. Papaspiliopoulos,An Introduction to Sequential Monte Carlo. Springer Series in Statistics, Cham: Springer, 2020

work page 2020

[29] [29]

Annealed importance sampling,

R. M. Neal, “Annealed importance sampling,”Statistics and Computing, vol. 11, no. 2, pp. 125– 139, 2001. 12

work page 2001

[30] [30]

Optimization by simulated annealing,

S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,”Science, vol. 220, no. 4598, pp. 671–680, 1983

work page 1983

[31] [31]

The cross-entropy method for combinatorial and continuous optimization,

R. Y . Rubinstein, “The cross-entropy method for combinatorial and continuous optimization,” Methodology and Computing in Applied Probability, vol. 1, no. 2, pp. 127–190, 1999

work page 1999

[32] [32]

Improved particle filter for nonlinear problems,

J. Carpenter, P. Clifford, and P. Fearnhead, “Improved particle filter for nonlinear problems,” in IEE Proceedings – Radar, Sonar and Navigation, vol. 146, pp. 2–7, IET, 1999

work page 1999

[33] [33]

Comparison of resampling schemes for particle filtering,

R. Douc, O. Cappé, and E. Moulines, “Comparison of resampling schemes for particle filtering,” Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis (ISPA 2005), pp. 64–69, 2005

work page 2005

[34] [34]

Monte carlo filter and smoother for non-gaussian nonlinear state space models,

G. Kitagawa, “Monte carlo filter and smoother for non-gaussian nonlinear state space models,” Journal of Computational and Graphical Statistics, vol. 5, no. 1, pp. 1–25, 1996. 13 This appendix provides supplementary material for SMCEVOLVE. We first position the method relative to prior work in Appendix A, then give the derivations and proofs supporting th...

work page 1996

[35] [35]

the bridge weights satisfy supx wt(x)≤W and zt−1/zt ≤Z for every t, with γ≜W·Z (this is AS1 of 23)

work page

[36] [36]

close-to-uniform

there exists τ <∞ such that τMt(η⋆, ω⋆)≤τ for every t, at the internal precision/warmness level chosen by the analysis (Theorem 1 of 23 usesω ⋆ = 2andη ⋆ = 1/(8N T)); 3.Γ≜sup x dπ/dµ0(x)<∞. Then for every ∥f∥ ∞ ≤1 and ϵ >0 , there exists a choice of (N, T, K) such that the SMC empirical measure ηN T satisfies P(|ηN T (f)−π(f)| ≤ϵ)≥3/4 , and the total kern...

work page 2024