Fine-tuning Large Language Model for Automated Algorithm Design

Fei Liu; Qingfu Zhang; Rui Zhang; Xi Lin; Zhichao Lu

arxiv: 2507.10614 · v2 · pith:HEYM2DWNnew · submitted 2025-07-13 · 💻 cs.LG · cs.AI

Fine-tuning Large Language Model for Automated Algorithm Design

Fei Liu , Rui Zhang , Xi Lin , Zhichao Lu , Qingfu Zhang This is my paper

Pith reviewed 2026-05-21 23:32 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords large language modelsfine-tuningautomated algorithm designdirect preference optimizationalgorithm generationgeneralizationadmissible set problem

0 comments

The pith

Fine-tuning lets smaller LLMs outperform off-the-shelf versions and match larger ones on algorithm design tasks

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper investigates whether LLMs need task-specific adaptation for automated algorithm design rather than relying on general coding abilities. It introduces a diversity-aware rank-based sampling method to select balanced training data and applies direct preference optimization to better align outputs with desired algorithm performance. On the admissible set problem, the fine-tuned 1B-parameter Llama model significantly beats its original counterpart and matches the performance of an 8B-parameter model. The same fine-tuned models also improve results on related algorithm design tasks that use different settings. These outcomes indicate that targeted fine-tuning can make LLMs more capable and efficient tools for generating algorithms.

Core claim

Fine-tuned LLMs using Diversity-Aware Rank-based sampling and direct preference optimization can significantly outperform their off-the-shelf counterparts, with the smaller Llama-3.2-1B-Instruct matching the larger Llama-3.1-8B-Instruct on the admissible set problem, and demonstrate promising generalization to related tasks with varying settings.

What carries the argument

Diversity-Aware Rank-based (DAR) sampling strategy to balance training data diversity and quality, paired with direct preference optimization to align LLM outputs with task objectives.

If this is right

Smaller fine-tuned models become practical substitutes for larger general LLMs inside search routines that generate candidate algorithms.
Task-specific adaptation improves the quality of iteratively refined algorithm proposals without increasing model size.
Observed generalization to related tasks with changed settings supports using one fine-tuned model across multiple algorithm design problems.
Embedding these adapted LLMs in automated design loops can reduce the number of iterations needed to reach high-performing algorithms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending the fine-tuning process to include a wider variety of algorithm design domains could create models that handle entirely new optimization problems with little extra training.
Pairing the fine-tuned LLMs with search methods other than the ones tested here might produce stronger hybrid systems for discovering algorithms.
Measuring how well the fine-tuned models hold up when the surrounding search routine or evaluation metric is altered would test the robustness of the reported gains.

Load-bearing premise

The three algorithm design tasks and the chosen evaluation settings are sufficiently representative for claiming broader utility and generalization of the fine-tuned models.

What would settle it

A test on a new, unrelated algorithm design task where the fine-tuned models perform no better than or worse than their off-the-shelf versions would show that the gains do not generalize.

Figures

Figures reproduced from arXiv: 2507.10614 by Fei Liu, Qingfu Zhang, Rui Zhang, Xi Lin, Zhichao Lu.

**Figure 1.** Figure 1: Upper section: LLM-based automated algorithm design methods iteratively refine and optimize algorithms. Through this, algorithms and their fitness are preserved in the database D. The knowledge and experiences incorporated in the database subsequently improve the capabilities of the LLM. Lower section: (a) Traditional sampling relies on continuous fitness values and often suffers from unstable preference g… view at source ↗

**Figure 2.** Figure 2: Comparison on varying preference pair sampling settings. [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Violin plot comparison on the performance of fine-tuned LLMs and base model. Each [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Convergence curve comparison on the performance of top-5 algorithms generated by [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Convergence curve comparison on the performance of top-5 algorithms generated by [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Convergence curve comparison on the performance of top-5 algorithms generated by the [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

read the original abstract

The integration of large language models (LLMs) into automated algorithm design has shown promising potential. A prevalent approach embeds LLMs within search routines to iteratively generate and refine candidate algorithms. However, most existing methods rely on off-the-shelf LLMs trained for general coding tasks, leaving a key question open: Do we need LLMs specifically tailored for algorithm design? If so, how can such LLMs be effectively obtained and how well can they generalize across different algorithm design tasks? In this paper, we take a preliminary step toward answering these questions by exploring fine-tuning of LLMs for algorithm design. We introduce a Diversity-Aware Rank-based (DAR) sampling strategy to balance training data diversity and quality, then we leverage direct preference optimization to efficiently align LLM outputs with task objectives. Our experiments are primarily conducted on Llama-3.2-1B-Instruct and Llama-3.1-8BInstruct across three distinct algorithm design tasks, with openPangu-Embedded models additionally included as auxiliary comparisons on the admissible set problem. Results suggest that fine-tuned LLMs can significantly outperform their off-the-shelf counterparts with the smaller Llama-3.2-1B-Instruct and match the larger Llama-3.1-8B-Instruct on the admissible set problem. Moreover, we observe promising generalization: LLMs fine-tuned on specific algorithm design tasks also improve performance on related tasks with varying settings. These findings highlight the value of task-specific adaptation for LLMs in algorithm design and open new avenues for future research. Our code is publicly available at https://github.com/RayZhhh/dpo-aad.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript explores fine-tuning LLMs for automated algorithm design. It introduces a Diversity-Aware Rank-based (DAR) sampling strategy to curate training data and applies direct preference optimization (DPO) to align model outputs with task objectives. Experiments focus on Llama-3.2-1B-Instruct and Llama-3.1-8B-Instruct across three algorithm design tasks (with auxiliary comparisons using openPangu-Embedded models on the admissible set problem). The central claims are that fine-tuned smaller models significantly outperform their off-the-shelf versions and can match larger models on the admissible set problem, while also exhibiting promising cross-task generalization to related settings with varying parameters.

Significance. If the empirical outcomes prove robust, the work provides evidence that task-specific fine-tuning can meaningfully improve LLM utility in algorithm design beyond general-purpose coding models. The combination of DAR sampling for data balance and DPO for efficient alignment is a practical contribution. Public code release at the cited GitHub repository supports reproducibility and is a clear strength.

major comments (2)

[§4] §4 (Experimental results): The manuscript reports positive outcomes for fine-tuned models outperforming baselines on the admissible set problem but provides insufficient detail on exact task definitions, baseline implementations, hyperparameter choices, and statistical significance testing. This information is required to confirm that the reported gains (e.g., smaller model matching larger model) are not attributable to post-hoc selection or uncontrolled variance.
[§5] §5 (Generalization experiments): The claim that fine-tuning on specific tasks yields improvements on related tasks with varying settings is central to the broader utility argument. However, without explicit characterization of how the three tasks differ in search structure or objective, it remains unclear whether observed transfer reflects robust generalization or narrow overlap in underlying optimization patterns.

minor comments (2)

[Abstract] Abstract: 'Llama-3.1-8BInstruct' is missing a hyphen and should read 'Llama-3.1-8B-Instruct' for consistency with other model names.
[Figures and Tables] Throughout: Some figure captions and table headers could more explicitly state the evaluation metric (e.g., success rate or objective value) and number of runs to improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's significance and reproducibility. We address each major comment below and revise the manuscript to incorporate the suggested improvements for greater clarity and rigor.

read point-by-point responses

Referee: [§4] §4 (Experimental results): The manuscript reports positive outcomes for fine-tuned models outperforming baselines on the admissible set problem but provides insufficient detail on exact task definitions, baseline implementations, hyperparameter choices, and statistical significance testing. This information is required to confirm that the reported gains (e.g., smaller model matching larger model) are not attributable to post-hoc selection or uncontrolled variance.

Authors: We agree that the current presentation of experimental details in §4 is insufficient for full reproducibility and verification of the claims. In the revised manuscript, we will expand this section to include: precise definitions and formulations of all three algorithm design tasks (including objectives, constraints, and input/output formats); complete descriptions of baseline implementations (e.g., prompting strategies for off-the-shelf LLMs and any other compared methods); the full set of hyperparameters used for DAR sampling, DPO training, and inference; and results from statistical significance tests (such as paired t-tests or bootstrap confidence intervals) on the performance differences. These additions will directly address concerns about post-hoc selection or variance and strengthen the evidence that the smaller fine-tuned model can match the larger one on the admissible set problem. revision: yes
Referee: [§5] §5 (Generalization experiments): The claim that fine-tuning on specific tasks yields improvements on related tasks with varying settings is central to the broader utility argument. However, without explicit characterization of how the three tasks differ in search structure or objective, it remains unclear whether observed transfer reflects robust generalization or narrow overlap in underlying optimization patterns.

Authors: We acknowledge that an explicit comparison of task differences would better support the generalization claims. In the revised manuscript, we will add a new subsection or table in §5 that characterizes the three tasks along dimensions such as search space structure (e.g., discrete combinatorial vs. parameterized continuous elements), objective functions, and the specific parameter variations used in the transfer experiments. This will help clarify the degree of overlap versus robust transfer. We maintain that the empirical improvements on related tasks with varying settings provide evidence for the value of task-specific adaptation, but the added characterization will allow readers to better evaluate the scope of generalization. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical fine-tuning study with independent experimental validation

full rationale

The paper reports an empirical fine-tuning experiment on LLMs for three algorithm design tasks using DAR sampling and DPO. No equations, predictions, or first-principles derivations are present that reduce outputs to inputs by construction. Claims rest on direct performance measurements (e.g., fine-tuned Llama-3.2-1B matching larger models on admissible set) and observed cross-task generalization, with public code enabling external reproduction. No self-citation load-bearing, uniqueness theorems, or ansatz smuggling appear in the central arguments; the study is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only view limits visibility into any hidden parameters; DAR sampling likely introduces tunable diversity and rank thresholds, but none are explicitly quantified here. No new entities or unstated axioms are described.

pith-pipeline@v0.9.0 · 5832 in / 1106 out tokens · 48554 ms · 2026-05-21T23:32:37.488979+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce a Diversity-Aware Rank-based (DAR) sampling strategy to balance training data diversity and quality, then we leverage direct preference optimization to efficiently align LLM outputs with task objectives.
IndisputableMonolith/Foundation/RealityFromDistinction reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Results suggest that fine-tuned LLMs can significantly outperform their off-the-shelf counterparts ... on the admissible set problem.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents
physics.flu-dyn 2026-05 conditional novelty 7.0

AI CFD Scientist autonomously finds a Spalart-Allmaras turbulence correction that lowers wall-friction error by 7.89% versus DNS on the periodic hill case using vision-language physics verification.
AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents
physics.flu-dyn 2026-05 conditional novelty 7.0

AI CFD Scientist autonomously discovers a Spalart-Allmaras runtime correction reducing lower-wall Cf RMSE by 7.89% on the periodic hill at Reh=5600 while using a vision-language gate to detect 14 of 16 silent failures...
AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents
physics.flu-dyn 2026-05 unverdicted novelty 6.0

An integrated AI agent framework for CFD uses vision-based physics gates to autonomously discover a Spalart-Allmaras runtime correction that cuts lower-wall skin-friction error by 7.89% versus DNS on the periodic hill...
Rethinking Efficiency in Neural Combinatorial Optimization: Batched Preference Optimization with Mamba
cs.LG 2026-02 unverdicted novelty 6.0

ECO uses supervised warm-up plus iterative batched DPO on a Mamba backbone to reach top neural performance on TSP and CVRP while lowering memory growth and raising throughput.
Glia: A Human-Inspired AI for Automated Systems Design and Optimization
cs.AI 2025-10 unverdicted novelty 6.0

Glia deploys a multi-agent LLM workflow with reasoning, experimentation, and analysis agents to generate interpretable algorithms for request routing, scheduling, and auto-scaling in distributed GPU clusters, reaching...
RL4RLA: Teaching ML to Discover Randomized Linear Algebra Algorithms Through Curriculum Design and Graph-Based Search
cs.LG 2026-05 unverdicted novelty 5.0

RL4RLA is a reinforcement learning framework that discovers interpretable symbolic randomized linear algebra algorithms by combining curriculum learning and graph-based search to overcome sparse rewards and large sear...

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · cited by 4 Pith papers · 6 internal anchors

[1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page
[2]

Large language models for mathematical reasoning: Progresses and challenges

Janice Ahn, Rishu Verma, Renze Lou, Di Liu, Rui Zhang, and Wenpeng Yin. Large language models for mathematical reasoning: Progresses and challenges. arXiv preprint arXiv:2402.00157, 2024

work page arXiv 2024
[3]

Machine learning for combinatorial optimization: a methodological tour d’horizon

Yoshua Bengio, Andrea Lodi, and Antoine Prouvost. Machine learning for combinatorial optimization: a methodological tour d’horizon. European Journal of Operational Research, 290 0 (2): 0 405--421, 2021

work page 2021
[4]

RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark

Federico Berto, Chuanbo Hua, Junyoung Park, Laurin Luttmann, Yining Ma, Fanchen Bu, Jiarui Wang, Haoran Ye, Minsu Kim, Sanghyeok Choi, Nayeli Gast Zepeda, Andr\'e Hottung, Jianan Zhou, Jieyi Bi, Yu Hu, Fei Liu, Hyeonah Kim, Jiwoo Son, Haeyeon Kim, Davide Angioni, Wouter Kool, Zhiguang Cao, Jie Zhang, Kijung Shin, Cathy Wu, Sungsoo Ahn, Guojie Song, Changh...

work page 2025
[5]

Learning to optimize: A primer and a benchmark

Tianlong Chen, Xiaohan Chen, Wuyang Chen, Zhangyang Wang, Howard Heaton, Jialin Liu, and Wotao Yin. Learning to optimize: A primer and a benchmark. The Journal of Machine Learning Research, 23 0 (1): 0 8562--8620, 2022

work page 2022
[6]

The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, et al. The llama 3 herd of models, 2024. URL https://arxiv.org/abs/2407.21783

work page internal anchor Pith review Pith/arXiv arXiv 2024
[7]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[8]

Performance assessment of the metaheuristic optimization algorithms: an exhaustive review

A Hanif Halim, Idris Ismail, and Swagatam Das. Performance assessment of the metaheuristic optimization algorithms: an exhaustive review. Artificial Intelligence Review, 54 0 (3): 0 2323--2409, 2021

work page 2021
[9]

Lora: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1 0 (2): 0 3, 2022

work page 2022
[10]

Orlm: A customizable framework in training large models for automated optimization modeling

Chenyu Huang, Zhengyang Tang, Shixi Hu, Ruoqing Jiang, Xin Zheng, Dongdong Ge, Benyou Wang, and Zizhuo Wang. Orlm: A customizable framework in training large models for automated optimization modeling. Operations Research, 2025 a

work page 2025
[11]

Calm: Co-evolution of algorithms and language model for automatic heuristic design

Ziyao Huang, Weiwei Wu, Kui Wu, Jianping Wang, and Wei-Bin Lee. Calm: Co-evolution of algorithms and language model for automatic heuristic design. arXiv preprint arXiv:2505.12285, 2025 b

work page arXiv 2025
[12]

Feature construction for meta-heuristic algorithm recommendation of capacitated vehicle routing problems

Hao Jiang, Yuhang Wang, Ye Tian, Xingyi Zhang, and Jianhua Xiao. Feature construction for meta-heuristic algorithm recommendation of capacitated vehicle routing problems. ACM Transactions on Evolutionary Learning and Optimization, 1 0 (1): 0 1--28, 2021

work page 2021
[13]

A Survey on Large Language Models for Code Generation

Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. A survey on large language models for code generation. arXiv preprint arXiv:2406.00515, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[14]

Attention, Learn to Solve Routing Problems!

Wouter Kool, Herke Van Hoof, and Max Welling. Attention, learn to solve routing problems! arXiv preprint arXiv:1803.08475, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[15]

Gonzalez, Hao Zhang, and Ion Stoica

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with pagedattention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023

work page 2023
[16]

Algorithm evolution using large language model

Fei Liu, Xialiang Tong, Mingxuan Yuan, and Qingfu Zhang. Algorithm evolution using large language model. arXiv preprint arXiv:2311.15249, 2023

work page arXiv 2023
[17]

Multi-task learning for routing problem with cross-problem zero-shot generalization

Fei Liu, Xi Lin, Zhenkun Wang, Qingfu Zhang, Tong Xialiang, and Mingxuan Yuan. Multi-task learning for routing problem with cross-problem zero-shot generalization. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp.\ 1898--1908, 2024 a

work page 1908
[18]

Evolution of heuristics: Towards efficient automatic algorithm design using large language model

Fei Liu, Tong Xialiang, Mingxuan Yuan, Xi Lin, Fu Luo, Zhenkun Wang, Zhichao Lu, and Qingfu Zhang. Evolution of heuristics: Towards efficient automatic algorithm design using large language model. In International Conference on Machine Learning, pp.\ 32201--32223. PMLR, 2024 b

work page 2024
[19]

A systematic survey on large language models for algorithm design

Fei Liu, Yiming Yao, Ping Guo, Zhiyuan Yang, Zhe Zhao, Xi Lin, Xialiang Tong, Mingxuan Yuan, Zhichao Lu, Zhenkun Wang, et al. A systematic survey on large language models for algorithm design. arXiv preprint arXiv:2410.14716, 2024 c

work page arXiv 2024
[20]

Llm4ad: A platform for algorithm design with large language model

Fei Liu, Rui Zhang, Zhuoliang Xie, Rui Sun, Kai Li, Xi Lin, Zhenkun Wang, Zhichao Lu, and Qingfu Zhang. Llm4ad: A platform for algorithm design with large language model. arXiv preprint arXiv:2412.17287, 2024 d

work page arXiv 2024
[21]

Toward automated algorithm design: A survey and practical guide to meta-black-box-optimization

Zeyuan Ma, Hongshu Guo, Yue-Jiao Gong, Jun Zhang, and Kay Chen Tan. Toward automated algorithm design: A survey and practical guide to meta-black-box-optimization. IEEE Transactions on Evolutionary Computation, 2025

work page 2025
[22]

AlphaEvolve: A coding agent for scientific and algorithmic discovery

Alexander Novikov, Ng \^a n V \ u , Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. Alphaevolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[23]

Training language models to follow instructions with human feedback

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in neural information processing systems, 35: 0 27730--27744, 2022

work page 2022
[24]

Direct preference optimization: Your language model is secretly a reward model

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36: 0 53728--53741, 2023

work page 2023
[25]

Mathematical discoveries from program search with large language models

Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M Pawan Kumar, Emilien Dupont, Francisco JR Ruiz, Jordan S Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models. Nature, 625 0 (7995): 0 468--475, 2024

work page 2024
[26]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[27]

Algorithm discovery with LLM s: Evolutionary search meets reinforcement learning

Anja S urina, Amin Mansouri, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, and Caglar Gulcehre. Algorithm discovery with LLM s: Evolutionary search meets reinforcement learning. In ICLR Workshop Scaling Self-Improving Foundation Models without Human Supervision, 2025. URL https://openreview.net/forum?id=1kAwyBpoO1

work page 2025
[28]

Learn to optimize-a brief overview

Ke Tang and Xin Yao. Learn to optimize-a brief overview. National Science Review, pp.\ nwae132, 2024

work page 2024
[29]

Llamea: A large language model evolutionary algorithm for automatically generating metaheuristics

Niki van Stein and Thomas B \"a ck. Llamea: A large language model evolutionary algorithm for automatically generating metaheuristics. IEEE Transactions on Evolutionary Computation, 2024

work page 2024
[30]

Pointer networks

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. Advances in neural information processing systems, 28, 2015

work page 2015
[31]

Trl: Transformer reinforcement learning

Leandro von Werra, Younes Belkada, Lewis Tunstall, Edward Beeching, Tristan Thrush, Nathan Lambert, Shengyi Huang, Kashif Rasul, and Quentin Gallouédec. Trl: Transformer reinforcement learning. https://github.com/huggingface/trl, 2020

work page 2020
[32]

Hydra: Automatically configuring algorithms for portfolio-based selection

Lin Xu, Holger Hoos, and Kevin Leyton-Brown. Hydra: Automatically configuring algorithms for portfolio-based selection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 24, pp.\ 210--216, 2010

work page 2010
[33]

Multi-objective evolution of heuristic using large language model

Shunyu Yao, Fei Liu, Xi Lin, Zhichao Lu, Zhenkun Wang, and Qingfu Zhang. Multi-objective evolution of heuristic using large language model. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pp.\ 27144--27152, 2025

work page 2025
[34]

Evolve cost-aware acquisition functions using large language models

Yiming Yao, Fei Liu, Ji Cheng, and Qingfu Zhang. Evolve cost-aware acquisition functions using large language models. In International Conference on Parallel Problem Solving from Nature, pp.\ 374--390. Springer, 2024

work page 2024
[35]

Reevo: Large language models as hyper-heuristics with reflective evolution

Haoran Ye, Jiarui Wang, Zhiguang Cao, Federico Berto, Chuanbo Hua, Haeyeon Kim, Jinkyoo Park, and Guojie Song. Reevo: Large language models as hyper-heuristics with reflective evolution. arXiv preprint arXiv:2402.01145, 2024

work page arXiv 2024
[36]

Understanding the importance of evolutionary search in automated heuristic design with large language models

Rui Zhang, Fei Liu, Xi Lin, Zhenkun Wang, Zhichao Lu, and Qingfu Zhang. Understanding the importance of evolutionary search in automated heuristic design with large language models. In International Conference on Parallel Problem Solving from Nature, pp.\ 185--202. Springer, 2024

work page 2024
[37]

@esa (Ref

\@ifxundefined[1] #1\@undefined \@firstoftwo \@secondoftwo \@ifnum[1] #1 \@firstoftwo \@secondoftwo \@ifx[1] #1 \@firstoftwo \@secondoftwo [2] @ #1 \@temptokena #2 #1 @ \@temptokena \@ifclassloaded agu2001 natbib The agu2001 class already includes natbib coding, so you should not add it explicitly Type <Return> for now, but then later remove the command n...

work page
[38]

\@lbibitem[] @bibitem@first@sw\@secondoftwo \@lbibitem[#1]#2 \@extra@b@citeb \@ifundefined br@#2\@extra@b@citeb \@namedef br@#2 \@nameuse br@#2\@extra@b@citeb \@ifundefined b@#2\@extra@b@citeb @num @parse #2 @tmp #1 NAT@b@open@#2 NAT@b@shut@#2 \@ifnum @merge>\@ne @bibitem@first@sw \@firstoftwo \@ifundefined NAT@b*@#2 \@firstoftwo @num @NAT@ctr \@secondoft...

work page
[39]

h P h:w 1. Q*p t ]D Ѥ Mf蚋f2 uLS3u<5B]zoe W _ 4 BH 'SP

@open @close @open @close and [1] URL: #1 \@ifundefined chapter * \@mkboth \@ifxundefined @sectionbib * \@mkboth * \@mkboth\@gobbletwo \@ifclassloaded amsart * \@ifclassloaded amsbook * \@ifxundefined @heading @heading NAT@ctr thebibliography [1] @ \@biblabel @NAT@ctr \@bibsetup #1 @NAT@ctr @ @openbib .11em \@plus.33em \@minus.07em 4000 4000 `\.\@m @bibit...

work page arXiv 2018

[1] [1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page

[2] [2]

Large language models for mathematical reasoning: Progresses and challenges

Janice Ahn, Rishu Verma, Renze Lou, Di Liu, Rui Zhang, and Wenpeng Yin. Large language models for mathematical reasoning: Progresses and challenges. arXiv preprint arXiv:2402.00157, 2024

work page arXiv 2024

[3] [3]

Machine learning for combinatorial optimization: a methodological tour d’horizon

Yoshua Bengio, Andrea Lodi, and Antoine Prouvost. Machine learning for combinatorial optimization: a methodological tour d’horizon. European Journal of Operational Research, 290 0 (2): 0 405--421, 2021

work page 2021

[4] [4]

RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark

Federico Berto, Chuanbo Hua, Junyoung Park, Laurin Luttmann, Yining Ma, Fanchen Bu, Jiarui Wang, Haoran Ye, Minsu Kim, Sanghyeok Choi, Nayeli Gast Zepeda, Andr\'e Hottung, Jianan Zhou, Jieyi Bi, Yu Hu, Fei Liu, Hyeonah Kim, Jiwoo Son, Haeyeon Kim, Davide Angioni, Wouter Kool, Zhiguang Cao, Jie Zhang, Kijung Shin, Cathy Wu, Sungsoo Ahn, Guojie Song, Changh...

work page 2025

[5] [5]

Learning to optimize: A primer and a benchmark

Tianlong Chen, Xiaohan Chen, Wuyang Chen, Zhangyang Wang, Howard Heaton, Jialin Liu, and Wotao Yin. Learning to optimize: A primer and a benchmark. The Journal of Machine Learning Research, 23 0 (1): 0 8562--8620, 2022

work page 2022

[6] [6]

The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, et al. The llama 3 herd of models, 2024. URL https://arxiv.org/abs/2407.21783

work page internal anchor Pith review Pith/arXiv arXiv 2024

[7] [7]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[8] [8]

Performance assessment of the metaheuristic optimization algorithms: an exhaustive review

A Hanif Halim, Idris Ismail, and Swagatam Das. Performance assessment of the metaheuristic optimization algorithms: an exhaustive review. Artificial Intelligence Review, 54 0 (3): 0 2323--2409, 2021

work page 2021

[9] [9]

Lora: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1 0 (2): 0 3, 2022

work page 2022

[10] [10]

Orlm: A customizable framework in training large models for automated optimization modeling

Chenyu Huang, Zhengyang Tang, Shixi Hu, Ruoqing Jiang, Xin Zheng, Dongdong Ge, Benyou Wang, and Zizhuo Wang. Orlm: A customizable framework in training large models for automated optimization modeling. Operations Research, 2025 a

work page 2025

[11] [11]

Calm: Co-evolution of algorithms and language model for automatic heuristic design

Ziyao Huang, Weiwei Wu, Kui Wu, Jianping Wang, and Wei-Bin Lee. Calm: Co-evolution of algorithms and language model for automatic heuristic design. arXiv preprint arXiv:2505.12285, 2025 b

work page arXiv 2025

[12] [12]

Feature construction for meta-heuristic algorithm recommendation of capacitated vehicle routing problems

Hao Jiang, Yuhang Wang, Ye Tian, Xingyi Zhang, and Jianhua Xiao. Feature construction for meta-heuristic algorithm recommendation of capacitated vehicle routing problems. ACM Transactions on Evolutionary Learning and Optimization, 1 0 (1): 0 1--28, 2021

work page 2021

[13] [13]

A Survey on Large Language Models for Code Generation

Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. A survey on large language models for code generation. arXiv preprint arXiv:2406.00515, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[14] [14]

Attention, Learn to Solve Routing Problems!

Wouter Kool, Herke Van Hoof, and Max Welling. Attention, learn to solve routing problems! arXiv preprint arXiv:1803.08475, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[15] [15]

Gonzalez, Hao Zhang, and Ion Stoica

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with pagedattention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023

work page 2023

[16] [16]

Algorithm evolution using large language model

Fei Liu, Xialiang Tong, Mingxuan Yuan, and Qingfu Zhang. Algorithm evolution using large language model. arXiv preprint arXiv:2311.15249, 2023

work page arXiv 2023

[17] [17]

Multi-task learning for routing problem with cross-problem zero-shot generalization

Fei Liu, Xi Lin, Zhenkun Wang, Qingfu Zhang, Tong Xialiang, and Mingxuan Yuan. Multi-task learning for routing problem with cross-problem zero-shot generalization. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp.\ 1898--1908, 2024 a

work page 1908

[18] [18]

Evolution of heuristics: Towards efficient automatic algorithm design using large language model

Fei Liu, Tong Xialiang, Mingxuan Yuan, Xi Lin, Fu Luo, Zhenkun Wang, Zhichao Lu, and Qingfu Zhang. Evolution of heuristics: Towards efficient automatic algorithm design using large language model. In International Conference on Machine Learning, pp.\ 32201--32223. PMLR, 2024 b

work page 2024

[19] [19]

A systematic survey on large language models for algorithm design

Fei Liu, Yiming Yao, Ping Guo, Zhiyuan Yang, Zhe Zhao, Xi Lin, Xialiang Tong, Mingxuan Yuan, Zhichao Lu, Zhenkun Wang, et al. A systematic survey on large language models for algorithm design. arXiv preprint arXiv:2410.14716, 2024 c

work page arXiv 2024

[20] [20]

Llm4ad: A platform for algorithm design with large language model

Fei Liu, Rui Zhang, Zhuoliang Xie, Rui Sun, Kai Li, Xi Lin, Zhenkun Wang, Zhichao Lu, and Qingfu Zhang. Llm4ad: A platform for algorithm design with large language model. arXiv preprint arXiv:2412.17287, 2024 d

work page arXiv 2024

[21] [21]

Toward automated algorithm design: A survey and practical guide to meta-black-box-optimization

Zeyuan Ma, Hongshu Guo, Yue-Jiao Gong, Jun Zhang, and Kay Chen Tan. Toward automated algorithm design: A survey and practical guide to meta-black-box-optimization. IEEE Transactions on Evolutionary Computation, 2025

work page 2025

[22] [22]

AlphaEvolve: A coding agent for scientific and algorithmic discovery

Alexander Novikov, Ng \^a n V \ u , Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. Alphaevolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[23] [23]

Training language models to follow instructions with human feedback

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in neural information processing systems, 35: 0 27730--27744, 2022

work page 2022

[24] [24]

Direct preference optimization: Your language model is secretly a reward model

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36: 0 53728--53741, 2023

work page 2023

[25] [25]

Mathematical discoveries from program search with large language models

Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M Pawan Kumar, Emilien Dupont, Francisco JR Ruiz, Jordan S Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models. Nature, 625 0 (7995): 0 468--475, 2024

work page 2024

[26] [26]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[27] [27]

Algorithm discovery with LLM s: Evolutionary search meets reinforcement learning

Anja S urina, Amin Mansouri, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, and Caglar Gulcehre. Algorithm discovery with LLM s: Evolutionary search meets reinforcement learning. In ICLR Workshop Scaling Self-Improving Foundation Models without Human Supervision, 2025. URL https://openreview.net/forum?id=1kAwyBpoO1

work page 2025

[28] [28]

Learn to optimize-a brief overview

Ke Tang and Xin Yao. Learn to optimize-a brief overview. National Science Review, pp.\ nwae132, 2024

work page 2024

[29] [29]

Llamea: A large language model evolutionary algorithm for automatically generating metaheuristics

Niki van Stein and Thomas B \"a ck. Llamea: A large language model evolutionary algorithm for automatically generating metaheuristics. IEEE Transactions on Evolutionary Computation, 2024

work page 2024

[30] [30]

Pointer networks

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. Advances in neural information processing systems, 28, 2015

work page 2015

[31] [31]

Trl: Transformer reinforcement learning

Leandro von Werra, Younes Belkada, Lewis Tunstall, Edward Beeching, Tristan Thrush, Nathan Lambert, Shengyi Huang, Kashif Rasul, and Quentin Gallouédec. Trl: Transformer reinforcement learning. https://github.com/huggingface/trl, 2020

work page 2020

[32] [32]

Hydra: Automatically configuring algorithms for portfolio-based selection

Lin Xu, Holger Hoos, and Kevin Leyton-Brown. Hydra: Automatically configuring algorithms for portfolio-based selection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 24, pp.\ 210--216, 2010

work page 2010

[33] [33]

Multi-objective evolution of heuristic using large language model

Shunyu Yao, Fei Liu, Xi Lin, Zhichao Lu, Zhenkun Wang, and Qingfu Zhang. Multi-objective evolution of heuristic using large language model. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pp.\ 27144--27152, 2025

work page 2025

[34] [34]

Evolve cost-aware acquisition functions using large language models

Yiming Yao, Fei Liu, Ji Cheng, and Qingfu Zhang. Evolve cost-aware acquisition functions using large language models. In International Conference on Parallel Problem Solving from Nature, pp.\ 374--390. Springer, 2024

work page 2024

[35] [35]

Reevo: Large language models as hyper-heuristics with reflective evolution

Haoran Ye, Jiarui Wang, Zhiguang Cao, Federico Berto, Chuanbo Hua, Haeyeon Kim, Jinkyoo Park, and Guojie Song. Reevo: Large language models as hyper-heuristics with reflective evolution. arXiv preprint arXiv:2402.01145, 2024

work page arXiv 2024

[36] [36]

Understanding the importance of evolutionary search in automated heuristic design with large language models

Rui Zhang, Fei Liu, Xi Lin, Zhenkun Wang, Zhichao Lu, and Qingfu Zhang. Understanding the importance of evolutionary search in automated heuristic design with large language models. In International Conference on Parallel Problem Solving from Nature, pp.\ 185--202. Springer, 2024

work page 2024

[37] [37]

@esa (Ref

\@ifxundefined[1] #1\@undefined \@firstoftwo \@secondoftwo \@ifnum[1] #1 \@firstoftwo \@secondoftwo \@ifx[1] #1 \@firstoftwo \@secondoftwo [2] @ #1 \@temptokena #2 #1 @ \@temptokena \@ifclassloaded agu2001 natbib The agu2001 class already includes natbib coding, so you should not add it explicitly Type <Return> for now, but then later remove the command n...

work page

[38] [38]

\@lbibitem[] @bibitem@first@sw\@secondoftwo \@lbibitem[#1]#2 \@extra@b@citeb \@ifundefined br@#2\@extra@b@citeb \@namedef br@#2 \@nameuse br@#2\@extra@b@citeb \@ifundefined b@#2\@extra@b@citeb @num @parse #2 @tmp #1 NAT@b@open@#2 NAT@b@shut@#2 \@ifnum @merge>\@ne @bibitem@first@sw \@firstoftwo \@ifundefined NAT@b*@#2 \@firstoftwo @num @NAT@ctr \@secondoft...

work page

[39] [39]

h P h:w 1. Q*p t ]D Ѥ Mf蚋f2 uLS3u<5B]zoe W _ 4 BH 'SP

@open @close @open @close and [1] URL: #1 \@ifundefined chapter * \@mkboth \@ifxundefined @sectionbib * \@mkboth * \@mkboth\@gobbletwo \@ifclassloaded amsart * \@ifclassloaded amsbook * \@ifxundefined @heading @heading NAT@ctr thebibliography [1] @ \@biblabel @NAT@ctr \@bibsetup #1 @NAT@ctr @ @openbib .11em \@plus.33em \@minus.07em 4000 4000 `\.\@m @bibit...

work page arXiv 2018