{"total":15,"items":[{"citing_arxiv_id":"2605.20086","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"What Do Evolutionary Coding Agents Evolve?","primary_cat":"cs.NE","submitted_at":"2026-05-19T16:41:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Evolutionary coding agents achieve most benchmark gains through a small subset of edit types and by cycling previously deleted code lines rather than developing new algorithmic structures.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15334","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From I/O to Code with Discovery Agent","primary_cat":"cs.LG","submitted_at":"2026-05-14T18:57:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DIO-Agent frames IO2Code as LLM-driven evolutionary search over programs with a Transformation Priority Premise to favor simple hypotheses, outperforming baselines on a new IO2CodeBench.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15026","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SemaTune: Semantic-Aware Online OS Tuning with Large Language Models","primary_cat":"cs.OS","submitted_at":"2026-05-14T16:25:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SemaTune uses LLM guidance with semantic context to tune up to 41 Linux OS parameters, delivering 72.5% performance gains over defaults and 153.3% over non-LLM baselines on 13 workloads while avoiding degraded states.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"kernel configuration, but as a 
one-off pre-deployment step. Recent congestion-control optimization work [37] use LLMs mostly offline or outside the control loop. AIOS [64] and Herding LLaMaS [43] are broader OS-flavored agentic visions rather than live online autotuners. A more agentic line uses LLMs to generate or evolve policies directly: AlphaEvolve [66], Duel-Evolve [50], OpenEvolve [6], AdaEvolve [14], and SkyDiscover [62] couple LLMs with evaluator-driven evolutionary search; Barbarians at the Gate [18] argues that many systems problems are amenable to this discovery style; and Glia [35] and sched-agent / SchedCP [89] bring similar reasoning to systems design and scheduler-"},{"citing_arxiv_id":"2605.15221","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Effective Harness Engineering for Algorithm Discovery with Coding Agents","primary_cat":"cs.SE","submitted_at":"2026-05-13T06:33:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Under fixed token budget on Circle Packing, deeper per-candidate reasoning beats generating more shallow candidates, and capable models produce evaluation hacks at higher rates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09018","ref_index":9,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evolutionary Ensemble of Agents","primary_cat":"cs.NE","submitted_at":"2026-05-09T15:56:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"EvE co-evolves code solvers and guidance states via synchronous races and Elo updates, discovering a rescale-then-interpolate mechanism that enables example-count generalization in 
ICON.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08678","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI","primary_cat":"cs.LG","submitted_at":"2026-05-09T04:29:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MLS-Bench shows that current AI agents fall short of reliably inventing generalizable ML methods, with engineering tuning easier than genuine invention.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"A growing set of benchmarks evaluates these emerging capabilities [7, 18, 55, 58, 73]. Self-evolving agents and evaluation. The paradigm of LLMs has evolved from single-turn question answering [8] toward agents that iterate over extended horizons [81, 114]. Self-evolving systems iteratively refine solutions through evolutionary search [15, 67, 79, 86], open-ended self-improving loops [5, 44, 48, 72], and test-time training [74, 89, 102, 115, 119]. However, these systems have been demonstrated primarily on specific optimization problems, such as circle packing, contest-style algorithm search, kernel optimization, and activation-function search [67, 100, 102, 115]. 
Such settings are narrow in domain and do not capture whether a discovery is scalable and generalizable."},{"citing_arxiv_id":"2605.08520","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration","primary_cat":"cs.LG","submitted_at":"2026-05-08T22:04:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA workloads.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[1] Nemo rl: A scalable and efficient post-training library. https://github.com/NVIDIA-NeMo/RL, 2025. GitHub repository. [2] L. A. Agrawal, S. Tan, D. Soylu, N. Ziems, R. Khare, K. Opsahl-Ong, A. Singhvi, H. Shandilya, M. J. Ryan, M. Jiang, et al. Gepa: Reflective prompt evolution can outperform reinforcement learning. arXiv preprint arXiv:2507.19457, 2025. [3] H. Assumpção, D. Ferreira, L. Campos, and F. Murai. Codeevolve: An open source evolutionary coding agent for algorithm discovery and optimization. arXiv preprint arXiv:2510.14150, 2025. [4] J. Fang, Y. Peng, X. Zhang, Y. Wang, X. Yi, G. Zhang, Y. Xu, B. Wu, S. Liu, Z. Li, et al. 
A comprehensive survey of self-evolving ai agents: A new paradigm bridging foundation models"},{"citing_arxiv_id":"2605.07572","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Open-Ended Task Discovery via Bayesian Optimization","primary_cat":"cs.AI","submitted_at":"2026-05-08T10:43:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Generate-Select-Refine is an open-ended Bayesian optimization method that generates tasks and concentrates evaluations on the best one with only logarithmic regret overhead relative to standard single-task optimization.","context_count":1,"top_context_role":"other","top_context_polarity":"unclear","context_text":"[8] Dhruv Agarwal, Bodhisattwa Prasad Majumder, Reece Adamson, Megha Chakravorty, Satvika Reddy Gavireddy, Aditya Parashar, Harshit Surana, Bhavana Dalvi Mishra, Andrew McCallum, Ashish Sabharwal, and Peter Clark. Autodiscovery: Open-ended scientific discovery via bayesian surprise. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. [9] Henrique Assumpção, Diego Ferreira, Leandro Campos, and Fabricio Murai. CodeEvolve: An open source evolutionary coding agent for algorithm discovery and optimization. arXiv preprint arXiv:2510.14150, 2025. [10] Raul Astudillo and Peter Frazier. Bayesian optimization of composite functions. 
In International Conference on Machine Learning (ICML), pages 354-363."},{"citing_arxiv_id":"2605.07039","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents","primary_cat":"cs.LG","submitted_at":"2026-05-07T23:38:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PACEvolve++ uses a phase-adaptive reinforcement learning advisor to decouple hypothesis selection from execution in LLM-driven evolutionary search, delivering faster convergence than prior frameworks on load balancing, recommendation, and protein tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Language agents mirror human causal reasoning biases. how can we help them think like scientists? In Second Conference on Language Modeling, 2025. [3] Parth Asawa, Alan Zhu, Abby O'Neill, Matei Zaharia, Alexandros G Dimakis, and Joseph E Gonzalez. How to train your advisor: Steering black-box llms with advisor models. arXiv preprint arXiv:2510.02453, 2025. [4] Henrique Assumpção, Diego Ferreira, Leandro Campos, and Fabricio Murai. Codeevolve: An open source evolutionary coding agent for algorithm discovery and optimization. arXiv preprint arXiv:2510.14150, 2025. 
[5] Mert Cemri, Shubham Agrawal, Akshat Gupta, Shu Liu, Audrey Cheng, Qiuyang Mang, Ashwin Naren, Lutfi Eren Erdogan, Koushik Sen, Matei Zaharia, et al."},{"citing_arxiv_id":"2604.25083","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization","primary_cat":"cs.AI","submitted_at":"2026-04-28T00:31:55+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"An LLM-driven agentic system evolves microarchitectural policies for cache replacement, data prefetching, and branch prediction, producing designs that match or exceed prior state-of-the-art in IPC on standard benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"access pattern and select the corresponding prefetching mechanism, while delta-correlation designs such as Berti [30] learn recurring per-IP delta sequences to predict future addresses. Lookahead prefetchers such as SPP [23] recursively follow predicted deltas to issue prefetches multiple steps ahead, and reinforcement-learning prefetchers such as Pythia [5] learn which prefetches to issue online, using reward signals derived from prefetch accuracy and timeliness. Prefetching has been studied for decades and remains an active area of research, with each generation of designs targeting a different slice of the access-pattern space. 
Branch Prediction. Branch predictors guess the direction and target of upcoming branches so the front-end can keep fetching"},{"citing_arxiv_id":"2604.19341","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evaluation-driven Scaling for Scientific Discovery","primary_cat":"cs.LG","submitted_at":"2026-04-21T11:24:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"best and second-best results are highlighted in bold and underlined, respectively. Method Model CP26\" CP32\" AlphaEvolve [89] Gemini-2.0 Pro+ Flash 2.635862 2.937944 AlphaEvolve V2 [38] Gemini-2.0 Pro+ Flash 2.635983 2.939572 ShinkaEvolve [63] Mixed 2.635982 - ThetaEvolve [152] Distill-Qwen3-8B 2.635983 - TTT-Discover [166] Qwen3-8B 2.635983 2.939572 CodeEvolve [7] Qwen3-Coder-30B 2.635980 2.939560 OpenEvolve [6] Qwen3-Coder-30B - 2.931560 SIMPLETES gpt-oss-20b 2.635983 2.939572 SIMPLETES gpt-oss-120b 2.635983 2.939572 homogeneous packing. 3.5.3 Hadamard Maximum Determinant Overview. 
The Hadamard Maximum Determinant, Order 29 task belongs to the classical maximal-determinant problem for {-1, 1}-matrices, a central benchmark in extremal matrix theory and D-optimal"},{"citing_arxiv_id":"2604.16625","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AdaExplore: Failure-Driven Adaptation and Diversity-Preserving Search for Efficient Kernel Generation","primary_cat":"cs.CL","submitted_at":"2026-04-17T18:25:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AdaExplore improves correctness and speed of Triton kernel generation by converting recurring failures into a memory of rules and organizing search as a tree that mixes local refinements with larger regenerations, yielding 3.12x and 1.72x speedups on KernelBench Level-2 and Level-3 within 100 steps.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"examples above (3-6 operations are recommended) * Maintains proper PyTorch patterns and code quality Code structure requirements: * A Model class inheriting from nn.Module * An __init__ method if needed for parameters * A forward method with the computation * A get_inputs() function * A get_init_inputs() function * Configuration variables Example Files (for structure reference) [Example 1] [Example 2] [Example 3] Available PyTorch Layers/Operations to Incorporate [Operation 1] [Operation 2] [Operation 3] Generate a new, complex example that: * Uses the structure from the examples above * Incorporates one or more of the provided PyTorch layers * Creates a more complex computation pattern (e.g., combining multiple layers, using different tensor shapes, etc."},{"citing_arxiv_id":"2604.18607","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TurboEvolve: Towards Fast and Robust LLM-Driven Program 
Evolution","primary_cat":"cs.NE","submitted_at":"2026-04-12T12:42:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TurboEvolve improves LLM program evolution by running parallel islands with LLM-generated diverse candidates that carry self-assigned weights, an adaptive scheduler, and clustered seed injection to reach stronger solutions at lower evaluation budgets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.07144","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics","primary_cat":"cs.DC","submitted_at":"2026-04-08T14:37:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering up to 53% and average 34% gains over prior LLM serving systems under runtime dynamics.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"LLMs with the iterative refinement of evolutionary algorithms to automatically discover high-performing programs. LLM-driven program synthesis. The LLM-driven program synthesis includes four primary stages: (i) Seed initialization and program representation determines the starting population and the level of abstraction at which evolution operates; e.g., AlphaEvolve [21] and CodeEvolve [30] support codebase-scale evolution spanning multiple functions and programming languages. 
(ii) LLM-driven variation replaces the hand-crafted mutation and crossover operators of classical genetic programming [31] with the semantic code-transformation capabilities of LLMs; AlphaEvolve [21] employs an LLM ensemble with structured diff-based modifications for"},{"citing_arxiv_id":"2604.03473","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evolutionary Search for Automated Design of Uncertainty Quantification Methods","primary_cat":"cs.CL","submitted_at":"2026-04-03T21:41:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLM-driven evolutionary search discovers unsupervised UQ methods as Python programs that improve ROC-AUC by up to 6.7% over manual baselines on atomic claim verification across 9 datasets with OOD generalization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}