pith. machine review for the scientific record.

arxiv: 2604.25083 · v1 · submitted 2026-04-28 · 💻 cs.AI · cs.AR

Recognition: unknown

Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 16:38 UTC · model grok-4.3

classification 💻 cs.AI · cs.AR
keywords Agentic AI · Computer Architecture · LLM Code Evolution · Cache Replacement · Data Prefetching · Branch Prediction · Microarchitecture Optimization · Design Space Exploration

The pith

An LLM agent evolves microarchitecture policies that match or exceed hand-crafted state-of-the-art designs across cache replacement, prefetching, and branch prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Agentic Architect, a framework in which a human architect defines an optimization target, a seed design, a scoring function, and a benchmark split, after which an LLM proposes successive code edits to the microarchitectural policy and receives cycle-accurate simulation results as feedback. The approach is tested on three established domains: cache replacement, data prefetching, and branch prediction. In each domain the evolved designs reach or surpass the performance of current expert policies, with the strongest cache replacement design delivering a 1.062 times geomean IPC improvement over LRU and a small edge over the prior best. The work shows that the value lies less in inventing entirely new mechanisms and more in discovering effective combinations of known techniques under automated search.
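
As a concrete reading of that loop, here is a minimal sketch in Python. The `llm` client, the `run_simulation` harness, the prompt text, and the greedy accept/reject policy are assumptions for illustration only; the paper's actual framework, prompts, and cycle-accurate simulator interface differ.

```python
# Minimal sketch of an LLM-driven evolution loop with simulation feedback.
# Names and interfaces here are hypothetical, not the paper's framework.
from dataclasses import dataclass

@dataclass
class Candidate:
    source: str   # policy implementation the LLM edits (e.g. C++ source text)
    score: float  # geomean IPC speedup on the training benchmark split

def run_simulation(source: str, traces: list[str]) -> float:
    """Stand-in for the cycle-accurate simulation step: compile the policy,
    replay each training trace, and return a scalar score."""
    raise NotImplementedError  # replace with a real simulator harness

def propose_edit(llm, parent: Candidate, objective: str) -> str:
    """Ask the LLM for a revised implementation, given the human-specified
    objective, the parent's code, and its latest score as feedback."""
    prompt = (
        f"Objective: {objective}\n"
        f"Current score (geomean IPC speedup): {parent.score:.4f}\n"
        f"Current implementation:\n{parent.source}\n"
        "Propose an improved implementation."
    )
    return llm.complete(prompt)  # hypothetical LLM client

def evolve(llm, seed_source: str, objective: str, traces: list[str],
           iterations: int = 50) -> Candidate:
    """Greedy accept/reject loop; a real search may keep a population."""
    best = Candidate(seed_source, run_simulation(seed_source, traces))
    for _ in range(iterations):
        edited = propose_edit(llm, best, objective)
        try:
            score = run_simulation(edited, traces)
        except Exception:
            continue  # candidates that fail to compile or run are discarded
        if score > best.score:
            best = Candidate(edited, score)
    return best
```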

Core claim

Agentic Architect is an end-to-end framework that lets an LLM explore and refine microarchitectural implementations by editing code and receiving simulation feedback within human-specified constraints. On cache replacement the best evolved design achieves 1.062 times geomean IPC over LRU and 0.6 percent over Mockingjay. The branch predictor reaches 1.100 times over Bimodal and 1.5 percent over its Hashed Perceptron seed. The prefetcher reaches 1.76 times over no prefetching, 17 percent over its VA/AMPM Lite seed, and 21 percent over SMS. Components in the evolved designs frequently correspond to known techniques, but the performance gain comes from how those components are coordinated. Seed quality bounds what the search can achieve.
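
The headline figures are geometric-mean IPC speedups over a baseline across a benchmark split. As a reminder of how such a number is assembled from per-trace simulation results, here is a small sketch; the trace names and IPC values are invented for illustration and are not taken from the paper.

```python
import math

def geomean_speedup(ipc_policy: dict[str, float],
                    ipc_baseline: dict[str, float]) -> float:
    """Geometric mean of per-trace IPC ratios (policy / baseline)."""
    ratios = [ipc_policy[t] / ipc_baseline[t] for t in ipc_baseline]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Illustrative values only; the paper reports 1.062x for its best evolved
# cache-replacement design over LRU on its benchmark split.
ipc_lru     = {"traceA": 1.20, "traceB": 0.85, "traceC": 2.10}
ipc_evolved = {"traceA": 1.29, "traceB": 0.88, "traceC": 2.26}
print(f"geomean speedup: {geomean_speedup(ipc_evolved, ipc_lru):.3f}x")
```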

What carries the argument

LLM-driven iterative code evolution guided by cycle-accurate simulation scores and human-specified objectives, constraints, and benchmark splits.

If this is right

  • The architect's role shifts from manually writing policies to specifying objectives, seeds, scoring functions, and evaluation criteria.
  • Search performance is bounded by seed quality: evolution can refine and extend an existing mechanism but cannot compensate for a weak foundation.
  • Objectives, constraints, and prompt guidance directly affect both reliability and how well designs generalize beyond the chosen benchmark split.
  • Novelty in successful designs appears mainly in the coordination of known techniques rather than in the invention of new primitive mechanisms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same loop of LLM code edits plus simulator feedback could be applied to other hardware design problems such as memory controllers or on-chip networks.
  • Replacing simulation feedback with measurements from real silicon would test whether the evolved designs remain effective under manufacturing variation and workload changes.
  • Because the framework is open-source, community extensions to new simulators and additional domains become straightforward to explore.

Load-bearing premise

An LLM can reliably discover and coordinate useful microarchitectural mechanisms from code-level edits when guided only by simulation feedback and high-level human goals.

What would settle it

Running the framework on a fresh benchmark split or a different simulator and finding that every evolved design underperforms its human-provided seed by a consistent margin would show that the search cannot generalize beyond the supplied training data.
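
A minimal form of that check, assuming per-trace IPC ratios of the evolved design over its seed on both the training split and held-out traces (the data layout is an assumption, not the paper's evaluation harness):

```python
import math

def geomean(ratios: list[float]) -> float:
    """Geometric mean of a list of IPC ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

def generalization_gap(train_ratios: list[float],
                       heldout_ratios: list[float]) -> float:
    """Training-split geomean minus held-out geomean for evolved-over-seed
    IPC ratios; a large positive gap, or a held-out geomean below 1.0,
    would indicate the failure mode described above."""
    return geomean(train_ratios) - geomean(heldout_ratios)
```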

Figures

Figures reproduced from arXiv:2604.25083 by Alexander Blasberg, Dimitrios Skarlatos, Vasilis Kypriotis.

  • Figure 1: Overview of the Agentic Architect framework and the evolution loop. Human icons denote inputs provided by the …
  • Figure 2: Abbreviated system prompts for the full and mini …
  • Figure 3: Geomean IPC speedup across all three domains. Left: …
  • Figure 4: Per-trace IPC speedup over LRU for cache replace…
  • Figure 8: Framework comparison across all three domains.
  • Figure 7: MPKI improvement vs. IPC improvement across …
  • Figure 11: Training vs. held-out trace geomean IPC speedup …
  • Figure 12: Per-trace IPC speedup over no prefetch for the …
  • Figure 13: Storage–performance Pareto frontier across eval…
  • Figure 14: Architecture growth from seed to evolved design …
Original abstract

Rapid advances in Large Language Models (LLMs) create new opportunities by enabling efficient exploration of broad, complex design spaces. This is particularly valuable in computer architecture, where performance depends on microarchitectural designs and policies drawn from vast combinatorial spaces. We introduce Agentic Architect, an agentic AI framework for computer architecture design exploration and optimization that combines LLM-driven code evolution with cycle-accurate simulation. The human architect specifies the optimization target, seed design, scoring function, simulator interface, and benchmark split, while the LLM explores implementations within these constraints. Across cache replacement, data prefetching, and branch prediction, Agentic Architect matches or exceeds state-of-the-art designs. Our best evolved cache replacement design achieves a 1.062x geomean IPC speedup over LRU, 0.6% over Mockingjay (1.056x). Our evolved branch predictor achieves a 1.100x geomean IPC speedup over Bimodal, 1.5% over its Hashed Perceptron seed (1.085x). Finally, our evolved prefetcher achieves a 1.76x geomean IPC speedup over no prefetching, 17% over its VA/AMPM Lite seed (1.59x) and 21% over SMS (1.55x). Our analysis surfaces several findings about agentic AI-driven microarchitecture design. Across evolved designs, components often correspond to known techniques; the novelty lies in how they are coordinated. The architect's role is shifting, but the human remains central. Seed quality bounds what search can achieve: evolution can refine and extend an existing mechanism, but cannot compensate for a weak foundation. Likewise, objectives, constraints, and prompt guidance affect reliability and generalization. Overall, Agentic Architect is the first end-to-end open-source framework for agentic AI architecture exploration and optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Agentic Architect, an agentic AI framework that uses LLMs to evolve code implementations for microarchitectural components (cache replacement, data prefetching, branch prediction) via iterative edits guided by cycle-accurate simulation feedback on human-specified benchmarks, objectives, and seeds. It reports concrete geomean IPC speedups over baselines and SOTA (1.062x over LRU and 0.6% over Mockingjay for cache replacement; 1.100x over Bimodal and 1.5% over Hashed Perceptron for branch prediction; 1.76x over no prefetching, 17% over VA/AMPM Lite, and 21% over SMS for prefetching). Analysis finds that evolved designs typically coordinate known techniques rather than invent new ones, with human architects remaining central in setting constraints.

Significance. If the results hold under broader testing, the work is significant for demonstrating a practical, open-source end-to-end framework for LLM-driven exploration of vast combinatorial design spaces in computer architecture. Strengths include reliance on external cycle-accurate simulators and standard benchmarks for performance measurement (avoiding circularity), the release of the framework as open-source, and the empirical finding that search often rediscovers and coordinates established mechanisms, which lends plausibility. It usefully reframes the architect's role around objective specification rather than manual implementation.

major comments (2)
  1. [Results section] Results section (and abstract): The central claims that evolved designs 'match or exceed state-of-the-art' rest on IPC numbers obtained by evolving against a fixed benchmark split. The manuscript notes that 'objectives, constraints, and prompt guidance affect reliability and generalization' yet provides no held-out traces, cross-benchmark results, or distribution-shift experiments to quantify how much of the reported speedups (1.062x, 1.100x, 1.76x) survive beyond the optimization set. This is load-bearing for the headline performance assertions.
  2. [Abstract and Results] Abstract and Results: The reported speedups (e.g., 0.6% over Mockingjay, 1.5% over the Hashed Perceptron seed) are presented without stating the number of independent evolutionary runs, random seeds, or any statistical measures such as standard deviation or confidence intervals. This omission makes it difficult to assess whether the small margins reflect reliable improvements or run-to-run variability.
minor comments (2)
  1. [Analysis section] The analysis of evolved designs could include a quantitative breakdown (e.g., a table) of how frequently components match known techniques versus novel combinations, to strengthen the claim that 'novelty lies in how they are coordinated.'
  2. [Figures] Figure captions and legends should more explicitly link visual elements (e.g., performance curves or code snippets) to the specific IPC metrics and baselines being compared.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We appreciate the recognition of the framework's open-source release and the empirical observation that evolved designs often coordinate known techniques. We address each major comment below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [Results section] Results section (and abstract): The central claims that evolved designs 'match or exceed state-of-the-art' rest on IPC numbers obtained by evolving against a fixed benchmark split. The manuscript notes that 'objectives, constraints, and prompt guidance affect reliability and generalization' yet provides no held-out traces, cross-benchmark results, or distribution-shift experiments to quantify how much of the reported speedups (1.062x, 1.100x, 1.76x) survive beyond the optimization set. This is load-bearing for the headline performance assertions.

    Authors: We agree that the absence of held-out or cross-benchmark evaluations limits the strength of generalization claims. The manuscript demonstrates that the framework, when guided by human-specified objectives, constraints, and benchmark splits, can produce designs competitive with or superior to prior SOTA on the standard benchmarks used for evaluation in the field. However, we did not perform explicit distribution-shift or held-out trace experiments. In the revised manuscript we will add an explicit Limitations subsection in Results that states the scope of the reported speedups, reiterates the human-specified nature of the benchmark split, and outlines future work on cross-benchmark validation. This clarification will prevent overstatement while preserving the core contribution of the end-to-end framework. revision: partial

  2. Referee: [Abstract and Results] Abstract and Results: The reported speedups (e.g., 0.6% over Mockingjay, 1.5% over the Hashed Perceptron seed) are presented without stating the number of independent evolutionary runs, random seeds, or any statistical measures such as standard deviation or confidence intervals. This omission makes it difficult to assess whether the small margins reflect reliable improvements or run-to-run variability.

    Authors: We acknowledge the omission. Because the LLM-driven evolution is stochastic, reporting run-to-run variability is necessary for interpreting the smaller margins. In the revised version we will state the number of independent evolutionary runs performed for each component (cache replacement, branch prediction, prefetching), the random seeds used where applicable, and include standard deviations or confidence intervals for the reported geomean IPC speedups. These additions will be placed in both the abstract (concise form) and the Results section. revision: yes
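
As a sketch of the statistic that response promises, assuming a list of final geomean IPC speedups from independent evolutionary runs (the per-run values below are invented for illustration; the paper does not report them):

```python
import statistics

def summarize_runs(speedups: list[float]) -> tuple[float, float, tuple[float, float]]:
    """Mean, sample standard deviation, and a normal-approximation 95%
    confidence interval over per-run geomean IPC speedups."""
    mean = statistics.mean(speedups)
    sd = statistics.stdev(speedups)
    half = 1.96 * sd / len(speedups) ** 0.5
    return mean, sd, (mean - half, mean + half)

# Invented per-run values for illustration only.
runs = [1.058, 1.062, 1.055, 1.064, 1.060]
mean, sd, (lo, hi) = summarize_runs(runs)
print(f"mean {mean:.3f}x · sd {sd:.4f} · 95% CI [{lo:.3f}, {hi:.3f}]")
```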

Circularity Check

0 steps flagged

No circularity: empirical IPC results from external simulators on standard benchmarks

full rationale

The paper presents an agentic framework for evolving microarchitectural code via LLM edits guided by human-specified objectives and evaluated through cycle-accurate simulation on fixed benchmark splits. All reported speedups (e.g., 1.062x over LRU, 1.100x over Bimodal, 1.76x over no prefetching) are direct outputs of those external simulators rather than quantities derived from any internal equations, fitted parameters, or self-referential definitions within the paper. No derivation chain, uniqueness theorem, ansatz, or self-citation is invoked to justify the performance numbers; the central claims rest on observable simulation outcomes that could in principle be reproduced or falsified independently. The absence of held-out traces or cross-benchmark results raises a generalization concern but does not constitute circularity in the reported results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that LLMs can generate syntactically valid and functionally improvable microarchitecture code when given simulation feedback. No new physical entities are postulated. No numerical parameters are fitted inside the paper; all scoring and constraints are human-specified.

axioms (1)
  • domain assumption: Large language models can iteratively improve microarchitectural code when provided with cycle-accurate performance feedback.
    This is the core mechanism of the agentic loop described in the abstract.

pith-pipeline@v0.9.0 · 5649 in / 1428 out tokens · 51504 ms · 2026-05-07T16:38:52.335981+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

55 extracted references · 39 canonical work pages · 3 internal anchors
