
arxiv: 2606.21836 · v1 · pith:IH6IMRV7new · submitted 2026-06-20 · 💻 cs.AR · cs.AI

AgentDSE: Reasoning-Augmented Architectural Design Space Exploration

Pith reviewed 2026-06-26 11:29 UTC · model grok-4.3

classification 💻 cs.AR cs.AI
keywords design space exploration · LLM agents · architectural optimization · DNN accelerators · hardware-software co-design · cache hierarchy · simulator-in-the-loop

The pith

An off-the-shelf LLM coding agent automates architectural design space exploration by reasoning through constraints, achieving competitive results with up to 100 times fewer simulator evaluations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that traditional architectural design space exploration wastes simulator calls by treating the simulator as a black-box oracle. It shows that a general-purpose large language model can instead drive the search by reasoning about physical constraints, bottlenecks, data reuse patterns, and workload structures. AgentDSE implements this as a simulator-in-the-loop process in which the agent generates code, receives simulator feedback, and iterates, without fine-tuning or precomputed design databases. Experiments across DNN accelerator mapping, hardware/software co-design, and CPU cache-hierarchy optimization report designs of competitive or better quality with up to two orders of magnitude fewer evaluations. The method also records its reasoning steps, making the search process transparent.
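The propose, simulate, and iterate cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `toy_simulator`, `rule_based_proposer`, the `tile` and `buffer_kb` parameters, and the cost formula are all hypothetical stand-ins. In AgentDSE the proposer role is played by an LLM coding agent and the simulator is a real architectural simulator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Design = Dict[str, int]
Proposal = Optional[Tuple[Design, str]]


@dataclass
class Step:
    """One loop iteration: the design tried, the stated reason, the result."""
    design: Design
    rationale: str
    cost: float


def toy_simulator(design: Design) -> float:
    """Hypothetical simulator stand-in; lower cost is better.

    A tile that does not fit the buffer is infeasible (infinite cost),
    which mimics a physical constraint the search has to respect.
    """
    tile, buffer_kb = design["tile"], design["buffer_kb"]
    if tile * tile > buffer_kb * 16:  # toy capacity constraint
        return float("inf")
    return 1000.0 / tile + 0.5 * buffer_kb


def rule_based_proposer(history: List[Step]) -> Proposal:
    """Hypothetical stand-in for the agent: read the history, pick a next step."""
    if not history:
        return {"tile": 4, "buffer_kb": 16}, "start from a small, feasible tile"
    last = history[-1]
    if last.cost == float("inf"):
        return None  # the last design hit the capacity limit, so stop
    next_design = {"tile": last.design["tile"] * 2,
                   "buffer_kb": last.design["buffer_kb"]}
    return next_design, "a larger tile should increase data reuse"


def run_loop(propose: Callable[[List[Step]], Proposal],
             simulate: Callable[[Design], float],
             budget: int) -> List[Step]:
    """Alternate between proposing and simulating until the budget or a stop."""
    history: List[Step] = []
    for _ in range(budget):
        proposal = propose(history)
        if proposal is None:
            break
        design, rationale = proposal
        history.append(Step(design, rationale, simulate(design)))
    return history


history = run_loop(rule_based_proposer, toy_simulator, budget=8)
best = min(history, key=lambda s: s.cost)
for step in history:
    print(step.design, step.cost, "|", step.rationale)
print("best:", best.design, best.cost)
```

The point of the sketch is the shape of the loop: every simulator call is preceded by a stated reason and followed by feedback that the next proposal can use, so the evaluation budget is spent on targeted probes rather than blind sampling.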

Core claim

AgentDSE uses a general-purpose LLM coding agent to automate the architectural-reasoning loop in design space exploration. By interacting directly with simulators, the agent reasons about constraints and bottlenecks to propose and refine designs. This approach delivers competitive or superior design quality across DNN accelerator mapping, hardware/software co-design, and CPU cache-hierarchy optimization while requiring up to two orders of magnitude fewer evaluations than conventional methods. No model fine-tuning, precomputed design databases, or domain-specific optimizer code is needed.

What carries the argument

AgentDSE, a simulator-in-the-loop methodology driven by a general-purpose LLM coding agent that automates reasoning about physical constraints, performance bottlenecks, data reuse, and workload structures.

If this is right

  • Up to two orders of magnitude reduction in simulator evaluations while maintaining or improving design quality.
  • The method applies without modification to DNN accelerator mapping, hardware/software co-design, and CPU cache-hierarchy optimization.
  • Search decisions become inspectable through generated traces of hypotheses and simulator interactions.
  • No requirement for model fine-tuning, precomputed design databases, or domain-specific optimizer code.
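One way to picture an inspectable trace is as a list of records that pair each hypothesis with the simulator result it led to. The sketch below is an assumption made for illustration: the `TraceEntry` fields, the JSON Lines encoding, and the `refuted` helper are hypothetical and are not taken from the paper.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class TraceEntry:
    step: int
    hypothesis: str         # what the agent expected, in plain language
    design: dict            # parameters sent to the simulator
    predicted_better: bool  # did the agent expect a new best result?
    cost: float             # simulator result (lower is better)


def to_jsonl(entries: List[TraceEntry]) -> str:
    """Serialise the trace as one JSON object per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in entries)


def refuted(entries: List[TraceEntry]) -> List[TraceEntry]:
    """Entries that predicted an improvement but did not beat the best so far."""
    out: List[TraceEntry] = []
    best = float("inf")
    for e in entries:
        if e.predicted_better and e.cost >= best:
            out.append(e)
        best = min(best, e.cost)
    return out


# Synthetic example values, chosen only to exercise the helpers.
entries = [
    TraceEntry(0, "small tile is a safe starting point", {"tile": 4}, True, 258.0),
    TraceEntry(1, "doubling the tile raises reuse", {"tile": 8}, True, 133.0),
    TraceEntry(2, "a bigger buffer cuts memory traffic", {"tile": 8, "buffer_kb": 32}, True, 140.0),
]

print(to_jsonl(entries))
print([e.step for e in refuted(entries)])
```

Filtering for refuted hypotheses is one plausible use of such a trace, since those entries are where the agent's expectations and the simulator disagree.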

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • LLM agents could extend to other engineering domains that rely on expensive simulations for optimization.
  • The inspectable traces might enable debugging of both the designs and the underlying simulators.
  • Hybrid systems combining LLM reasoning with gradient-based or evolutionary optimizers could yield further efficiency gains.

Load-bearing premise

A general-purpose large language model coding agent without fine-tuning can reliably automate the architectural-reasoning loop by reasoning through physical constraints, performance bottlenecks, data reuse, and workload structures.

What would settle it

Testing the agent on a novel hardware architecture where its pre-trained knowledge of performance trade-offs does not apply, and measuring whether the reduction in evaluations is lost.
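The measurement this test calls for can be made concrete as an evaluations-to-target count for each method. The sketch below is hypothetical: the helper names are invented for illustration and the cost sequences are synthetic placeholders, not results from the paper.

```python
from typing import Optional, Sequence


def evals_to_reach(costs: Sequence[float], target: float) -> Optional[int]:
    """Number of evaluations (1-indexed) until a cost at or below target appears."""
    for i, cost in enumerate(costs, start=1):
        if cost <= target:
            return i
    return None  # the target was never reached


def reduction_factor(baseline: Sequence[float],
                     agent: Sequence[float],
                     target: float) -> Optional[float]:
    """How many times fewer evaluations the agent needed than the baseline."""
    b = evals_to_reach(baseline, target)
    a = evals_to_reach(agent, target)
    if a is None or b is None:
        return None  # not comparable if either run misses the target
    return b / a


# Synthetic cost sequences, in evaluation order (lower is better).
baseline_costs = [9.0, 8.0, 7.5, 7.0, 6.0, 5.0, 4.0, 3.0]
agent_costs = [8.0, 3.0]

print(reduction_factor(baseline_costs, agent_costs, target=3.0))
```

On a novel architecture, a factor that falls toward 1, or a run that never reaches the target, would indicate that the evaluation savings depend on the agent's prior knowledge.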

Figures

Figures reproduced from arXiv: 2606.21836 by Chenyu Wang, David Kong, Duane S. Boning, Jiahe Caroline Shi, Vijay Janapa Reddi, Yilun Du, Zishen Wan.

Figure 1: Motivation for architectural-reasoning-guided DSE.
Figure 2: Overview of AgentDSE for history-aware agentic design-space search.
Figure 3: Strict apples-to-apples per-layer comparison: DOSA in per-layer mode
Figure 4: Convergence on ResNet50 Layer 46: best-so-far EDP versus real
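For readers unfamiliar with the quantity plotted in Figure 4, a best-so-far curve is the running minimum of the metric over successive evaluations, and EDP (energy-delay product) is energy multiplied by delay. The sketch below uses synthetic sample values and assumed units; it does not reproduce any data from the figure.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def edp(energy: float, delay: float) -> float:
    """Energy-delay product: lower is better."""
    return energy * delay


def best_so_far(values: Sequence[float]) -> List[float]:
    """Running minimum: the best value seen after each evaluation."""
    return list(accumulate(values, min))


# Synthetic (energy, delay) pairs in evaluation order; units are assumed.
samples: List[Tuple[float, float]] = [(2.0, 3.0), (1.0, 4.0), (3.0, 3.0), (1.0, 2.0)]
edps = [edp(e, d) for e, d in samples]

print(edps)
print(best_so_far(edps))
```

A best-so-far curve never rises, so plateaus mark stretches of evaluations that found nothing better and drops mark the evaluations that did.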
Original abstract

Traditional architectural design space exploration (DSE) is highly inefficient, typically requiring tens of thousands of simulator evaluations across various optimization methods. This inefficiency arises because conventional methods treat the simulator as a black-box oracle. In contrast, human architects effectively guide exploration by reasoning through physical constraints, performance bottlenecks, data reuse, and workload structures. To bridge this gap, we introduce AgentDSE, a simulator-in-the-loop methodology driven by a general-purpose large language model (LLM) coding agent. AgentDSE automates this architectural-reasoning loop without requiring model fine-tuning, precomputed design databases, or domain-specific optimizer code. Across deep neural network (DNN) accelerator mapping, hardware/software co-design, and CPU cache-hierarchy optimization, AgentDSE achieves competitive or better design quality with up to two orders of magnitude fewer evaluations. AgentDSE also produces inspectable traces that surface architectural hypotheses, performance cliffs, implicit priors, and simulator artifacts, making every search decision traceable rather than buried in optimizer state.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper introduces AgentDSE, a simulator-in-the-loop methodology driven by a general-purpose LLM coding agent that automates architectural reasoning about physical constraints, bottlenecks, data reuse, and workload structures. Whereas conventional black-box DSE methods typically require tens of thousands of evaluations, AgentDSE is evaluated across DNN accelerator mapping, hardware/software co-design, and CPU cache-hierarchy optimization and is claimed to reach competitive or superior design quality with up to two orders of magnitude fewer simulator calls, while producing inspectable traces of architectural hypotheses, performance cliffs, implicit priors, and simulator artifacts.

Significance. If the empirical results hold, the work is significant for demonstrating that untuned general-purpose LLMs can automate the architectural-reasoning loop across distinct DSE domains without precomputed databases or domain-specific optimizers, substantially reducing evaluation counts while adding traceability. The inspectable traces constitute a clear strength, as they surface decisions that are typically opaque in optimizer state.

minor comments (1)
  1. [Abstract] The claim of 'up to two orders of magnitude fewer evaluations' would benefit from a parenthetical note naming the specific baselines (e.g., genetic algorithms, Bayesian optimization) and the domains in which the largest reductions occur.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The referee's description accurately reflects the core claims of AgentDSE regarding simulator-in-the-loop LLM-driven reasoning, evaluation reduction, and traceability across the three DSE domains.

Circularity Check

0 steps flagged

No significant circularity


The paper describes an empirical LLM-agent methodology for architectural DSE and reports experimental outcomes on evaluation counts and design quality across three domains. No equations, parameter fittings, derivations, or self-citation chains appear in the abstract or described structure that reduce any claimed result to its own inputs by construction. The central claims rest on observable simulator runs and are externally falsifiable, making the work self-contained against the listed circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract alone; the central claim rests on the unverified effectiveness of the LLM reasoning loop.

pith-pipeline@v0.9.1-grok · 5725 in / 1123 out tokens · 19555 ms · 2026-06-26T11:29:33.461690+00:00 · methodology

