AgentDSE: Reasoning-Augmented Architectural Design Space Exploration
Pith reviewed 2026-06-26 11:29 UTC · model grok-4.3
The pith
An off-the-shelf LLM coding agent automates architectural design space exploration by reasoning through constraints, achieving competitive results with up to 100 times fewer simulator evaluations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AgentDSE uses a general-purpose LLM coding agent to automate the architectural-reasoning loop in design space exploration. By interacting directly with simulators, the agent reasons about constraints and bottlenecks to propose and refine designs. This approach delivers competitive or superior design quality across DNN accelerator mapping, hardware/software co-design, and CPU cache-hierarchy optimization while requiring up to two orders of magnitude fewer evaluations than conventional methods. No model fine-tuning, precomputed design databases, or domain-specific optimizer code is needed.
What carries the argument
AgentDSE, a simulator-in-the-loop methodology driven by a general-purpose LLM coding agent that automates reasoning about physical constraints, performance bottlenecks, data reuse, and workload structures.
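To make the simulator-in-the-loop idea concrete, here is a minimal sketch of such a loop under stated assumptions. It is not the paper's implementation: `toy_simulator` is a hypothetical analytic cost model standing in for a real simulator, `rule_based_proposer` is a hypothetical stand-in for the LLM agent's reasoning step, and `TraceEntry` is an illustrative record of the kind of hypothesis-plus-result trace the summary describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Design = Dict[str, int]


@dataclass
class TraceEntry:
    """One inspectable step: what was hypothesized, tried, and observed."""
    step: int
    hypothesis: str
    design: Design
    cost: float


def toy_simulator(design: Design) -> float:
    """Hypothetical stand-in for a real simulator; lower cost is better."""
    size = design["cache_kb"]
    miss_penalty = 1000.0 / size   # larger cache -> fewer misses
    area_penalty = 0.05 * size     # larger cache -> more area
    return miss_penalty + area_penalty


def rule_based_proposer(trace: List[TraceEntry]) -> Tuple[str, Design]:
    """Hypothetical stand-in for the agent: reads the trace, proposes a design."""
    if not trace:
        return "start from a small cache", {"cache_kb": 16}
    last = trace[-1]
    if len(trace) == 1 or last.cost < trace[-2].cost:
        return ("misses still dominate; double the cache",
                {"cache_kb": last.design["cache_kb"] * 2})
    midpoint = (last.design["cache_kb"] + trace[-2].design["cache_kb"]) // 2
    return "area now dominates; back off halfway", {"cache_kb": midpoint}


def explore(simulate: Callable[[Design], float],
            propose: Callable[[List[TraceEntry]], Tuple[str, Design]],
            budget: int) -> List[TraceEntry]:
    """Run propose -> simulate -> record for a fixed evaluation budget."""
    trace: List[TraceEntry] = []
    for step in range(budget):
        hypothesis, design = propose(trace)
        cost = simulate(design)
        trace.append(TraceEntry(step, hypothesis, design, cost))
    return trace


trace = explore(toy_simulator, rule_based_proposer, budget=6)
best = min(trace, key=lambda entry: entry.cost)
for entry in trace:
    print(entry.step, entry.design, round(entry.cost, 3), "|", entry.hypothesis)
print("best:", best.design)
```

The point of the sketch is the shape of the loop: each proposal is conditioned on the full trace rather than on a scalar score alone, and the trace itself is the artifact a reader can inspect afterwards.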
If this is right
- Up to two orders of magnitude reduction in simulator evaluations while maintaining or improving design quality.
- The method applies without modification to DNN accelerator mapping, hardware/software co-design, and CPU cache-hierarchy optimization.
- Search decisions become inspectable through generated traces of hypotheses and simulator interactions.
- No requirement for model fine-tuning, precomputed design databases, or domain-specific optimizer code.
Where Pith is reading between the lines
- LLM agents could extend to other engineering domains that rely on expensive simulations for optimization.
- The inspectable traces might enable debugging of both the designs and the underlying simulators.
- Hybrid systems combining LLM reasoning with gradient-based or evolutionary optimizers could yield further efficiency gains.
Load-bearing premise
A general-purpose large language model coding agent without fine-tuning can reliably automate the architectural-reasoning loop by reasoning through physical constraints, performance bottlenecks, data reuse, and workload structures.
What would settle it
Test the agent on a novel hardware architecture where its pre-trained knowledge of performance trade-offs does not apply, and measure whether the reduction in evaluations disappears.
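One way such a test could be scored is by counting how many evaluations each method needs to reach the same target cost and taking the ratio. The sketch below assumes per-evaluation cost logs are available for both methods; the function names and all numbers are hypothetical and synthetic, not results from the paper.

```python
from typing import Optional, Sequence


def evals_to_target(costs: Sequence[float], target: float) -> Optional[int]:
    """Return the 1-based evaluation count at which cost first reaches the target."""
    for count, cost in enumerate(costs, start=1):
        if cost <= target:
            return count
    return None  # the target was never reached within the log


def reduction_factor(baseline_costs: Sequence[float],
                     agent_costs: Sequence[float],
                     target: float) -> Optional[float]:
    """Ratio of baseline evaluations to agent evaluations at equal design quality."""
    baseline_evals = evals_to_target(baseline_costs, target)
    agent_evals = evals_to_target(agent_costs, target)
    if baseline_evals is None or agent_evals is None:
        return None  # no fair comparison if either method misses the target
    return baseline_evals / agent_evals


# Synthetic logs for illustration only (lower cost is better).
baseline_costs = [1000 - i for i in range(1, 1001)]  # slow, steady improvement
agent_costs = [900, 700, 650, 590]                   # few, targeted evaluations

print(reduction_factor(baseline_costs, agent_costs, target=600))
```

On an architecture where the agent's priors do not transfer, the same measurement would show the factor shrinking toward 1, or returning `None` if the agent never reaches the target within its budget.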
Original abstract
Traditional architectural design space exploration (DSE) is highly inefficient, typically requiring tens of thousands of simulator evaluations across various optimization methods. This inefficiency arises because conventional methods treat the simulator as a black-box oracle. In contrast, human architects effectively guide exploration by reasoning through physical constraints, performance bottlenecks, data reuse, and workload structures. To bridge this gap, we introduce AgentDSE, a simulator-in-the-loop methodology driven by a general-purpose large language model (LLM) coding agent. AgentDSE automates this architectural-reasoning loop without requiring model fine-tuning, precomputed design databases, or domain-specific optimizer code. Across deep neural network (DNN) accelerator mapping, hardware/software co-design, and CPU cache-hierarchy optimization, AgentDSE achieves competitive or better design quality with up to two orders of magnitude fewer evaluations. AgentDSE also produces inspectable traces that surface architectural hypotheses, performance cliffs, implicit priors, and simulator artifacts, making every search decision traceable rather than buried in optimizer state.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AgentDSE, a simulator-in-the-loop methodology driven by a general-purpose LLM coding agent that automates architectural reasoning about physical constraints, bottlenecks, data reuse, and workload structures. Conventional black-box DSE methods typically require tens of thousands of evaluations. AgentDSE is evaluated across DNN accelerator mapping, hardware/software co-design, and CPU cache-hierarchy optimization, where the authors claim competitive or superior design quality with up to two orders of magnitude fewer simulator calls. It also produces inspectable traces of architectural hypotheses, performance cliffs, implicit priors, and simulator artifacts.
Significance. If the empirical results hold, the work is significant for demonstrating that untuned general-purpose LLMs can automate the architectural-reasoning loop across distinct DSE domains without precomputed databases or domain-specific optimizers, substantially reducing evaluation counts while adding traceability. The inspectable traces constitute a clear strength, as they surface decisions that are typically opaque in optimizer state.
minor comments (1)
- [Abstract] The claim of 'up to two orders of magnitude fewer evaluations' would benefit from a parenthetical note naming the specific baselines (e.g., genetic algorithms, Bayesian optimization) and the domains in which the largest reductions occur.
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The referee's description accurately reflects the core claims of AgentDSE regarding simulator-in-the-loop LLM-driven reasoning, evaluation reduction, and traceability across the three DSE domains.
Circularity Check
No significant circularity
The paper describes an empirical LLM-agent methodology for architectural DSE and reports experimental outcomes on evaluation counts and design quality across three domains. No equations, parameter fittings, derivations, or self-citation chains appear in the abstract or described structure that reduce any claimed result to its own inputs by construction. The central claims rest on observable simulator runs and are externally falsifiable, making the work self-contained against the listed circularity patterns.