A3D: Agentic AI flow for autonomous Accelerator Design
Pith reviewed 2026-05-19 16:17 UTC · model grok-4.3
The pith
An agentic AI flow automates the entire hardware accelerator design process for complex applications with no human intervention.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that fully autonomous generation of accelerator designs is possible by partitioning the design process among multiple specialist AI agents and verifier agents, orchestrating iterative loops, and using agentic retrieval-augmented generation (RAG) to access code and tool documentation. As evidence, the paper reports that the system, built on Claude Sonnet 4.5 and Catapult HLS, produced designs for the LAMMPS and QMCPACK applications with zero human input while also exploring design tradeoffs.
What carries the argument
The A3D agentic AI flow, which partitions tasks among specialist agents and verifier agents, orchestrates process loops, deploys pre-existing and custom tools, and applies agentic RAG to explore relevant documentation and code.
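This summary does not give A3D's actual prompts, agent roles, or loop policy. The following is a minimal sketch of the specialist/verifier pattern, assuming a specialist is any function that maps a task plus optional feedback to a candidate artifact, and a verifier returns pass/fail plus a feedback message. `run_specialist_verifier_loop`, `toy_specialist`, and `toy_verifier` are hypothetical names, and the stubs stand in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces: in a real flow each would wrap an LLM call plus tools.
Specialist = Callable[[str, Optional[str]], str]   # (task, feedback) -> candidate
Verifier = Callable[[str], Tuple[bool, str]]        # candidate -> (passed, feedback)


@dataclass
class LoopResult:
    artifact: Optional[str]
    passed: bool
    rounds: int
    history: List[str] = field(default_factory=list)


def run_specialist_verifier_loop(task: str, specialist: Specialist, verifier: Verifier,
                                 max_rounds: int = 5) -> LoopResult:
    """Alternate proposal and verification, feeding verifier feedback back to the specialist."""
    feedback: Optional[str] = None
    history: List[str] = []
    for round_idx in range(1, max_rounds + 1):
        candidate = specialist(task, feedback)
        passed, feedback = verifier(candidate)
        history.append(f"round {round_idx}: {'pass' if passed else 'fail'} ({feedback})")
        if passed:
            return LoopResult(candidate, True, round_idx, history)
    # Budget exhausted: report failure instead of returning an unverified artifact.
    return LoopResult(None, False, max_rounds, history)


# Toy stubs standing in for LLM-backed agents.
def toy_specialist(task: str, feedback: Optional[str]) -> str:
    if feedback and "malloc" in feedback:
        return "float buf[256];"                      # fixed-size buffer after feedback
    return "float *buf = malloc(n * sizeof(float));"  # first attempt keeps heap allocation


def toy_verifier(candidate: str) -> Tuple[bool, str]:
    # HLS tools generally cannot synthesize dynamic memory allocation.
    if "malloc" in candidate:
        return False, "dynamic allocation (malloc) is not synthesizable"
    return True, "ok"


result = run_specialist_verifier_loop("remove dynamic allocation", toy_specialist, toy_verifier)
print(result.passed, result.rounds, result.artifact)  # True 2 float buf[256];
```

Returning `None` when the round budget runs out, rather than the last candidate, keeps an unverified artifact from flowing into later stages. Whether A3D escalates, retries with another specialist, or aborts in that case is not stated in this summary.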
If this is right
- Accelerator designs can be created for irregular scientific codes that previously required extensive manual work.
- Design space exploration for speed versus area occurs automatically during the process.
- Application domain experts gain the ability to produce their own accelerators without deep hardware knowledge.
- The overall effort needed to develop accelerators is lowered through full automation.
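Automatic speed-versus-area exploration ultimately has to reduce many synthesized candidates to the non-dominated ones. How A3D selects among its designs is not specified here; a minimal sketch, assuming each candidate is summarized by one latency figure and one area figure from an HLS report, is a standard Pareto filter. `DesignPoint`, `pareto_front`, and the candidate numbers are hypothetical and are not results from the paper.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    name: str
    latency: int   # cycles; lower is faster
    area: float    # in whatever unit the HLS report uses; lower is smaller


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b on both metrics and strictly better on at least one."""
    no_worse = a.latency <= b.latency and a.area <= b.area
    strictly_better = a.latency < b.latency or a.area < b.area
    return no_worse and strictly_better


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep non-dominated designs, fastest first (O(n^2), adequate for small sweeps)."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p.latency)


# Illustrative numbers only; not results from the paper.
candidates = [
    DesignPoint("unroll1", 4000, 10000.0),
    DesignPoint("unroll2", 2100, 17000.0),
    DesignPoint("unroll4", 1200, 30000.0),
    DesignPoint("unroll4_noshare", 1200, 36000.0),  # same speed as unroll4, larger
    DesignPoint("pipeline_ii2", 2500, 18000.0),     # slower and larger than unroll2
]
print([p.name for p in pareto_front(candidates)])  # ['unroll4', 'unroll2', 'unroll1']
```

Every design on the returned front is a legitimate tradeoff point; choosing among them still depends on the area or latency budget of the target system.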
Where Pith is reading between the lines
- This approach could be extended to automate other aspects of the hardware design flow, such as verification and testing.
- Improvements in base LLM capabilities would likely increase the complexity of applications that A3D can handle successfully.
- It suggests a path toward more accessible custom hardware for scientific computing communities.
- Integration with simulation feedback loops might allow iterative refinement of designs based on measured performance.
Load-bearing premise
LLMs can be made reliable for the full accelerator design pipeline through agent partitioning and retrieval augmentation even when applied to complex irregular scientific codes.
What would settle it
Implement the designs produced by A3D for QMCPACK in actual hardware or detailed simulation and check whether they run the quantum chemistry calculations correctly and deliver the promised performance benefits without requiring corrections.
Original abstract
Accelerating applications through the design of hardware accelerators can significantly enhance system performance and energy efficiency. Despite advances such as high-level synthesis (HLS), designing accelerators for complex applications remains highly labor-intensive. It demands considerable expertise in the workloads to be accelerated, hardware design, micro-architecture, and EDA tool usage, which poses challenges for application domain experts. As a result, most accelerator solutions are limited to applications with regular, predictable dataflow. Advances in AI have enabled agents that perform autonomous planning, reasoning, execution, and reflection, leading to unprecedented potential for automation through agentic AI. We present A3D, an Agentic AI flow for end-to-end Automation of hardware Accelerator Design. A3D automates workload analysis, performance bottleneck identification, code refactoring for HLS compatibility, and micro-architecture generation. A3D also generates diverse accelerator designs by automatically exploring the speed-area tradeoff space. Recent efforts have explored the use of AI for specific tasks such as design space exploration in HLS, leaving several other tasks to be performed manually. A3D addresses the challenges in applying modern LLMs to accelerator design by judiciously partitioning tasks among specialist agents, orchestrating process loops with specialist and verifier agents, utilizing pre-existing and custom tools, and employing agentic RAG for codebase and proprietary EDA tool documentation exploration. Our implementation of A3D, using commercial components like Claude Sonnet 4.5 and the Catapult HLS tool, demonstrates its effectiveness by generating accelerator designs with no human intervention from complex scientific applications like LAMMPS (molecular dynamics simulation) and QMCPACK (quantum chemistry).
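Agentic RAG pairs an LLM that decides what to look up with a retriever that returns relevant passages from the codebase or tool manuals. The retriever A3D uses is not described in this summary. As a minimal stand-in, the sketch below ranks documentation chunks by bag-of-words cosine similarity, where a production system would more likely use learned embeddings. The agentic part, in which the model chooses queries and decides whether the hits suffice, is omitted, and the documentation chunks are hypothetical, generic HLS descriptions rather than Catapult documentation.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())  # missing keys count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Score every chunk against the query and return the top-k (score, chunk) pairs."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return scored[:k]


# Hypothetical documentation chunks (generic HLS concepts, not Catapult text).
docs = [
    "Loop unrolling replicates the loop body so that several iterations execute in parallel.",
    "Pipelining lets a new loop iteration start before the previous one finishes; "
    "the initiation interval sets the spacing between iterations.",
    "Arrays can be partitioned into separate memory banks to allow more parallel accesses per cycle.",
]
hits = retrieve("how to pipeline a loop with initiation interval", docs)
for score, chunk in hits:
    print(f"{score:.3f}  {chunk[:40]}")
```

Note that the query word "pipeline" does not match "Pipelining" under this exact-token scheme; the pipelining chunk still ranks first through "loop", "initiation", and "interval". Vocabulary mismatch of this kind is one reason embedding-based retrieval is usually preferred.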
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces A3D, an agentic AI flow for end-to-end automation of hardware accelerator design. It partitions tasks among specialist and verifier agents, orchestrates loops with agentic RAG for codebase and EDA tool documentation, and uses commercial components (Claude Sonnet 4.5 and Catapult HLS) to perform workload analysis, bottleneck identification, HLS-compatible refactoring, micro-architecture generation, and speed-area tradeoff exploration. The central claim is that this produces correct accelerator designs with no human intervention from complex irregular scientific codes such as LAMMPS (molecular dynamics) and QMCPACK (quantum chemistry).
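Workload analysis and bottleneck identification usually start from a runtime profile. As a minimal sketch (the profile format, function names, timings, and 80% coverage threshold are all hypothetical, not taken from the paper), one can pick the smallest set of hottest functions covering most of the runtime, then bound the end-to-end gain with Amdahl's law, S = 1 / ((1 - p) + p / s), where p is the accelerated fraction of runtime and s is the kernel speedup.

```python
from typing import Dict, List


def select_hotspots(profile: Dict[str, float], coverage: float = 0.8) -> List[str]:
    """Greedily pick the hottest functions until their share of runtime reaches `coverage`."""
    total = sum(profile.values())
    if total <= 0:
        return []
    chosen: List[str] = []
    acc = 0.0
    for name, t in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if acc / total >= coverage:
            break
        chosen.append(name)
        acc += t
    return chosen


def amdahl_speedup(p: float, s: float) -> float:
    """End-to-end speedup when a fraction p of runtime is accelerated by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)


# Hypothetical profile in seconds; function names are illustrative, not LAMMPS symbols.
profile = {"pair_force": 62.0, "neighbor_build": 21.0, "integrate": 9.0, "io": 8.0}
hot = select_hotspots(profile)
p = sum(profile[f] for f in hot) / sum(profile.values())
print(hot, round(p, 2), round(amdahl_speedup(p, 10.0), 2))
```

With these illustrative numbers, offloading 83% of runtime at a 10x kernel speedup caps the end-to-end gain at about 3.95x. This is why speedups reported relative to the full software baseline, as the referee requests below, say more than kernel-level numbers alone.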
Significance. If the end-to-end automation and correctness claims hold, the work would be significant for expanding hardware acceleration beyond regular-dataflow applications to irregular scientific workloads that currently require substantial manual expertise. The agentic partitioning and RAG approach addresses practical LLM limitations in long-context hardware design tasks and could generalize to other EDA flows. Because the method is assembled from existing commercial components rather than bespoke infrastructure, groups with access to the same LLM and HLS tool licenses could replicate and extend it.
major comments (2)
- [Abstract / Evaluation] Abstract and Evaluation section: the claim of successful generation of accelerator designs with 'no human intervention' is asserted but unsupported by any quantitative metrics (speedup, area, power, latency), error rates, failure-mode analysis, or comparison baselines against manual designs or prior HLS tools. Without these, the data cannot be assessed as supporting the central claim.
- [Implementation and Results] Implementation and Results: the load-bearing assertion that specialist+verifier agents plus agentic RAG produce functionally correct, synthesizable HLS code for irregular codes like LAMMPS and QMCPACK lacks any independent oracle (formal equivalence checking, exhaustive simulation, or post-synthesis gate-level verification) that would detect subtle dataflow, memory-ordering, or semantic errors introduced by the LLM agents.
minor comments (2)
- [Related Work] Related Work: the discussion of prior AI-assisted HLS design-space exploration would benefit from explicit citations and direct comparisons to specific recent systems rather than a general statement.
- [Figures] Figure and table captions: ensure all generated accelerator diagrams and tradeoff plots include clear labels for the original application, the generated micro-architecture, and any synthesis results.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review of our manuscript. The comments highlight important aspects of evaluation and verification that we address point by point below. We have revised the manuscript to incorporate additional evidence and clarifications where feasible.
Point-by-point responses
- Referee: [Abstract / Evaluation] Abstract and Evaluation section: the claim of successful generation of accelerator designs with 'no human intervention' is asserted but unsupported by any quantitative metrics (speedup, area, power, latency), error rates, failure-mode analysis, or comparison baselines against manual designs or prior HLS tools. Without these, the data cannot be assessed as supporting the central claim.
Authors: We agree that the central claim requires stronger quantitative backing to allow proper assessment. In the revised manuscript we expand the Evaluation section to report concrete metrics for the LAMMPS and QMCPACK accelerators, including achieved speedups relative to software baselines, post-synthesis area and power estimates from Catapult, and latency figures. We also add a failure-mode analysis of iteration counts required by the agentic loops and a direct comparison against designs produced by conventional HLS flows that required manual refactoring. These additions supply the missing quantitative support for the no-human-intervention claim. Revision: yes.
- Referee: [Implementation and Results] Implementation and Results: the load-bearing assertion that specialist+verifier agents plus agentic RAG produce functionally correct, synthesizable HLS code for irregular codes like LAMMPS and QMCPACK lacks any independent oracle (formal equivalence checking, exhaustive simulation, or post-synthesis gate-level verification) that would detect subtle dataflow, memory-ordering, or semantic errors introduced by the LLM agents.
Authors: We recognize the importance of independent verification for establishing functional correctness on irregular workloads. The current A3D implementation relies on the verifier agents together with Catapult HLS compilation and simulation against reference testbenches derived from the original applications. In the revision we clarify this verification workflow, describe the specific simulation checks performed to detect dataflow and memory-ordering discrepancies, and report any discrepancies observed during the agentic process. We also explicitly note the absence of formal equivalence checking or exhaustive gate-level verification as a limitation of the present study and outline it as future work, thereby addressing the concern without overstating the strength of the current evidence. Revision: partial.
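Simulating the generated design against testbenches derived from the original application is a form of differential testing. A minimal sketch, assuming a toy one-dimensional pair interaction in place of any real LAMMPS or QMCPACK kernel: `ref_forces` plays the original irregular code with variable-length neighbor lists, `hls_style_forces` plays an HLS-friendly refactoring with a fixed `MAX_NEIGH` bound and constant loop trip counts, and `differential_test` compares them on random inputs. All names, the bound, and the interaction are hypothetical. As the referee notes, random-input comparison of this kind can miss rare corner cases that formal equivalence checking would catch.

```python
import random
from typing import Callable, List

MAX_NEIGH = 8  # hypothetical static bound assumed by the refactored kernel


def pair_term(r: float) -> float:
    # Toy pair interaction for illustration; not a physical potential.
    return r / (1.0 + r * r)


def ref_forces(x: List[float], neigh: List[List[int]]) -> List[float]:
    """Reference kernel: variable-length neighbor lists, as in the original software."""
    f = [0.0] * len(x)
    for i, js in enumerate(neigh):
        for j in js:
            f[i] += pair_term(x[j] - x[i])
    return f


def pack(neigh: List[List[int]]):
    """Convert ragged neighbor lists into a padded index table plus per-atom counts."""
    idx = [[0] * MAX_NEIGH for _ in neigh]
    cnt = [0] * len(neigh)
    for i, js in enumerate(neigh):
        if len(js) > MAX_NEIGH:
            raise ValueError(f"atom {i} has {len(js)} neighbors, exceeding MAX_NEIGH")
        idx[i][: len(js)] = js
        cnt[i] = len(js)
    return idx, cnt


def hls_style_forces(x, idx, cnt):
    """Refactored kernel: inner loop with a constant trip count and a validity guard."""
    f = [0.0] * len(x)
    for i in range(len(x)):
        acc = 0.0
        for k in range(MAX_NEIGH):
            if k < cnt[i]:
                acc += pair_term(x[idx[i][k]] - x[i])
        f[i] = acc
    return f


def hls_style_forces_off_by_one(x, idx, cnt):
    """Deliberately buggy variant (<= instead of <) that also reads one padding entry."""
    f = [0.0] * len(x)
    for i in range(len(x)):
        acc = 0.0
        for k in range(MAX_NEIGH):
            if k <= cnt[i]:
                acc += pair_term(x[idx[i][k]] - x[i])
        f[i] = acc
    return f


def differential_test(kernel: Callable, trials: int = 100, n: int = 16,
                      tol: float = 1e-9, seed: int = 0) -> bool:
    """Return True if `kernel` matches ref_forces within `tol` on every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        neigh = [rng.sample([j for j in range(n) if j != i], rng.randint(0, MAX_NEIGH))
                 for i in range(n)]
        idx, cnt = pack(neigh)
        ref = ref_forces(x, neigh)
        dut = kernel(x, idx, cnt)
        if max(abs(a - b) for a, b in zip(ref, dut)) > tol:
            return False
    return True


print(differential_test(hls_style_forces))             # True
print(differential_test(hls_style_forces_off_by_one))  # False
```

The off-by-one variant is the kind of subtle indexing error an LLM refactoring could plausibly introduce; here it is caught because random neighbor counts rarely hit the bound, so the padding entry is almost always read.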
Circularity Check
No circularity: system description and implementation results contain no self-referential derivations
Full rationale
The paper presents a descriptive account of the A3D agentic flow, its partitioning into specialist and verifier agents, its use of RAG, and reported end-to-end runs on LAMMPS and QMCPACK using Claude Sonnet 4.5 and Catapult HLS. No equations, fitted parameters, or mathematical derivations appear in the provided text. The central claim of generating designs with no human intervention rests on the described implementation and tool usage rather than on any step that reduces by construction to its own inputs or to a self-citation chain. No load-bearing premise is justified solely by prior work from the same authors, and the absence of a formal derivation chain makes circularity patterns inapplicable.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: tasks can be partitioned among LLM-based specialist agents whose outputs other agents can verify and correct well enough to produce correct, HLS-compatible designs for complex codes.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, AlexanderDuality.lean (theorems: reality_from_one_distinction, alexander_duality_circle_linking). Tag: unclear.
Unclear: relation between the paper passage and the cited Recognition theorem.
Table 1: Coverage of HLS workflow stages... A3D is the first to automate the full pipeline from application analysis through design space exploration
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.