
arxiv: 2605.15237 · v1 · pith:RZXN3SBPnew · submitted 2026-05-14 · 💻 cs.AR · cs.AI

A3D: Agentic AI flow for autonomous Accelerator Design

Pith reviewed 2026-05-19 16:17 UTC · model grok-4.3

classification 💻 cs.AR cs.AI
keywords accelerator design · agentic AI · high-level synthesis · LLM agents · hardware automation · scientific computing · workload analysis · design space exploration

The pith

An agentic AI flow automates the entire hardware accelerator design process for complex applications with no human intervention.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to show that agentic AI can take over the labor-intensive tasks of hardware accelerator creation. Specialist agents handle workload analysis and bottleneck identification while others refactor code for high-level synthesis and generate micro-architectures. Verifier agents and retrieval mechanisms help maintain accuracy. This would allow application experts in fields like molecular dynamics to create efficient accelerators themselves rather than relying on hardware specialists.

Core claim

The paper claims that fully autonomous generation of accelerator designs becomes possible when the design process is partitioned among specialist and verifier agents, orchestrated in iterative loops, and supported by agentic retrieval-augmented generation (RAG) over code and tool documentation. The evidence offered is that the system produces designs for LAMMPS and QMCPACK using Claude Sonnet 4.5 and Catapult HLS with zero human input, while also exploring design tradeoffs.

What carries the argument

The A3D agentic AI flow, which partitions tasks among specialist agents and verifier agents, orchestrates process loops, deploys pre-existing and custom tools, and applies agentic RAG to explore relevant documentation and code.
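The specialist and verifier loop described here can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `Verdict` type, the iteration budget, and the toy agents (plain functions standing in for LLM-backed agents) are all hypothetical, as is the placeholder type name `fixed_t`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """Result of one verifier pass (hypothetical structure)."""
    ok: bool
    feedback: str = ""


def run_specialist_verifier_loop(
    task: str,
    specialist: Callable[[str, str], str],
    verifier: Callable[[str, str], Verdict],
    max_iterations: int = 5,
) -> Optional[str]:
    """Return the first candidate the verifier accepts, or None if the budget runs out."""
    feedback = ""
    for _ in range(max_iterations):
        candidate = specialist(task, feedback)   # specialist proposes or revises
        verdict = verifier(task, candidate)      # verifier accepts or explains why not
        if verdict.ok:
            return candidate
        feedback = verdict.feedback              # feed the critique into the next attempt
    return None


# Toy stand-ins for LLM-backed agents (assumptions for illustration only).
def toy_specialist(task: str, feedback: str) -> str:
    code = "double energy(double r) { return r * r; }"
    if "double" in feedback:
        # Act on the verifier's critique by swapping in a placeholder type.
        code = code.replace("double", "fixed_t")
    return code


def toy_verifier(task: str, candidate: str) -> Verdict:
    if "double" in candidate:
        return Verdict(False, "replace double with a synthesizable type")
    return Verdict(True)


result = run_specialist_verifier_loop(
    "prepare energy() for HLS", toy_specialist, toy_verifier
)
print(result)
```

In this toy run the first candidate is rejected and the second is accepted. The bounded iteration count matters: a loop of this shape needs an explicit failure outcome for cases where the verifier never accepts.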

If this is right

  • Accelerator designs can be created for irregular scientific codes that previously required extensive manual work.
  • Design space exploration for speed versus area occurs automatically during the process.
  • Application domain experts gain the ability to produce their own accelerators without deep hardware knowledge.
  • The overall effort needed to develop accelerators is lowered through full automation.
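The speed versus area exploration mentioned above typically ends with a filter that keeps only non-dominated designs. The following Python sketch shows one way such a filter could work; the `Design` type and all numbers are hypothetical and are not taken from the paper's results.

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    name: str
    latency: float  # lower is better (for example, cycles)
    area: float     # lower is better (arbitrary area units)


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep designs that no other design matches or beats on both latency and area."""
    front: List[Design] = []
    best_area = float("inf")
    # Sweep from fastest to slowest; a slower design survives only if it is
    # strictly smaller than every design that is at least as fast.
    for d in sorted(designs, key=lambda d: (d.latency, d.area)):
        if d.area < best_area:
            front.append(d)
            best_area = d.area
    return front


# Hypothetical synthesized configurations.
designs = [
    Design("a", 100, 9.0),
    Design("b", 400, 5.0),
    Design("c", 500, 6.0),   # dominated by "b" (slower and larger)
    Design("d", 2500, 4.3),
    Design("e", 100, 9.5),   # dominated by "a" (same latency, larger)
]
front_names = [d.name for d in pareto_front(designs)]
print(front_names)
```

Sorting once and sweeping keeps the filter at O(n log n), which is negligible next to synthesis runs that the Figure 8 caption reports as taking hours each.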

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could be extended to automate other aspects of the hardware design flow, such as verification and testing.
  • Improvements in base LLM capabilities would likely increase the complexity of applications that A3D can handle successfully.
  • It suggests a path toward more accessible custom hardware for scientific computing communities.
  • Integration with simulation feedback loops might allow iterative refinement of designs based on measured performance.

Load-bearing premise

LLMs can be made reliable for the full accelerator design pipeline through agent partitioning and retrieval augmentation even when applied to complex irregular scientific codes.

What would settle it

Implement the designs produced by A3D for QMCPACK in actual hardware or detailed simulation and check whether they run the quantum chemistry calculations correctly and deliver the promised performance benefits without requiring corrections.

Figures

Figures reproduced from arXiv: 2605.15237 by Abinand Nallathambi, Anand Raghunathan, Christopher Knight, Shantanu Ganguly, Wilfried Haensch.

Figure 1: A3D: Agentic AI flow for Accelerator Design. (a) End-to-end multi-agent pipeline spanning three phases: Analysis, [...]

Figure 2: HLS Preparer agent refactoring floating-point types [...]

Figure 3: Specialist–Verifier iterative process loop. The veri [...]

Figure 4: Success rates for the HLS preparation phase under [...]

Figure 5: CodeQL query dynamically generated by the Bot [...]

Figure 6: A subset of the data structures involved in [...]

Figure 7: Excerpt of the YAML design space specification [...]

Figure 8: Design space exploration results for Torsion_Angles(). Each point represents a successfully synthesized configuration; marker size encodes synthesis time (1.5–25 h). Green markers indicate the 7 Pareto-optimal designs spanning a 25× latency–2.1× area tradeoff.
Original abstract

Accelerating applications through the design of hardware accelerators can significantly enhance system performance and energy efficiency. Despite advances, such as high-level synthesis (HLS), designing accelerators for complex applications still remains highly labor-intensive, demanding considerable expertise in understanding workloads to be accelerated, hardware design, micro-architecture, and EDA tool usage, posing challenges for application domain experts. Therefore, most accelerator solutions are limited to applications with a regular predictable dataflow. Advances in AI have enabled agents that perform autonomous planning, reasoning, execution and reflection, leading to unprecedented potential for automation through agentic AI. We present A3D, an Agentic AI flow for end-to-end Automation of hardware Accelerator Design. A3D automates workload analysis, performance bottleneck identification, code refactoring for HLS compatibility and micro-architecture generation. A3D also generates diverse accelerator designs by automatically exploring the speed-area tradeoff space. Recent efforts have explored the use of AI for specific tasks such as design space exploration in HLS, leaving several tasks to still be performed manually. A3D addresses the challenges in applying modern LLMs to accelerator design by judiciously partitioning tasks among specialist agents, orchestrating process loops with specialist and verifier agents, utilizing pre-existing and custom tools, and employing agentic RAG for codebase and proprietary EDA tool documentation exploration. Our implementation of A3D, using commercial components like Claude Sonnet 4.5 and the Catapult HLS tool, demonstrates its effectiveness by generating accelerator designs with no human intervention from complex scientific applications like LAMMPS (molecular dynamics simulation) and QMCPACK (quantum chemistry).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces A3D, an agentic AI flow for end-to-end automation of hardware accelerator design. It partitions tasks among specialist and verifier agents, orchestrates loops with agentic RAG for codebase and EDA tool documentation, and uses commercial components (Claude Sonnet 4.5 and Catapult HLS) to perform workload analysis, bottleneck identification, HLS-compatible refactoring, micro-architecture generation, and speed-area tradeoff exploration. The central claim is that this produces correct accelerator designs with no human intervention from complex irregular scientific codes such as LAMMPS (molecular dynamics) and QMCPACK (quantum chemistry).

Significance. If the end-to-end automation and correctness claims hold, the work would be significant for expanding hardware acceleration beyond regular dataflow applications to irregular scientific workloads that currently require substantial manual expertise. The agentic partitioning and RAG approach addresses practical LLM limitations in long-context hardware design tasks and could generalize to other EDA flows. The reliance on existing commercial tools makes the method immediately accessible for replication and extension.

major comments (2)
  1. [Abstract / Evaluation] Abstract and Evaluation section: the claim of successful generation of accelerator designs with 'no human intervention' is asserted but unsupported by any quantitative metrics (speedup, area, power, latency), error rates, failure-mode analysis, or comparison baselines against manual designs or prior HLS tools. Without these, the data cannot be assessed as supporting the central claim.
  2. [Implementation and Results] Implementation and Results: the load-bearing assertion that specialist+verifier agents plus agentic RAG produce functionally correct, synthesizable HLS code for irregular codes like LAMMPS and QMCPACK lacks any independent oracle (formal equivalence checking, exhaustive simulation, or post-synthesis gate-level verification) that would detect subtle dataflow, memory-ordering, or semantic errors introduced by the LLM agents.
minor comments (2)
  1. [Related Work] Related Work: the discussion of prior AI-assisted HLS design-space exploration would benefit from explicit citations and direct comparisons to specific recent systems rather than a general statement.
  2. [Figures] Figure and table captions: ensure all generated accelerator diagrams and tradeoff plots include clear labels for the original application, the generated micro-architecture, and any synthesis results.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review of our manuscript. The comments highlight important aspects of evaluation and verification that we address point by point below. We have revised the manuscript to incorporate additional evidence and clarifications where feasible.

  1. Referee: [Abstract / Evaluation] Abstract and Evaluation section: the claim of successful generation of accelerator designs with 'no human intervention' is asserted but unsupported by any quantitative metrics (speedup, area, power, latency), error rates, failure-mode analysis, or comparison baselines against manual designs or prior HLS tools. Without these, the data cannot be assessed as supporting the central claim.

    Authors: We agree that the central claim requires stronger quantitative backing to allow proper assessment. In the revised manuscript we expand the Evaluation section to report concrete metrics for the LAMMPS and QMCPACK accelerators, including achieved speedups relative to software baselines, post-synthesis area and power estimates from Catapult, and latency figures. We also add a failure-mode analysis of iteration counts required by the agentic loops and a direct comparison against designs produced by conventional HLS flows that required manual refactoring. These additions supply the missing quantitative support for the no-human-intervention claim.

    Revision: yes

  2. Referee: [Implementation and Results] Implementation and Results: the load-bearing assertion that specialist+verifier agents plus agentic RAG produce functionally correct, synthesizable HLS code for irregular codes like LAMMPS and QMCPACK lacks any independent oracle (formal equivalence checking, exhaustive simulation, or post-synthesis gate-level verification) that would detect subtle dataflow, memory-ordering, or semantic errors introduced by the LLM agents.

    Authors: We recognize the importance of independent verification for establishing functional correctness on irregular workloads. The current A3D implementation relies on the verifier agents together with Catapult HLS compilation and simulation against reference testbenches derived from the original applications. In the revision we clarify this verification workflow, describe the specific simulation checks performed to detect dataflow and memory-ordering discrepancies, and report any discrepancies observed during the agentic process. We also explicitly note the absence of formal equivalence checking or exhaustive gate-level verification as a limitation of the present study and outline it as future work, thereby addressing the concern without overstating the strength of the current evidence.

    Revision: partial
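To make "simulation against reference testbenches" concrete, here is a minimal Python sketch of a tolerance-based output comparison. It is an assumption about what such a check might look like, not a description of the checks A3D performs; the function name, tolerances, and sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def compare_to_reference(
    reference: Sequence[float],
    candidate: Sequence[float],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-9,
) -> List[Tuple[int, float, float]]:
    """Return (index, reference, candidate) for every element outside tolerance."""
    if len(reference) != len(candidate):
        raise ValueError("output lengths differ")
    return [
        (i, r, c)
        for i, (r, c) in enumerate(zip(reference, candidate))
        if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Hypothetical outputs: software reference vs. simulated accelerator.
ref = [1.0, 2.0, 3.0]
out = [1.0, 2.0000001, 3.5]
mismatches = compare_to_reference(ref, out)
print(mismatches)
```

A check of this kind only covers the inputs the testbench exercises, which is why the referee's request for an independent oracle remains relevant: type changes such as floating-point refactoring can pass a loose tolerance on sampled inputs while still diverging elsewhere.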

Circularity Check

0 steps flagged

No circularity: system description and implementation results contain no self-referential derivations


The paper presents a descriptive account of the A3D agentic flow, its partitioning into specialist and verifier agents, use of RAG, and reported end-to-end runs on LAMMPS and QMCPACK using Claude Sonnet and Catapult HLS. No equations, fitted parameters, or mathematical derivations appear in the provided text. The central claim of generating designs with no human intervention rests on the described implementation and tool usage rather than any step that reduces by construction to its own inputs or to a self-citation chain. No load-bearing premise is justified solely by prior work from the same authors, and the absence of a formal derivation chain makes circularity patterns inapplicable.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the unverified assumption that current LLMs can be reliably orchestrated for engineering tasks whose correctness is difficult to verify automatically.

axioms (1)
  • Domain assumption: Modern LLMs can be partitioned into specialist agents whose outputs can be verified and corrected by other agents sufficiently well to produce correct HLS-compatible designs for complex codes.
    Invoked when the paper states that judicious partitioning and verifier agents address the challenges of applying LLMs to accelerator design.

pith-pipeline@v0.9.0 · 5832 in / 1278 out tokens · 79212 ms · 2026-05-19T16:17:54.867984+00:00 · methodology


