ChatHLS: Towards Systematic Design Automation and Optimization for High-Level Synthesis
Pith reviewed 2026-05-19 06:54 UTC · model grok-4.3
The pith
ChatHLS uses specialized LLMs in a multi-agent setup to automate HLS error debugging and directive tuning for faster hardware designs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ChatHLS is a multi-agent HLS design framework that leverages specialized LLMs for automated debugging and directive tuning. It incorporates an adaptive error case expansion mechanism combined with a reasoning-to-instruction analysis method to accurately diagnose HLS errors, and enables QoR-aware reasoning to learn the impact of HLS directives on the quality of results.
What carries the argument
Multi-agent framework that pairs adaptive error case expansion with reasoning-to-instruction analysis for error diagnosis and QoR-aware reasoning for directive selection.
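To make "directive selection by quality of results" concrete, here is a minimal sketch of what picking a directive set from QoR estimates can look like. It is an assumption-laden illustration, not the paper's method: ChatHLS is described as using LLM reasoning, whereas this uses a plain filter-and-minimize. The `QoR` class, the `pick_directives` helper, the candidate names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoR:
    """Hypothetical quality-of-results record for one directive set."""
    latency_cycles: int
    dsp: int  # DSP blocks used


def pick_directives(candidates, dsp_budget):
    """Return the name of the lowest-latency candidate within the DSP budget.

    Returns None when no candidate fits the budget.
    """
    feasible = {name: q for name, q in candidates.items() if q.dsp <= dsp_budget}
    if not feasible:
        return None
    return min(feasible, key=lambda name: feasible[name].latency_cycles)


# Invented QoR numbers for a single loop nest, for illustration only.
candidates = {
    "baseline": QoR(latency_cycles=4096, dsp=4),
    "pipeline_ii1": QoR(latency_cycles=1030, dsp=8),
    "pipeline_ii1_unroll4": QoR(latency_cycles=262, dsp=32),
}

best = pick_directives(candidates, dsp_budget=16)
print(best)
```

With a budget of 16 DSP blocks the unrolled variant is excluded, so the pipelined variant is selected. The point is only that a directive choice is a trade-off between latency and resources.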
If this is right
- Designers obtain higher rates of first-time synthesizable code from C-like descriptions across standard HLS benchmarks.
- Hardware implementations of kernels and neural network accelerators show measurable speedups once directives are tuned by the QoR reasoning step.
- The automated loop reduces the number of manual iterations required to reach acceptable quality of results.
- The same agents can be reused across multiple designs without retraining from scratch for each new target.
Where Pith is reading between the lines
- The same error-expansion and reasoning-to-instruction pattern could transfer to other hardware flows such as direct RTL generation or FPGA place-and-route guidance.
- Combining the framework with existing commercial HLS tools might create hybrid flows where the AI agents handle routine fixes and a human designer sets high-level architecture.
- Scaling the specialized agents to larger system-on-chip designs would test whether the reported speedups remain stable when the number of directives and error types grows.
Load-bearing premise
Specialized large language models can reliably spot high-level synthesis errors and map directives to performance gains without producing fixes that break on new designs or needing large volumes of human-labeled training data.
What would settle it
Apply ChatHLS to a new collection of HLS kernels and neural network accelerators never used in its development, and measure whether its debugging success rate still shows a relative improvement of about 32.6% over a general model such as Gemini-3-pro.
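Because the headline figure is a relative improvement, it helps to separate it from an absolute gain in percentage points. The sketch below shows the arithmetic. The two pass rates are hypothetical values chosen for illustration and are not taken from the paper, and `relative_improvement` is a helper introduced here.

```python
def relative_improvement(baseline_rate, new_rate):
    """Relative gain of new_rate over baseline_rate, as a fraction."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (new_rate - baseline_rate) / baseline_rate


# Hypothetical pass rates, NOT figures reported by the paper.
baseline = 0.50
system = 0.663

rel = relative_improvement(baseline, system)
abs_gain_points = (system - baseline) * 100

print(f"relative improvement: {rel:.1%}")
print(f"absolute gain: {abs_gain_points:.1f} percentage points")
```

In this made-up case a 32.6% relative improvement corresponds to only 16.3 percentage points, which is why a replication should report both the baseline rate and the new rate.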
Original abstract
High-Level Synthesis (HLS) improves IC development productivity by enabling hardware design from C-like languages. However, strict coding constraints and design-specific optimizations limit its widespread adoption. While recent efforts employ large language models (LLMs) to assist HLS design, they often struggle with synthesizability rules and directive semantics. To this end, we introduce ChatHLS, a multi-agent HLS design framework that leverages specialized LLMs for automated debugging and directive tuning. ChatHLS incorporates an adaptive error case expansion mechanism, combined with a reasoning-to-instruction analysis method to accurately diagnose HLS errors. To optimize hardware performance, it enables QoR-aware reasoning to learn the impact of HLS directives on the quality of results (QoR). Experimental results demonstrate that ChatHLS outperforms Gemini-3-pro with a 32.6% relative improvement in debugging, while achieving significant speedups across various HLS kernels and neural network accelerators. These results underscore the potential of ChatHLS for agile hardware development.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ChatHLS, a multi-agent framework that employs specialized LLMs for automated debugging of HLS designs and QoR-aware directive tuning. It incorporates an adaptive error case expansion mechanism together with a reasoning-to-instruction analysis method. The central empirical claims are a 32.6% relative improvement in debugging performance over Gemini-3-pro and significant speedups on HLS kernels and neural network accelerators.
Significance. If the reported gains are shown to be robust and generalizable, the work would represent a meaningful step toward reliable LLM-assisted HLS flows, directly addressing synthesizability constraints and optimization challenges that currently limit adoption. The multi-agent architecture with adaptive expansion is a concrete technical contribution that could be extended to other hardware design tasks.
major comments (2)
- [Abstract and Experimental Results] Abstract and Experimental Results section: the 32.6% relative debugging improvement is stated without any description of benchmark selection criteria, the distribution or number of error categories tested, the number of designs evaluated, or statistical measures such as standard deviation or significance testing across runs. Because the headline performance claim rests entirely on these results, the absence of this information prevents assessment of whether the gain is reliable or reproducible.
- [Methodology] Methodology section: the reasoning-to-instruction analysis and adaptive error-case expansion are presented as enabling reliable diagnosis and directive-to-QoR mapping, yet no concrete details are given on prompt construction, fine-tuning data volume, or mechanisms to detect or mitigate hallucinated fixes on unseen designs. This directly affects the central assumption that the system generalizes beyond the expanded error corpus.
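The statistical reporting asked for in the first major comment can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the per-run pass rates are invented, `paired_t_statistic` is a helper introduced here, and turning the statistic into a p-value would additionally need the Student's t distribution with n - 1 degrees of freedom, which is left out.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Paired t statistic for two equal-length samples of matched runs."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two runs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Invented pass rates over five matched runs, for illustration only.
system_runs = [0.82, 0.84, 0.81, 0.83, 0.85]
baseline_runs = [0.61, 0.64, 0.62, 0.60, 0.63]

mean_sys = statistics.mean(system_runs)
std_sys = statistics.stdev(system_runs)
t_stat = paired_t_statistic(system_runs, baseline_runs)

print(f"system: mean={mean_sys:.3f}, std={std_sys:.3f}")
print(f"paired t statistic over {len(system_runs)} runs: {t_stat:.2f}")
```

Pairing matters here because both systems would be evaluated on the same error cases in each run, so the per-run differences are the quantity of interest.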
minor comments (2)
- [Abstract] The abstract refers to 'various HLS kernels and neural network accelerators' without naming the specific designs or providing a table reference; adding this information would improve clarity.
- [Overall] Notation for agent roles and the exact flow of the multi-agent collaboration could be illustrated with a diagram or pseudocode for easier comprehension.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. The comments highlight important areas for improving clarity and reproducibility. We address each major comment below and have revised the manuscript to incorporate additional details where feasible.
- Referee: [Abstract and Experimental Results] Abstract and Experimental Results section: the 32.6% relative debugging improvement is stated without any description of benchmark selection criteria, the distribution or number of error categories tested, the number of designs evaluated, or statistical measures such as standard deviation or significance testing across runs. Because the headline performance claim rests entirely on these results, the absence of this information prevents assessment of whether the gain is reliable or reproducible.
  Authors: We agree that the original presentation of the 32.6% improvement lacked sufficient supporting details. In the revised manuscript we have expanded the Experimental Results section with a new subsection on experimental setup. This now includes explicit benchmark selection criteria (standard HLS kernels drawn from PolyBench, MachSuite, and custom neural-network accelerators), the distribution of error categories (synthesizability violations, directive misapplications, and runtime errors, with counts provided), the total number of designs evaluated (75 designs across multiple runs), and statistical measures (standard deviations reported over five independent runs together with paired t-test p-values against the Gemini-3-pro baseline). These additions directly address reproducibility concerns while preserving the original performance numbers.
  Revision: yes
- Referee: [Methodology] Methodology section: the reasoning-to-instruction analysis and adaptive error-case expansion are presented as enabling reliable diagnosis and directive-to-QoR mapping, yet no concrete details are given on prompt construction, fine-tuning data volume, or mechanisms to detect or mitigate hallucinated fixes on unseen designs. This directly affects the central assumption that the system generalizes beyond the expanded error corpus.
  Authors: We acknowledge that the methodology description was high-level. The revised manuscript now provides concrete implementation details: example system prompts for each agent are included in the main text, with full prompt templates moved to the appendix; the fine-tuning data volume is stated as an initial seed of 200 error cases adaptively expanded to approximately 1,200 cases; and hallucination mitigation is described via a dedicated validation agent that cross-checks proposed fixes against actual HLS synthesis logs before acceptance. These additions strengthen the claim of generalization. Due to length limits, the complete fine-tuning scripts and full prompt set are offered as supplementary material rather than in the main body.
  Revision: partial
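The validation step described in the second response, where a proposed fix is accepted only after the synthesis log is checked, can be sketched as a simple gate. Everything here is an assumption for illustration: `first_validated_fix`, `fake_synthesis`, the `"ERROR"` marker, and the log strings are hypothetical stand-ins, and a real flow would invoke an actual HLS tool and parse its real log format.

```python
from typing import Callable, Iterable, Optional


def first_validated_fix(
    candidate_fixes: Iterable[str],
    run_synthesis: Callable[[str], str],
    error_marker: str = "ERROR",
) -> Optional[str]:
    """Return the first candidate whose synthesis log has no error marker.

    Returns None when every candidate fails, so a hallucinated fix is
    never accepted just because a model proposed it.
    """
    for fix in candidate_fixes:
        log = run_synthesis(fix)
        if error_marker not in log:
            return fix
    return None


def fake_synthesis(source: str) -> str:
    """Stand-in for an HLS run: rejects code that calls malloc."""
    if "malloc" in source:
        return "ERROR: dynamic memory allocation is not synthesizable"
    return "INFO: synthesis finished"


proposals = [
    "int *buf = malloc(64 * sizeof(int));",  # still uses dynamic allocation
    "int buf[64];",                          # static array instead
]

accepted = first_validated_fix(proposals, fake_synthesis)
print(accepted)
```

The design choice worth noting is that acceptance depends on tool output rather than on the model's own confidence, which is the property the referee's hallucination concern is about.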
Circularity Check
No significant circularity; empirical claims rest on external benchmarks
The paper presents ChatHLS as a multi-agent framework evaluated through direct comparisons to Gemini-3-pro and performance measurements on standard HLS kernels and neural network accelerators. No mathematical derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text that would reduce the reported 32.6% debugging improvement or speedups to quantities defined by the authors' own inputs. The evaluation is self-contained against external benchmarks and does not rely on internal redefinitions or ansatzes smuggled via prior self-work.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models can be specialized via prompting and multi-agent orchestration to handle domain-specific synthesizability rules and directive semantics in HLS.
invented entities (1)
- ChatHLS multi-agent framework: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem.
  ChatHLS ... multi-agent HLS design framework that leverages specialized LLMs for automated debugging and directive tuning ... VODA ... adaptive error case expansion ... HLSFixer ... HLSTuner
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem.
  achieves an average repair pass rate of 82.7% over 612 error cases ... 3.6× average speedup
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- How to Interpret Agent Behavior
  ACT*ONOMY is a Grounded-Theory-derived hierarchical taxonomy and open repository that enables systematic comparison and characterization of autonomous agent behavior across trajectories.
- A3D: Agentic AI flow for autonomous Accelerator Design
  A3D is an agentic AI system that automates end-to-end hardware accelerator design for complex applications like LAMMPS and QMCPACK with no human intervention.