
arxiv: 2606.30949 · v1 · pith:7X66SRR5new · submitted 2026-06-29 · 💻 cs.AI · cs.AR

AgRefactor: Self-Evolving Agentic Workflow for HLS Compatibility and Performance

Pith reviewed 2026-07-01 01:27 UTC · model grok-4.3

classification 💻 cs.AI cs.AR
keywords High-Level Synthesis · HLS refactoring · multi-agent workflow · self-evolving memory · LLM agents · code transformation · hardware acceleration · pragma optimization

The pith

AgRefactor uses a self-evolving memory in a multi-agent LLM workflow to refactor software into HLS-compatible code and achieve speedups over prior tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AgRefactor as an LLM-based multi-agent workflow designed to convert ordinary software into code that High-Level Synthesis tools can turn into hardware. The system adds a memory component that stores and reuses knowledge from earlier refactoring jobs, and it mixes LLM rewrites with existing automated tools to limit cost. Tests cover eleven real-world programs that are five to ten times longer than the most complex cases in earlier studies; AgRefactor matches or beats the baselines on nine of them. When the agents also optimize performance directives, the resulting designs reach a 6.51x geometric mean speedup over the state-of-the-art pragma-tuning tool and a 1.20x speedup over optimized open-source designs, with less than 20% extra resources.

Core claim

AgRefactor is an LLM-based multi-agent workflow for refactoring software into HLS-compatible programs that incorporates a self-evolving memory system accumulating factual and strategic knowledge across tasks and integrates automated refactoring tools to balance LLM-driven rewrites with efficient transformations. On 9 out of 11 challenging real-world benchmarks 5-10x longer than prior cases, it outperforms or matches state-of-the-art automated refactoring tools and a strong LLM baseline; further agentic performance optimization yields a 6.51x geometric mean speedup over the SoTA pragma tuning tool and a 1.20x speedup over optimized open-source designs with less than 20% extra resources.
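The headline 6.51x figure is a geometric mean, which differs from a plain arithmetic average. The following is a minimal sketch of how such a summary is computed; the function name `geometric_mean` and the three speedup values are illustrative assumptions, not data from the paper.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive per-benchmark speedup ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("speedup ratios must be positive")
    # Average in log space, then exponentiate.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-benchmark speedups, NOT values reported in the paper.
speedups = [2.0, 8.0, 4.0]
gm = geometric_mean(speedups)        # cube root of 64, approximately 4.0
am = sum(speedups) / len(speedups)   # arithmetic mean, approximately 4.67
print(f"geometric mean = {gm:.2f}, arithmetic mean = {am:.2f}")
```

The geometric mean is the usual choice for ratios because a single very large speedup pulls it up less than it pulls up the arithmetic mean.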

What carries the argument

The self-evolving memory system that accumulates and retrieves factual and strategic knowledge across tasks to improve robustness and efficiency on unseen programs.

If this is right

  • Refactoring becomes practical for programs five to ten times longer than those handled by earlier automated or LLM methods.
  • Computational cost drops because agents can delegate many edits to existing automated refactoring tools instead of calling the LLM for every change.
  • Hardware designs produced from the refactored code run substantially faster after the agents apply performance-directed transformations.
  • The entire process remains fully automatic, and the optimized designs use less than 20 percent extra resources relative to optimized open-source versions.
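The delegation idea in the second bullet can be pictured as a dispatch table: constructs that a deterministic tool knows how to handle go to the tool, and everything else falls back to an LLM rewrite. This is only a sketch under stated assumptions. The names `TOOL_HANDLERS`, `tool_rewrite_malloc`, `llm_rewrite`, and `static_pool_alloc` are hypothetical, and in the paper the agents themselves decide when to call a tool.

```python
def tool_rewrite_malloc(code):
    """Hypothetical stand-in for a deterministic refactoring tool."""
    return code.replace("malloc", "static_pool_alloc")


def llm_rewrite(construct, code):
    """Placeholder for an LLM call; it only tags the code in this sketch."""
    return f"/* LLM rewrite requested for {construct} */\n" + code


# Assumed mapping from an incompatible construct to a tool-based handler.
TOOL_HANDLERS = {"malloc": tool_rewrite_malloc}


def refactor(code, constructs):
    """Apply a tool where one exists, else fall back to the LLM.

    Returns the rewritten code and the number of LLM calls made.
    """
    llm_calls = 0
    for construct in constructs:
        handler = TOOL_HANDLERS.get(construct)
        if handler is not None:
            code = handler(code)
        else:
            code = llm_rewrite(construct, code)
            llm_calls += 1
    return code, llm_calls


new_code, calls = refactor(
    "int *p = malloc(4); std::set<int> s;", ["malloc", "std::set"]
)
print(calls)
```

Counting LLM calls separately makes the cost argument concrete: every construct covered by a tool handler is one fewer model invocation.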

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The memory mechanism could be reused in other iterative code-transformation domains such as porting between different high-level languages or frameworks.
  • Combining the workflow with static analysis or formal verification passes might reduce the chance that refactored code contains subtle functional errors.
  • The reported speedups suggest the same agent structure could be applied to generate accelerator code for domains beyond HLS, such as GPU or FPGA kernel tuning.

Load-bearing premise

The self-evolving memory system accumulates and retrieves factual and strategic knowledge across tasks, improving robustness and efficiency on unseen programs.

What would settle it

A new collection of long real-world programs on which AgRefactor with the memory system fails to match or exceed the performance of the same workflow without memory or the prior SoTA tools.

Figures

Figures reproduced from arXiv: 2606.30949 by Jason Cong, Yang Zou, Yizhou Sun, Zijian Ding.

Figure 2. Summary of the results. Although both HeteroRefactor and HLSRewriter perform well on their own benchmarks, they fail on many of the larger benchmarks, such as libjpeg-turbo. Beyond the known limitation in handling external libraries such as STL containers (e.g., “std::set” and “std::map”), HeteroRefactor also shows limitations when dealing with common pointer operations. In …
Figure 3. Overview of AgRefactor. Given a C/C++ program and a user-specified top-level function, the framework automatically produces a synthesizable HLS implementation by progressing through identifying, planning, refactoring, and fixing stages, while continuously updating a long-term memory bank. Once refactored, the synthesizable code and its testbench are forwarded to a performance optimization agent. Situated i…
Figure 4. Messages passed between agents.
B. Long-term Memory for HLS Refactoring. To enable self-evolution, AgRefactor accumulates successful and unsuccessful trials as queryable knowledge. Each memory entry is defined as a tuple (p_i, I_i, s_i, c_i), where p_i is the initial program, I_i lists the identified incompatible constructs, s_i is the refactoring strategy, and c_i is the generalized critique generated by the Analy…
Figure 5. Performance improvement over optimized open-source designs under …
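The memory entry described in the Figure 4 excerpt, a tuple of initial program, incompatible constructs, refactoring strategy, and critique, maps naturally onto a small record type. The sketch below is an assumption-laden illustration: the class name `MemoryEntry`, the `retrieve` function, the example entries, and the use of Jaccard overlap on construct sets are all hypothetical stand-ins, since the excerpt does not state how retrieval is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    program: str           # p_i: initial program (here just a file name)
    constructs: frozenset  # I_i: identified HLS-incompatible constructs
    strategy: str          # s_i: refactoring strategy that was tried
    critique: str          # c_i: generalized critique of the trial


def jaccard(a, b):
    """Overlap between two construct sets, in [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(bank, constructs, k=2):
    """Return up to k entries whose construct sets best match the query."""
    scored = [(jaccard(e.constructs, constructs), e) for e in bank]
    scored = [(s, e) for s, e in scored if s > 0]  # drop unrelated entries
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:k]]


# Hypothetical memory bank; none of these entries come from the paper.
bank = [
    MemoryEntry("prog_a.cpp", frozenset({"std::map", "recursion"}),
                "replace std::map with a fixed-size table",
                "derive the table bound from the testbench"),
    MemoryEntry("prog_b.cpp", frozenset({"malloc"}),
                "replace malloc with a static pool",
                "size the pool for the worst case"),
    MemoryEntry("prog_c.cpp", frozenset({"std::map"}),
                "replace std::map with a sorted array",
                "keep lookups bounded"),
]

hits = retrieve(bank, {"std::map", "std::set"}, k=2)
print([e.program for e in hits])
```

Keeping the critique alongside the strategy means an unsuccessful trial is still useful at retrieval time, which matches the excerpt's point that both successful and unsuccessful trials are stored.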
Original abstract

High-Level Synthesis (HLS) provides a fast path from concepts to silicon, but converting real-world software into synthesizable HLS code remains challenging due to restrictive language support and the gap between software and hardware programming practices. Existing automated and LLM-based refactoring approaches partially address this problem, yet they often lack flexibility, struggle to scale, and incur high computational costs. We introduce AgRefactor, an LLM-based multi-agent workflow for refactoring software into HLS-compatible programs. AgRefactor incorporates a self-evolving memory system that accumulates and retrieves factual and strategic knowledge across tasks, improving robustness and efficiency on unseen programs. To reduce cost and enhance scalability, it integrates automated refactoring tools, enabling agents to balance LLM-driven rewrites with efficient tool-based transformations. On 9 out of 11 challenging real-world benchmarks, which are 5-10x longer than the most complex cases studied in prior work, AgRefactor outperforms or matches the state-of-the-art automated refactoring tool and a strong LLM-based baseline built on the same framework backbone. Further agentic performance optimization yields a 6.51x geometric mean speedup over the SoTA pragma tuning tool and a 1.20x speedup over optimized open-source designs with less than 20% extra resources. AgRefactor is fully-automated and open-sourced.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces AgRefactor, an LLM-based multi-agent workflow for refactoring software into HLS-compatible programs. It incorporates a self-evolving memory system for accumulating and retrieving knowledge across tasks, integrates automated refactoring tools to balance LLM rewrites with tool-based transformations, and claims to outperform or match SoTA automated and LLM baselines on 9 out of 11 challenging real-world benchmarks (5-10x longer than prior work), while delivering a 6.51x geometric mean speedup over SoTA pragma tuning and 1.20x over optimized open-source designs with <20% extra resources. The system is fully automated and open-sourced.

Significance. If the performance claims hold under rigorous evaluation, the work would advance automated HLS refactoring by demonstrating scalable agentic workflows that combine LLM flexibility with traditional tools and self-evolving memory, addressing key limitations in cost, scalability, and robustness for long programs. The open-sourcing supports reproducibility and community follow-up.

major comments (2)
  1. Abstract: the central performance claims (outperformance on 9/11 benchmarks, 6.51x and 1.20x speedups) are presented without any accompanying experimental setup, benchmark descriptions, baseline implementations, statistical details, or resource measurements, rendering the claims unverifiable from the provided material.
  2. Abstract: the description of the self-evolving memory system and its integration with automated tools is stated at a high level only, with no details on memory structure, retrieval mechanisms, or how they contribute to the reported robustness gains on unseen programs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their review and address the two major comments on the abstract below. The full manuscript provides the requested details in dedicated sections.

  1. Referee: Abstract: the central performance claims (outperformance on 9/11 benchmarks, 6.51x and 1.20x speedups) are presented without any accompanying experimental setup, benchmark descriptions, baseline implementations, statistical details, or resource measurements, rendering the claims unverifiable from the provided material.

    Authors: Abstracts are intentionally concise per standard academic practice and do not contain full experimental details. The complete manuscript supplies all requested information: benchmark descriptions and lengths (5-10x longer than prior work) appear in Section 4.1, baseline implementations in Section 4.2, experimental setup and statistical details in Sections 4.3 and 5, and resource measurements in Section 5.2 plus associated tables. The claims are therefore verifiable from the full paper. We see no need to expand the abstract, as doing so would violate length conventions without adding value. revision: no

  2. Referee: Abstract: the description of the self-evolving memory system and its integration with automated tools is stated at a high level only, with no details on memory structure, retrieval mechanisms, or how they contribute to the reported robustness gains on unseen programs.

    Authors: The abstract summarizes contributions at the conventional high level. Full technical details are provided in the manuscript body: memory structure is described in Section 3.2, retrieval mechanisms in Section 3.3, integration with automated tools in Section 3.4, and contributions to robustness on unseen programs (including ablation studies) in Section 5.3. These sections explain knowledge accumulation across tasks and resulting efficiency/robustness gains. No abstract revision is warranted. revision: no

Circularity Check

0 steps flagged

No significant circularity detected


The paper presents an empirical engineering system (AgRefactor multi-agent workflow with self-evolving memory) and reports benchmark performance results. No equations, derivations, predictions from first principles, or mathematical claims appear in the provided abstract or described full text. All load-bearing elements are implementation descriptions and experimental outcomes on real-world HLS benchmarks, with no self-definitional loops, fitted inputs renamed as predictions, or self-citation chains reducing a central result to its own inputs. The self-evolving memory is presented as an architectural feature whose benefits are measured empirically rather than derived tautologically. This is a standard non-circular empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, background axioms, or newly postulated entities.

pith-pipeline@v0.9.1-grok · 5766 in / 1236 out tokens · 44639 ms · 2026-07-01T01:27:30.063406+00:00 · methodology


Reference graph

Works this paper leans on

33 extracted references · 10 canonical work pages · 5 internal anchors

  1. [1]

    High-level synthesis for FPGAs: From prototyping to deployment,

    J. Cong, B. Liu, S. Neuendorffer, J. Noguera, K. Vissers, and Z. Zhang, “High-level synthesis for FPGAs: From prototyping to deployment,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 4, pp. 473–491, 2011

  2. [2]

    FPGA HLS today: successes, challenges, and opportunities,

    J. Cong, J. Lau, G. Liu, S. Neuendorffer, P. Pan, K. Vissers, and Z. Zhang, “FPGA HLS today: successes, challenges, and opportunities,” ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 15, no. 4, pp. 1–42, 2022

  3. [3]

    ScaleHLS: A new scalable high-level synthesis framework on multi-level intermediate representation,

    H. Ye, C. Hao, J. Cheng, H. Jeong, J. Huang, S. Neuendorffer, and D. Chen, “ScaleHLS: A new scalable high-level synthesis framework on multi-level intermediate representation,” in 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 2022, pp. 741–755

  4. [4]

    Stream-HLS: Towards automatic dataflow acceleration,

    S. Basalama and J. Cong, “Stream-HLS: Towards automatic dataflow acceleration,” arXiv e-prints, pp. arXiv–2501, 2025

  5. [5]

    HIDA: A hierarchical dataflow compiler for high-level synthesis,

    H. Ye, H. Jun, and D. Chen, “HIDA: A hierarchical dataflow compiler for high-level synthesis,” in Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, 2024, pp. 215–230

  6. [6]

    A unified framework for automated code transformation and pragma insertion,

    S. Pouget, L.-N. Pouchet, and J. Cong, “A unified framework for automated code transformation and pragma insertion,” in Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2025, pp. 187–198

  7. [7]

    Allo: A programming model for composable accelerator design,

    H. Chen, N. Zhang, S. Xiang, Z. Zeng, M. Dai, and Z. Zhang, “Allo: A programming model for composable accelerator design,” Proceedings of the ACM on Programming Languages, vol. 8, no. PLDI, pp. 593–620, 2024

  8. [8]

    Heterorefactor: refactoring for heterogeneous computing with fpga,

    J. Lau, A. Sivaraman, Q. Zhang, M. A. Gulzar, J. Cong, and M. Kim, “Heterorefactor: refactoring for heterogeneous computing with fpga,” in Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 2020, pp. 493–505

  9. [9]

    Heterogen: transpiling c to heterogeneous hls code with automated test generation and program repair,

    Q. Zhang, J. Wang, G. H. Xu, and M. Kim, “Heterogen: transpiling c to heterogeneous hls code with automated test generation and program repair,” in Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2022, pp. 1017–1029

  10. [10]

    C2HLSC: Can LLMs bridge the software-to-hardware design gap?

    L. Collini, S. Garg, and R. Karri, “C2HLSC: Can LLMs bridge the software-to-hardware design gap?” arXiv preprint arXiv:2406.09233, 2024

  11. [11]

    Hardware acceleration of complex HEP algorithms with HLS and FPGAs: Methodology and preliminary implementation,

    A. Wojenski, H. Zbroszczyk, M. Kruszewski, P. Szymanski, E. Wawrzyn, D. Wielanek, W. Zabolotny, D. Pawlowska, and T. Gniazdowski, “Hardware acceleration of complex HEP algorithms with HLS and FPGAs: Methodology and preliminary implementation,” Computer Physics Communications, vol. 295, p. 108997, 2024

  12. [12]

    Hlsrewriter: Efficient refactoring and optimization of c/c++ code with llms for high-level synthesis,

    K. Xu, G. L. Zhang, X. Yin, C. Zhuo, U. Schlichtmann, and B. Li, “Hlsrewriter: Efficient refactoring and optimization of c/c++ code with llms for high-level synthesis,” ACM Transactions on Design Automation of Electronic Systems, 2025

  13. [13]

    Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

    P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, S. Riedel, and D. Kiela, “Retrieval-augmented generation for knowledge-intensive NLP tasks,” arXiv preprint arXiv:2005.11401, 2020

  14. [14]

    MemGPT: Towards LLMs as Operating Systems

    C. Packer, S. Wooders, K. Lin, V. Fang, S. G. Patil, I. Stoica, and J. E. Gonzalez, “Memgpt: Towards llms as operating systems,” arXiv preprint arXiv:2310.08560, 2023

  15. [15]

    HippoRAG: Neurobiologically inspired long-term memory for large language models

    B. Jiménez Gutiérrez, Y. Shu, Y. Gu, M. Yasunaga, and Y. Su, “Hipporag: Neurobiologically inspired long-term memory for large language models,” arXiv preprint arXiv:2405.14831, 2024

  16. [16]

    Agent Workflow Memory

    Z. Z. Wang, J. Mao, D. Fried, and G. Neubig, “Agent workflow memory,” arXiv preprint arXiv:2409.07429, 2024

  17. [17]

    MemoryLLM: Towards self-updatable large language models,

    Y. Wang, Y. Gao, X. Chen, H. Jiang, S. Li, J. Yang, Q. Yin, Z. Li, X. Li, B. Yin, J. Shang, and J. McAuley, “MemoryLLM: Towards self-updatable large language models,” arXiv preprint arXiv:2402.04624, 2024

  18. [18]

    A-MEM: Agentic Memory for LLM Agents

    W. Xu, Z. Liang, K. Mei, H. Gao, J. Tan, and Y. Zhang, “A-MEM: Agentic memory for LLM agents,” arXiv preprint arXiv:2502.12110, 2025

  19. [19]

    AlphaEvolve: A coding agent for scientific and algorithmic discovery

    A. Novikov, N. Vũ, M. Eisenberger, E. Dupont, P.-S. Huang, A. Z. Wagner, S. Shirobokov, B. Kozlovskii, F. J. Ruiz, A. Mehrabian et al., “Alphaevolve: A coding agent for scientific and algorithmic discovery,” arXiv preprint arXiv:2506.13131, 2025

  20. [20]

    AutoDSE: Enabling software programmers to design efficient FPGA accelerators,

    A. Sohrabizadeh, C. H. Yu, M. Gao, and J. Cong, “AutoDSE: Enabling software programmers to design efficient FPGA accelerators,” ACM Trans. Des. Autom. Electron. Syst., vol. 27, no. 4, Feb. 2022. [Online]. Available: https://doi.org/10.1145/3494534

  21. [21]

    Leetcode,

    LeetCode, “Leetcode,” 2025, accessed: 2025-10-02. [Online]. Available: https://leetcode.com/

  22. [22]

    libsodium,

    F. Denis and the libsodium contributors, “libsodium,” 2025, accessed: 2025-10-02. [Online]. Available: https://doc.libsodium.org/

  23. [23]

    minimap2,

    H. Li, “minimap2,” 2025, accessed: 2025-10-02. [Online]. Available: https://github.com/lh3/minimap2

  24. [24]

    libjpeg-turbo,

    The libjpeg-turbo Project, “libjpeg-turbo,” 2025, accessed: 2025-10-02. [Online]. Available: https://libjpeg-turbo.org/

  25. [25]

    Av1 reference codec (libaom),

    Alliance for Open Media, “Av1 reference codec (libaom),” 2025, accessed: 2025-10-02. [Online]. Available: https://aomedia.googlesource.com/aom/

  26. [26]

    LightningSimV2: Faster and scalable simulation for high-level synthesis via graph compilation and optimization,

    R. Sarkar, R. Paul, and C. C. Hao, “LightningSimV2: Faster and scalable simulation for high-level synthesis via graph compilation and optimization,” in 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 2024, pp. 104–114

  27. [27]

    Holistic Optimization Framework for FPGA Accelerators,

    S. Pouget, M. Lo, L.-N. Pouchet, and J. Cong, “Holistic Optimization Framework for FPGA Accelerators,” ACM Transactions on Design Automation of Electronic Systems, vol. 31, no. 1, pp. 1–37, 2025

  28. [28]

    Ag2: Open-source agentos for ai agents,

    C. Wang, Q. Wu, and the AG2 Community, “Ag2: Open-source agentos for ai agents,” 2024, available at https://docs.ag2.ai/. [Online]. Available: https://github.com/ag2ai/ag2

  29. [29]

    Sentencetransformers,

    UKPLab, “Sentencetransformers,” 2025, accessed: 2025-10-02. [Online]. Available: https://github.com/UKPLab/sentence-transformers

  30. [30]

    Soda: Stencil with optimized dataflow architecture,

    Y. Chi, J. Cong, P. Wei, and P. Zhou, “Soda: Stencil with optimized dataflow architecture,” in Proceedings of the International Conference on Computer-Aided Design, 2018, pp. 1–8

  31. [31]

    HLSFactory: A framework empowering high-level synthesis datasets for machine learning and beyond,

    S. Abi-Karam, R. Sarkar, A. Seigler, S. Lowe, Z. Wei, H. Chen, N. Rao, L. John, A. Arora, and C. Hao, “HLSFactory: A framework empowering high-level synthesis datasets for machine learning and beyond,” in Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD, 2024, pp. 1–9

  32. [32]

    Vitis Libraries,

    AMD/Xilinx, “Vitis Libraries,” https://github.com/Xilinx/Vitis_Libraries, 2024

  33. [33]

    GPT-5 model family,

    OpenAI, “GPT-5 model family,” OpenAI, 2025, accessed: 2025-2026. [Online]. Available: https://openai.com/gpt-5/

APPENDIX

AgRefactor is publicly available at https://github.com/Williamzou0123/AgRefactor

This appendix presents supplementary studies that complement the main evaluation: the effect of self-accumulating memory across training epochs, the contrib...