
arxiv: 2604.23355 · v2 · pith:SPQ5Q4W4new · submitted 2026-04-25 · 💻 cs.AI

LEGO: An LLM Skill-Based Front-End Design Generation Platform

Pith reviewed 2026-05-21 09:10 UTC · model grok-4.3

classification 💻 cs.AI
keywords LLM · EDA · RTL design · Verilog · skill library · circuit skills · front-end automation · composable agents

The pith

LEGO turns LLM front-end design into a reusable system by packaging capabilities as 42 composable circuit skills across six steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents LEGO as a platform that decomposes the digital front-end flow into six independent steps and stores every agent capability as a standardized, composable circuit skill in a plug-and-play library. Skills are extracted automatically from surveyed projects and retrieved in submillisecond time without embeddings. On 41 hard VerilogEval v2 problems where a strong baseline LLM scores zero pass rate even with extra reasoning, the skills raise Pass@1 to 0.805. Cross-project compositions achieve the same rate and outperform some prior tools while matching others. The results indicate that modular skill composition can make RTL design automation both more effective and adaptable.

Core claim

By representing front-end design capabilities as standardized composable circuit skills inside a six-step finite state machine formulation, the LEGO platform enables LLMs to solve complex RTL problems through modular retrieval and composition, raising success rates from zero to 80.5 percent on a hard benchmark subset.

What carries the argument

The six-step finite state machine formulation that turns every agent capability into a standardized composable circuit skill, supported by an automated Circuit Skill Builder and an embedding-free Agent Skill RAG retriever.
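
A minimal sketch of how a six-step, plug-and-play skill pipeline could be represented. The step labels, the `CircuitSkill` fields, and the helpers `compose`, `execute`, and `make_skill` are all hypothetical: the summary says there are six steps but does not name them, and the platform's real skill schema is not described here. The sketch walks the states linearly and omits the iteration logic the paper mentions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Placeholder labels: the summary states there are six steps but does not name them.
STEPS = [f"step_{i}" for i in range(1, 7)]


@dataclass(frozen=True)
class CircuitSkill:
    name: str                      # hypothetical skill identifier
    step: str                      # the step this skill plugs into
    source: str                    # project the skill was extracted from
    run: Callable[[dict], dict]    # reads and returns a shared design context


def compose(selection: Dict[str, CircuitSkill]) -> List[CircuitSkill]:
    """Return one skill per step, in step order, or raise if a slot is wrong."""
    pipeline = []
    for step in STEPS:
        skill = selection.get(step)
        if skill is None:
            raise ValueError(f"no skill selected for {step}")
        if skill.step != step:
            raise ValueError(f"{skill.name} belongs to {skill.step}, not {step}")
        pipeline.append(skill)
    return pipeline


def execute(pipeline: List[CircuitSkill], context: dict) -> dict:
    """Walk the states in order; each skill transforms the shared context."""
    for skill in pipeline:
        context = skill.run(context)
    return context


def make_skill(name: str, step: str, source: str) -> CircuitSkill:
    """Build a dummy skill that only records its own name in the trace."""
    def run(ctx: dict) -> dict:
        return {**ctx, "trace": ctx.get("trace", []) + [name]}
    return CircuitSkill(name, step, source, run)


# Cross-project composition: alternate between two made-up source projects.
selection = {
    step: make_skill(f"skill_{i}", step, "project_a" if i % 2 else "project_b")
    for i, step in enumerate(STEPS, start=1)
}
result = execute(compose(selection), {"spec": "4-bit counter"})
print(result["trace"])
```

Keeping the step label on each skill lets `compose` reject a skill placed in the wrong slot, which is one simple way to make skills from different projects interchangeable within a step.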

If this is right

  • Individual skills lift Pass@1 from 0.000 to 0.805 on the 41 hardest VerilogEval v2 cases.
  • Cross-project skill compositions also reach 0.805 Pass@1 and beat hierarchy-verilog by 14.6 percentage points.
  • The same compositions match the performance of MAGE while exceeding VerilogCoder by 2.5 percentage points.
  • Modular skill composition produces both higher success and greater flexibility than isolated task-specific agents.
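
The arithmetic behind these figures can be checked with a short sketch. The function below is the standard unbiased pass@k estimator; the solved-problem counts are inferences, not numbers given in the summary: 33 of 41 is the count whose ratio rounds to 0.805, and 27 and 32 are the counts consistent with reading the reported gaps as absolute differences in Pass@1.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


TOTAL = 41  # size of the hard VerilogEval v2 subset

# Assumption: one sample per problem, so benchmark Pass@1 is solved / TOTAL.
# These counts are inferred from the rounded rates, not reported in the summary.
solved = {"LEGO": 33, "VerilogCoder": 32, "hierarchy-verilog": 27}

rates = {name: round(count / TOTAL, 3) for name, count in solved.items()}
gap_hierarchy = round(rates["LEGO"] - rates["hierarchy-verilog"], 3)
gap_verilogcoder = round(rates["LEGO"] - rates["VerilogCoder"], 3)

print(rates)
print(gap_hierarchy, gap_verilogcoder)
```

Under these assumptions the 14.6 and 2.5 point gaps correspond to about six problems and one problem respectively, which is worth keeping in mind when judging how much the comparison separates the tools.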

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Public release of the skill library could let other teams add new skills and test coverage on additional benchmarks.
  • The same six-step structure might extend to back-end flows or mixed-signal design if the skill extraction process scales linearly.
  • Fast non-embedding retrieval suggests the approach could keep latency low even as the library grows beyond 42 skills.
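
To illustrate why retrieval without embeddings can stay fast, here is a sketch of a plain keyword inverted index over skill descriptions. It is not the paper's Agent Skill RAG, whose design is not described in this summary; the class `KeywordSkillIndex`, the skill names, the step labels, and the descriptions are all invented for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set


class KeywordSkillIndex:
    """Hypothetical embedding-free index: token -> names of skills mentioning it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)
        self._steps: Dict[str, str] = {}

    @staticmethod
    def _tokens(text: str) -> Set[str]:
        # Lowercase alphanumeric runs; underscores and punctuation split tokens.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, name: str, step: str, description: str) -> None:
        self._steps[name] = step
        for token in self._tokens(name + " " + description):
            self._postings[token].add(name)

    def search(self, query: str, step: Optional[str] = None, top_k: int = 3) -> List[str]:
        """Rank skills by number of shared tokens, optionally within one step."""
        scores: Dict[str, int] = defaultdict(int)
        for token in self._tokens(query):
            for name in self._postings.get(token, ()):
                if step is None or self._steps[name] == step:
                    scores[name] += 1
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, _ in ranked[:top_k]]


index = KeywordSkillIndex()
# Made-up entries; the real library holds 42 skills with its own metadata.
index.add("fsm_state_table", "step_2", "derive state transition table for finite state machine")
index.add("kmap_simplify", "step_2", "simplify boolean logic with karnaugh map")
index.add("waveform_debug", "step_5", "trace mismatching signals in simulation waveform")

print(index.search("debug waveform mismatch"))
print(index.search("state machine", step="step_2"))
```

A lookup touches only the posting sets for the query's tokens and calls no model, so its cost depends on how many skills share those tokens rather than on the total library size.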

Load-bearing premise

The six-step decomposition and the 42 extracted skills are general enough to cover most front-end tasks without needing major new customization for each problem.

What would settle it

Running the LEGO system on a fresh set of 50 Verilog design problems drawn from sources outside the original 11 surveyed projects and checking whether the Pass@1 rate falls below 0.6.
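
The proposed test reduces to a simple threshold check, sketched below. The Wilson score interval is an addition of this sketch rather than part of the proposal: with only 50 problems the measured rate is noisy, so the interval gives a rough sense of how decisive a result near 0.6 would be. The function names and the example counts are hypothetical.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(passed: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a pass rate (z = 1.96 gives roughly 95%)."""
    p = passed / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def falls_below(passed: int, total: int = 50, threshold: float = 0.6) -> bool:
    """True when the measured Pass@1 is under the threshold named in the test."""
    return passed / total < threshold


# Hypothetical outcomes on a fresh 50-problem set.
for passed in (25, 30, 40):
    low, high = wilson_interval(passed, 50)
    print(passed, falls_below(passed), round(low, 3), round(high, 3))
```

With 50 problems, 30 or more solved keeps the rate at or above 0.6, while 29 or fewer puts it below.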

Figures

Figures reproduced from arXiv: 2604.23355 by Jiecheng Ma, Jincheng Lou, Runzhe Tao, Ruohan Xu, Xinyu Qu, Yibo Lin.

Figure 1: LEGO System Overview. Accompanying text: … instead decomposes the workflow into six steps and supports plug-and-play composition across steps.
Figure 2: Step level view of the LEGO methodology. Accompanying text: … skills to activate for each step, thereby adapting the system to specific requirements. The Step Skill component forms a three-layer hierarchy together with the top level LEGO Skill. The top layer defines the overall workflow decomposition, execution order, and iteration logic. The middle layer consists of six step skills, each listing available circuit skills and …
Figure 3: Per problem results of Experiments 1 and 2 on 41 filtered VerilogEval v2 problems. The heatmap is split into two aligned subplots for Experiment 1 …
Original abstract

Existing LLM-based EDA agents are often isolated task-specific systems. This leads to repeated engineering effort and limited reuse of successful design and debugging strategies. We present LEGO, a unified skill-based platform for front-end design generation. It decomposes the digital front-end flow into six independent steps and represents every agent capability as a standardized composable circuit skill within a plug-and-play architecture. To build this skill library, we survey more than 100 papers, select 11 representative open-source projects, and extract 42 executable circuit skills within a six-step finite state machine formulation. Circuit Skill Builder automates skill extraction with linear scalability. Agent Skill RAG achieves submillisecond retrieval without relying on embedding models. Empirical evaluation on a hard subset of 41 VerilogEval v2 problems that gpt-5.2-codex fails to solve under extra-high reasoning effort shows that individual circuit skills constructed within LEGO raise Pass@1 from 0.000 to 0.805. This is an 80.5% gain over the baseline. Cross-project skill compositions also reach 0.805 Pass@1. They outperform hierarchy-verilog by 14.6% and VerilogCoder by 2.5%. They also match MAGE. These results show that modular skill composition supports both effective and flexible RTL design automation. The LEGO platform and all circuit skills are publicly available at GitHub: https://github.com/loujc/LEGO-An-LLM-Skill-Based-Front-End-Design-Generation-Platform

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents LEGO, a unified skill-based platform for LLM-driven front-end digital design generation. It decomposes the flow into six independent steps formulated as a finite state machine and represents capabilities as standardized, composable 'circuit skills.' From a survey of over 100 papers, the authors select 11 open-source projects and extract 42 executable skills via an automated Circuit Skill Builder. Agent Skill RAG enables fast retrieval. On a hard subset of 41 VerilogEval v2 problems where gpt-5.2-codex fails, individual skills raise Pass@1 from 0.000 to 0.805; cross-project compositions match or exceed several baselines while the full platform and skills are released publicly.

Significance. If the results hold, the work shows that modular skill composition can deliver substantial gains in LLM-based RTL generation on challenging cases, supporting reuse of design strategies and reducing per-task engineering. The public GitHub release of the platform and skills is a clear strength for reproducibility and extension by the community.

major comments (2)
  1. Abstract: The central performance claim (Pass@1 rising from 0.000 to 0.805 on the 41 hard VerilogEval v2 cases) depends on the fixed library of 42 skills extracted from only 11 projects being sufficiently general and composable. The reported evaluation provides no additional experiments or analysis demonstrating coverage for problems whose structure falls outside those projects or requires steps beyond the six FSM states; this directly affects the claimed plug-and-play advantage.
  2. Abstract: The manuscript lacks detail on skill validation procedures, potential selection bias in the choice of the 11 projects, and error analysis or failure modes for the 41 problems, which leaves only moderate support for the generality and robustness of the 0.805 Pass@1 result despite the numerical improvement over the stated baseline.
minor comments (2)
  1. Clarify the exact baseline model name and prompting configuration for 'gpt-5.2-codex' in the evaluation section.
  2. Consider adding a summary table or figure that maps the 42 skills to the six FSM steps to improve readability of the skill library construction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and for recognizing the significance of the public release. We respond to each major comment below and describe the revisions we will make.

  1. Referee: Abstract: The central performance claim (Pass@1 rising from 0.000 to 0.805 on the 41 hard VerilogEval v2 cases) depends on the fixed library of 42 skills extracted from only 11 projects being sufficiently general and composable. The reported evaluation provides no additional experiments or analysis demonstrating coverage for problems whose structure falls outside those projects or requires steps beyond the six FSM states; this directly affects the claimed plug-and-play advantage.

    Authors: The 11 projects were selected after surveying more than 100 papers as representative of common front-end patterns, and the six FSM states are intended to span the standard RTL generation flow. Cross-project skill compositions reaching the same 0.805 Pass@1 provide evidence of composability within the evaluated distribution. We agree that explicit experiments on problems whose structure lies outside the surveyed projects would strengthen the generality argument. We will add a dedicated limitations subsection discussing the current scope of the skill library and outlining planned extensions for broader coverage. revision: partial

  2. Referee: Abstract: The manuscript lacks detail on skill validation procedures, potential selection bias in the choice of the 11 projects, and error analysis or failure modes for the 41 problems, which leaves only moderate support for the generality and robustness of the 0.805 Pass@1 result despite the numerical improvement over the stated baseline.

    Authors: We will revise the manuscript to include expanded descriptions of the validation steps performed by the Circuit Skill Builder, the explicit selection criteria used to choose the 11 projects from the surveyed literature, and a concise error analysis of the failure modes observed on the 41 problems. These additions will directly address concerns about robustness and selection bias. revision: yes

Circularity Check

0 steps flagged

No significant circularity; results are direct empirical measurements


The paper presents a platform built by surveying papers, selecting 11 projects, and extracting 42 skills into a six-step FSM, then reports Pass@1 gains on an external hard subset of 41 VerilogEval v2 problems. These outcomes are measured directly against baselines rather than derived from fitted parameters, self-referential definitions, or load-bearing self-citations. No equations exist that equate the reported 0.805 Pass@1 to the skill extraction process by construction, and the evaluation uses problems the baseline fails, providing independent falsifiability. The generality assumption is a design choice open to external validation, not a circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that front-end design can be decomposed into six independent steps and that skills extracted from a limited set of open-source projects generalize via composition; no explicit free parameters are introduced, and the circuit skill concept is a new postulated entity without independent falsifiable evidence outside the reported benchmark.

axioms (1)
  • Domain assumption: The digital front-end flow can be decomposed into six independent steps representable as a finite state machine.
    Invoked in the description of the skill-based platform architecture.
invented entities (1)
  • Circuit Skill (no independent evidence)
    Purpose: Standardized, composable representation of agent capability for RTL design tasks.
    New entity introduced to enable the plug-and-play library; no independent evidence such as a predicted property outside the benchmark is provided.



