LEGO: An LLM Skill-Based Front-End Design Generation Platform
Pith reviewed 2026-05-21 09:10 UTC · model grok-4.3
The pith
LEGO turns LLM front-end design into a reusable system by packaging capabilities as 42 composable circuit skills across six steps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By representing front-end design capabilities as standardized composable circuit skills inside a six-step finite state machine formulation, the LEGO platform enables LLMs to solve complex RTL problems through modular retrieval and composition, raising success rates from zero to 80.5 percent on a hard benchmark subset.
What carries the argument
The six-step finite state machine formulation that turns every agent capability into a standardized composable circuit skill, supported by an automated Circuit Skill Builder and an embedding-free Agent Skill RAG retriever.
If this is right
- Individual skills lift Pass@1 from 0.000 to 0.805 on the 41 hardest VerilogEval v2 cases.
- Cross-project skill compositions also reach 0.805 Pass@1 and beat hierarchy-verilog by 14.6 percent.
- The same compositions match the performance of MAGE while exceeding VerilogCoder by 2.5 percent.
- Modular skill composition produces both higher success and greater flexibility than isolated task-specific agents.
Where Pith is reading between the lines
- Public release of the skill library could let other teams add new skills and test coverage on additional benchmarks.
- The same six-step structure might extend to back-end flows or mixed-signal design if the skill extraction process scales linearly.
- Fast non-embedding retrieval suggests the approach could keep latency low even as the library grows beyond 42 skills.
Load-bearing premise
The six-step decomposition and the 42 extracted skills are general enough to cover most front-end tasks without needing major new customization for each problem.
What would settle it
Running the LEGO system on a fresh set of 50 Verilog design problems drawn from sources outside the original 11 surveyed projects and checking whether the Pass@1 rate falls below 0.6.
Figures
read the original abstract
Existing LLM-based EDA agents are often isolated task-specific systems. This leads to repeated engineering effort and limited reuse of successful design and debugging strategies. We present LEGO, a unified skill-based platform for front-end design generation. It decomposes the digital front-end flow into six independent steps and represents every agent capability as a standardized composable circuit skill within a plug-and-play architecture. To build this skill library, we survey more than 100 papers, select 11 representative open-source projects, and extract 42 executable circuit skills within a six-step finite state machine formulation. Circuit Skill Builder automates skill extraction with linear scalability. Agent Skill RAG achieves submillisecond retrieval without relying on embedding models. Empirical evaluation on a hard subset of 41 VerilogEval v2 problems that gpt-5.2-codex fails to solve under extra-high reasoning effort shows that individual circuit skills constructed within LEGO raise Pass@1 from 0.000 to 0.805. This is an 80.5% gain over the baseline. Cross-project skill compositions also reach 0.805 Pass@1. They outperform hierarchy-verilog by 14.6% and VerilogCoder by 2.5%. They also match MAGE. These results show that modular skill composition supports both effective and flexible RTL design automation. The LEGO platform and all circuit skills are publicly available at GitHub: https://github.com/loujc/LEGO-An-LLM-Skill-Based-Front-End-Design-Generation-Platform
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents LEGO, a unified skill-based platform for LLM-driven front-end digital design generation. It decomposes the flow into six independent steps formulated as a finite state machine and represents capabilities as standardized, composable 'circuit skills.' From a survey of over 100 papers, the authors select 11 open-source projects and extract 42 executable skills via an automated Circuit Skill Builder. Agent Skill RAG enables fast retrieval. On a hard subset of 41 VerilogEval v2 problems where gpt-5.2-codex fails, individual skills raise Pass@1 from 0.000 to 0.805; cross-project compositions match or exceed several baselines while the full platform and skills are released publicly.
Significance. If the results hold, the work shows that modular skill composition can deliver substantial gains in LLM-based RTL generation on challenging cases, supporting reuse of design strategies and reducing per-task engineering. The public GitHub release of the platform and skills is a clear strength for reproducibility and extension by the community.
major comments (2)
- Abstract: The central performance claim (Pass@1 rising from 0.000 to 0.805 on the 41 hard VerilogEval v2 cases) depends on the fixed library of 42 skills extracted from only 11 projects being sufficiently general and composable. The reported evaluation provides no additional experiments or analysis demonstrating coverage for problems whose structure falls outside those projects or requires steps beyond the six FSM states; this directly affects the claimed plug-and-play advantage.
- Abstract: The manuscript lacks detail on skill validation procedures, potential selection bias in the choice of the 11 projects, and error analysis or failure modes for the 41 problems, which leaves only moderate support for the generality and robustness of the 0.805 Pass@1 result despite the numerical improvement over the stated baseline.
minor comments (2)
- Clarify the exact baseline model name and prompting configuration for 'gpt-5.2-codex' in the evaluation section.
- Consider adding a summary table or figure that maps the 42 skills to the six FSM steps to improve readability of the skill library construction.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and for recognizing the significance of the public release. We respond to each major comment below and describe the revisions we will make.
read point-by-point responses
-
Referee: Abstract: The central performance claim (Pass@1 rising from 0.000 to 0.805 on the 41 hard VerilogEval v2 cases) depends on the fixed library of 42 skills extracted from only 11 projects being sufficiently general and composable. The reported evaluation provides no additional experiments or analysis demonstrating coverage for problems whose structure falls outside those projects or requires steps beyond the six FSM states; this directly affects the claimed plug-and-play advantage.
Authors: The 11 projects were selected after surveying more than 100 papers as representative of common front-end patterns, and the six FSM states are intended to span the standard RTL generation flow. Cross-project skill compositions reaching the same 0.805 Pass@1 provide evidence of composability within the evaluated distribution. We agree that explicit experiments on problems whose structure lies outside the surveyed projects would strengthen the generality argument. We will add a dedicated limitations subsection discussing the current scope of the skill library and outlining planned extensions for broader coverage. revision: partial
-
Referee: Abstract: The manuscript lacks detail on skill validation procedures, potential selection bias in the choice of the 11 projects, and error analysis or failure modes for the 41 problems, which leaves only moderate support for the generality and robustness of the 0.805 Pass@1 result despite the numerical improvement over the stated baseline.
Authors: We will revise the manuscript to include expanded descriptions of the validation steps performed by the Circuit Skill Builder, the explicit selection criteria used to choose the 11 projects from the surveyed literature, and a concise error analysis of the failure modes observed on the 41 problems. These additions will directly address concerns about robustness and selection bias. revision: yes
Circularity Check
No significant circularity; results are direct empirical measurements
full rationale
The paper presents a platform built by surveying papers, selecting 11 projects, and extracting 42 skills into a six-step FSM, then reports Pass@1 gains on an external hard subset of 41 VerilogEval v2 problems. These outcomes are measured directly against baselines rather than derived from fitted parameters, self-referential definitions, or load-bearing self-citations. No equations exist that equate the reported 0.805 Pass@1 to the skill extraction process by construction, and the evaluation uses problems the baseline fails, providing independent falsifiability. The generality assumption is a design choice open to external validation, not a circular reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The digital front-end flow can be decomposed into six independent steps representable as a finite state machine.
invented entities (1)
-
Circuit Skill
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
decomposes the digital front-end flow into six independent steps ... six-step finite state machine formulation
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat recovery unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
42 executable circuit skills ... organized into 24 functional groups
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Invited paper: Verilo- gEval: Evaluating large language models for verilog code generation,
M. Liu, N. Pinckney, B. Khailany, and H. Ren, “Invited paper: Verilo- gEval: Evaluating large language models for verilog code generation,” in 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 2023, pp. 1–8
work page 2023
-
[2]
Revisiting verilogeval: Newer llms, in-context learning, and specification-to-rtl tasks,
N. Pinckney, C. Batten, M. Liu, H. Ren, and B. Khailany, “Revisiting verilogeval: Newer llms, in-context learning, and specification-to-rtl tasks,” 2024. [Online]. Available: https://arxiv.org/abs/2408.11053
-
[3]
RTLLM: An open-source benchmark for design rtl generation with large language model,
Y . Lu, S. Liu, Q. Zhang, and Z. Xie, “RTLLM: An open-source benchmark for design rtl generation with large language model,” in2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), 2024, pp. 722–727
work page 2024
-
[4]
RTLBench: A multi- dimensional benchmark suite for evaluating llm-generated rtl code,
Z. Fang, R. Chen, Y . Guo, H. Dai, and L. Wang, “RTLBench: A multi- dimensional benchmark suite for evaluating llm-generated rtl code,” in 2025 IEEE 43rd International Conference on Computer Design (ICCD), 2025, pp. 566–573
work page 2025
-
[5]
OpenLLM-RTL: Open dataset and benchmark for llm-aided design rtl generation: Invited paper,
S. Liu, Y . Lu, W. Fang, M. Li, and Z. Xie, “OpenLLM-RTL: Open dataset and benchmark for llm-aided design rtl generation: Invited paper,” in2024 ACM/IEEE International Conference On Computer Aided Design (ICCAD), 2024, pp. 1–9
work page 2024
- [6]
-
[7]
VeriMind: Agentic llm for automated verilog generation with a novel evaluation metric,
B. Nadimi, G. O. Boutaib, and H. Zheng, “VeriMind: Agentic llm for automated verilog generation with a novel evaluation metric,” 2025. [Online]. Available: https://arxiv.org/abs/2503.16514
-
[8]
C.-T. Ho, H. Ren, and B. Khailany, “VerilogCoder: Autonomous verilog coding agents with graph-based planning and abstract syntax tree (ast)- based waveform tracing tool,”Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 1, pp. 300–307, Apr. 2025. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/32007
work page 2025
-
[9]
Z. Yu, M. Liu, M. Zimmer, Y . Celine, Y . Liu, and H. Ren, “Spec2RTL- Agent: Automated hardware code generation from complex specifica- tions using llm agent systems,” in2025 IEEE International Conference on LLM-Aided Design (ICLAD), 2025, pp. 37–43
work page 2025
-
[10]
MAGE: A multi-agent engine for automated rtl code generation,
Y . Zhao, H. Zhang, H. Huang, Z. Yu, and J. Zhao, “MAGE: A multi-agent engine for automated rtl code generation,” in2025 62nd ACM/IEEE Design Automation Conference (DAC), 2025, pp. 1–7
work page 2025
-
[11]
CodeV: Empowering llms with hdl generation through multi- level summarization,
Y . Zhao, D. Huang, C. Li, P. Jin, M. Song, Y . Xu, Z. Nan, M. Gao, T. Ma, L. Qi, Y . Pan, Z. Zhang, R. Zhang, X. Zhang, Z. Du, Q. Guo, and X. Hu, “CodeV: Empowering llms with hdl generation through multi- level summarization,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 1–1, 2025
work page 2025
-
[12]
RTL- Coder: Fully open-source and efficient llm-assisted rtl code generation technique,
S. Liu, W. Fang, Y . Lu, J. Wang, Q. Zhang, H. Zhang, and Z. Xie, “RTL- Coder: Fully open-source and efficient llm-assisted rtl code generation technique,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 4, pp. 1448–1461, 2025
work page 2025
-
[13]
HDL-GPT: High-quality hdl is all you need,
B. Kumar, S. Nanda, G. Parthasarathy, P. Patil, A. Tsai, and P. Choudhary, “HDL-GPT: High-quality hdl is all you need,” 2024. [Online]. Available: https://arxiv.org/abs/2407.18423
-
[14]
VeriGen: A large language model for verilog code generation,
S. Thakur, B. Ahmad, H. Pearce, B. Tan, B. Dolan-Gavitt, R. Karri, and S. Garg, “VeriGen: A large language model for verilog code generation,”ACM Trans. Des. Autom. Electron. Syst., vol. 29, no. 3, Apr. 2024. [Online]. Available: https://doi.org/10.1145/3643681
-
[15]
OriGen: Enhancing rtl code generation with code-to-code augmentation and self-reflection,
F. Cui, C. Yin, K. Zhou, Y . Xiao, G. Sun, Q. Xu, Q. Guo, D. Song, D. Lin, X. Zhang, and Y . E. Liang, “OriGen: Enhancing rtl code generation with code-to-code augmentation and self-reflection,” in2024 ACM/IEEE International Conference On Computer Aided Design (IC- CAD), 2024, pp. 1–9
work page 2024
-
[16]
AutoVCoder: A systematic framework for automated verilog code generation using llms,
M. Gao, J. Zhao, Z. Lin, W. Ding, X. Hou, Y . Feng, C. Li, and M. Guo, “AutoVCoder: A systematic framework for automated verilog code generation using llms,” in2024 IEEE 42nd International Conference on Computer Design (ICCD), 2024, pp. 162–169
work page 2024
-
[17]
Codex CLI: Command-line interface for code agents,
OpenAI, “Codex CLI: Command-line interface for code agents,” https:// developers.openai.com/codex/cli/, 2024, version 0.98.0, Accessed: 2026- 02-08
work page 2024
-
[18]
Claude Code: Ai-powered coding assistant,
Anthropic, “Claude Code: Ai-powered coding assistant,” https://www. anthropic.com/claude-code, 2024, accessed: 2026-02-08
work page 2024
-
[19]
OpenCode: Open-source code generation plat- form,
OpenCode Contributors, “OpenCode: Open-source code generation plat- form,” https://opencode.org, 2024, accessed: 2026-02-08
work page 2024
-
[20]
Icarus Verilog Project, “Icarus Verilog,” https://steveicarus.github.io/ iverilog/, 2026, accessed: 2026-02-07
work page 2026
-
[21]
Verilator Project, “Verilator,” https://www.veripool.org/verilator/, 2026, accessed: 2026-02-07
work page 2026
-
[22]
BRIDGES: Bridging graph modality and large language models within eda tasks,
W. Li, Y . Zou, C. Ellis, R. Purdy, S. Blanton, and J. M. F. Moura, “BRIDGES: Bridging graph modality and large language models within eda tasks,” in2025 IEEE International Conference on LLM-Aided Design (ICLAD), 2025, pp. 77–84
work page 2025
-
[23]
AutoBench: Automatic testbench generation and evaluation using llms for hdl design,
R. Qiu, G. L. Zhang, R. Drechsler, U. Schlichtmann, and B. Li, “AutoBench: Automatic testbench generation and evaluation using llms for hdl design,” inProceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD, ser. MLCAD ’24. New York, NY , USA: Association for Computing Machinery, 2024. [Online]. Available: https://doi.org/10.1...
-
[24]
HiVeGen – hierarchical llm-based verilog generation for scalable chip design,
J. Tang, J. Qin, K. Thorat, C. Zhu-Tian, Y . Cao, Y . K. Zhao, and C. Ding, “HiVeGen – hierarchical llm-based verilog generation for scalable chip design,” in2025 IEEE International Conference on LLM-Aided Design (ICLAD), 2025, pp. 30–36
work page 2025
-
[25]
TuRTLe: A unified evaluation of llms for rtl generation,
D. Garcia-Gasulla, G. Kestor, E. Parisi, M. Albert ´ı-Binimelis, C. Gutier- rez, R. M. Ghorab, O. Montenegro, B. Homs, and M. Moreto, “TuRTLe: A unified evaluation of llms for rtl generation,” in2025 ACM/IEEE 7th Symposium on Machine Learning for CAD (MLCAD), 2025, pp. 1–12
work page 2025
-
[26]
AutoChip: Automating hdl generation using llm feedback,
S. Thakur, J. Blocklove, H. Pearce, B. Tan, S. Garg, and R. Karri, “AutoChip: Automating hdl generation using llm feedback,” 2024. [Online]. Available: https://arxiv.org/abs/2311.04887
-
[27]
Towards llm-powered verilog rtl assistant: Self-verification and self-correction,
H. Huang, Z. Lin, Z. Wang, X. Chen, K. Ding, and J. Zhao, “Towards llm-powered verilog rtl assistant: Self-verification and self-correction,”
-
[28]
Available: https://arxiv.org/abs/2406.00115
[Online]. Available: https://arxiv.org/abs/2406.00115
-
[29]
RTLFixer: Automatically fixing rtl syntax errors with large language models,
Y . Tsai, M. Liu, and H. Ren, “RTLFixer: Automatically fixing rtl syntax errors with large language models,” in2024 61st ACM/IEEE Design Automation Conference (DAC), 2024, pp. 1–6
work page 2024
-
[30]
Understanding and mitigating errors of llm-generated rtl code,
J. Zhang, C. Liu, L. Cheng, X. Li, and H. Li, “Understanding and mitigating errors of llm-generated rtl code,” 2026. [Online]. Available: https://arxiv.org/abs/2508.05266
-
[31]
Verilogassistant: Open-source reproduction repository,
zjz1222, “Verilogassistant: Open-source reproduction repository,” https: //github.com/zjz1222/VerilogAssistant, 2026, gitHub repository, Ac- cessed: 2026-02-08
work page 2026
-
[32]
Skills.homes: Agent skill marketplace,
Skills.homes, “Skills.homes: Agent skill marketplace,” https://skills. homes/zh-CN, 2025, accessed: 2026-02-08
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.