arxiv: 2605.17892 · v1 · pith:Z3Z2NTVDnew · submitted 2026-05-18 · 💻 cs.AR

CPPL: A Circuit Prompt Programming Language

Pith reviewed 2026-05-20 00:41 UTC · model grok-4.3

classification 💻 cs.AR
keywords LLM hardware design · RTL generation · circuit IR · compiler frontend · CIRCT · functional correctness · Verilog synthesis · hardware automation

The pith

CPPL routes LLM hardware outputs through a JSON IR and compiler checks to raise functional correctness over direct Verilog generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models struggle to emit correct and optimizable register-transfer level designs because of syntax rules, width constraints, and the lack of built-in validation. This paper introduces CPPL to convert the generation task into a structured frontend problem: a Python DSL declares module interfaces and hierarchy while a JSON-based circuit IR exposes the structure that a compiler can check. The compiler validates port bindings, infers widths, and lowers the result deterministically to CIRCT for synthesizable Verilog. On the RTLLM benchmark this yields higher functional correctness than either direct Verilog text or raw CIRCT IR, and the CIRCT backend further reduces post-synthesis AIG node counts. A sympathetic reader would care because the method replaces unconstrained text generation with a pipeline that is both checkable and compatible with existing hardware optimization tools.

Core claim

The paper's central claim is that mediating LLM circuit generation through CPPL, which combines a Python frontend DSL with a JSON-based circuit IR, turns an error-prone text-generation task into a statically checkable compiler-frontend problem. The compiler validates hierarchy and port bindings, infers widths automatically, and lowers the design to CIRCT. On the RTLLM benchmark the resulting designs show improved functional correctness, and CIRCT optimization passes reduce AIG node counts after synthesis.
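
To make the frontend idea concrete, here is a minimal sketch of what a Python DSL for declaring module interfaces and hierarchy might look like. It is not CPPL's API: the class names `Port` and `Module`, the methods `input`, `output`, `instantiate`, and `to_json`, and the JSON field names are all hypothetical, chosen only to show how declared interfaces can be serialized into a JSON form that a compiler could later check.

```python
import json
from dataclasses import dataclass, field

# Hypothetical mini-DSL for illustration; none of these names are taken
# from the CPPL repository.
@dataclass
class Port:
    name: str
    direction: str  # "in" or "out"
    width: int      # declared bit width


@dataclass
class Module:
    name: str
    ports: list = field(default_factory=list)
    instances: list = field(default_factory=list)

    def input(self, name, width):
        self.ports.append(Port(name, "in", width))
        return self  # return self so declarations can be chained

    def output(self, name, width):
        self.ports.append(Port(name, "out", width))
        return self

    def instantiate(self, inst_name, child):
        # Record hierarchy only; the module body is left to the generated IR.
        self.instances.append({"name": inst_name, "module": child.name})
        return self

    def to_json(self):
        return {
            "module": self.name,
            "ports": [vars(p) for p in self.ports],
            "instances": self.instances,
        }


adder = Module("adder").input("a", 8).input("b", 8).output("sum", 9)
top = (
    Module("top")
    .input("x", 8)
    .input("y", 8)
    .output("z", 9)
    .instantiate("u0", adder)
)

interface_json = json.dumps(top.to_json(), indent=2)
print(interface_json)
```

In a flow like the one described, the interface would be fixed by the designer up front, so the LLM only has to fill in the body against known ports and widths.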

What carries the argument

CPPL IR, a JSON-based circuit representation that encodes modules, operations, and connections in a form LLMs can produce and a compiler can validate before lowering to CIRCT.
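
As an illustration of why a JSON representation is easy to validate, the sketch below checks port bindings in a toy IR. The layout (`modules`, `ports`, `instances`, `bind`) is an assumption made for this example and should not be read as the actual CPPL IR schema. For brevity it treats the parent module's ports as the only signals available for binding.

```python
import json

# Hypothetical IR layout for illustration; the real CPPL schema may differ.
IR_TEXT = """
{
  "modules": {
    "adder": {
      "ports": {"a": "in", "b": "in", "sum": "out"},
      "instances": {}
    },
    "top": {
      "ports": {"x": "in", "y": "in", "z": "out"},
      "instances": {
        "u0": {"module": "adder", "bind": {"a": "x", "b": "y", "sum": "z"}}
      }
    }
  }
}
"""


def check_bindings(ir):
    """Return a list of error strings; an empty list means all checks passed."""
    errors = []
    modules = ir["modules"]
    for mod_name, mod in modules.items():
        for inst_name, inst in mod["instances"].items():
            where = f"{mod_name}.{inst_name}"
            child = modules.get(inst["module"])
            if child is None:
                errors.append(f"{where}: unknown module {inst['module']!r}")
                continue
            bound = set(inst["bind"])
            declared = set(child["ports"])
            # Every declared port of the child must be bound exactly once.
            for port in sorted(declared - bound):
                errors.append(f"{where}: port {port!r} is unbound")
            for port in sorted(bound - declared):
                errors.append(f"{where}: no such port {port!r}")
            # Simplification: only the parent's own ports count as signals.
            for port, signal in sorted(inst["bind"].items()):
                if signal not in mod["ports"]:
                    errors.append(
                        f"{where}: {port!r} bound to unknown signal {signal!r}"
                    )
    return errors


good = json.loads(IR_TEXT)

# Build a broken variant: drop one binding and point another at a bad signal.
bad = json.loads(IR_TEXT)
del bad["modules"]["top"]["instances"]["u0"]["bind"]["b"]
bad["modules"]["top"]["instances"]["u0"]["bind"]["sum"] = "q"

print(check_bindings(good))
for message in check_bindings(bad):
    print(message)
```

Errors of this kind are reported against named modules and instances, which is the sort of structured feedback that raw Verilog text generation does not provide before simulation.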

If this is right

  • LLM outputs become subject to automatic checks for port bindings, hierarchy consistency, and width inference before any synthesis occurs.
  • Designs reach a standard compiler infrastructure that applies existing optimization passes without further manual intervention.
  • Post-synthesis hardware metrics improve because CIRCT can operate on the lowered representation rather than on raw LLM text.
  • The generation task for the LLM is simplified to producing JSON that matches a documented schema instead of raw RTL syntax.
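
The width-inference step mentioned in the list above can be sketched as a single forward pass over operations. The op names, field names, and width rules here are assumptions for illustration, loosely modeled on the convention that an add keeps the operand width, an equality test yields one bit, and a concatenation sums its operand widths. The paper's actual inference rules are not reproduced on this page.

```python
# Hypothetical op list; names and fields are assumptions, not the CPPL schema.
PORT_WIDTHS = {"a": 8, "b": 8, "sel": 1}

OPS = [
    {"op": "add", "args": ["a", "b"], "out": "s"},
    {"op": "eq", "args": ["a", "b"], "out": "same"},
    {"op": "concat", "args": ["sel", "s"], "out": "packed"},
]


def infer_widths(port_widths, ops):
    """Propagate widths from declared ports through ops listed in dependency order."""
    widths = dict(port_widths)
    for op in ops:
        out = op["out"]
        try:
            arg_widths = [widths[name] for name in op["args"]]
        except KeyError as missing:
            raise ValueError(f"{out}: undefined operand {missing}") from None
        kind = op["op"]
        if kind in ("add", "eq"):
            # Both ops require operands of one common width.
            if len(set(arg_widths)) != 1:
                raise ValueError(f"{out}: operand widths differ: {arg_widths}")
            widths[out] = arg_widths[0] if kind == "add" else 1
        elif kind == "concat":
            widths[out] = sum(arg_widths)
        else:
            raise ValueError(f"{out}: unsupported op {kind!r}")
    return widths


inferred = infer_widths(PORT_WIDTHS, OPS)
print(inferred)
```

Because widths are derived rather than written by the model, a mismatch surfaces as a compile-time error instead of a silent truncation in the emitted Verilog.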

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same mediation pattern could be reused for LLM generation of code in other compiler-based domains where syntax and constraint enforcement are currently brittle.
  • Because the IR is JSON, incremental or partial module generation becomes easier to integrate into larger design flows.
  • Fine-tuning or few-shot prompting targeted at the CPPL schema could be tested as a direct way to raise correctness further without altering the compiler.

Load-bearing premise

The CPPL IR and its compiler can capture every necessary circuit structure and constraint that an LLM might intend without leaving validation gaps or restricting the range of expressible designs.

What would settle it

A rerun of the RTLLM benchmark in which CPPL-generated designs show no gain in functional correctness over direct Verilog generation would demonstrate that the compiler-mediated path does not improve reliability.

Original abstract

Large language models (LLMs) have shown promise in register-transfer level (RTL) design automation, but direct RTL generation remains difficult to validate, optimize, and integrate with compiler-based hardware design flows. Hardware compiler infrastructures such as CIRCT provide typed intermediate representations, legality checks, and optimization passes, yet current LLMs struggle to emit raw compiler IR because of MLIR syntax, SSA discipline, dialect-specific operations, and strict width constraints. This paper presents CPPL, a compiler-mediated design framework that turns LLM-assisted hardware generation into a statically checkable frontend problem rather than an unconstrained RTL text-generation task. CPPL combines a Python frontend DSL for declaring module interfaces and hierarchy with CPPL IR, a JSON-based circuit IR designed to expose compiler-visible structure while remaining accessible to LLMs. The compiler infers operation widths from declared module ports, validates generated IR, checks hierarchy and port bindings, and deterministically lowers the result to CIRCT for synthesizable Verilog generation. On the RTLLM benchmark, CPPL improves functional correctness over direct Verilog and direct CIRCT IR generation, while CIRCT optimization reduces post-synthesis AIG node counts. These results show that a compiler-mediated interface can make LLM-assisted hardware design more reliable, analyzable, and amenable to backend optimization. CPPL is available at https://github.com/SawyDust1228/CPPL.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents CPPL, a compiler-mediated framework for LLM-assisted RTL design. It combines a Python DSL for declaring module interfaces and hierarchy with a JSON-based CPPL IR that exposes structure for static checks. The compiler performs width inference from ports, hierarchy/port validation, and deterministic lowering to CIRCT, from which synthesizable Verilog is generated. On the RTLLM benchmark the approach is reported to improve functional correctness relative to direct Verilog generation and direct CIRCT-IR generation, while CIRCT passes further reduce post-synthesis AIG node counts. The implementation is released at https://github.com/SawyDust1228/CPPL.

Significance. If the empirical claims hold, the work demonstrates that routing LLM output through a typed, compiler-visible IR can turn an unconstrained text-generation task into a statically checkable frontend problem, improving reliability and enabling downstream optimization. The public release of the code and benchmark harness constitutes a concrete strength for reproducibility.

major comments (2)
  1. [Evaluation] Evaluation section: the abstract states that CPPL improves functional correctness on RTLLM, yet the manuscript supplies no description of the experimental protocol (prompt templates, number of LLM calls per task, temperature, decoding strategy, definition of 'functional correctness,' or statistical measures). Without these details the central empirical claim cannot be assessed.
  2. [§3] §3 (CPPL IR and compiler): the claim that the JSON IR plus width-inference and port-binding checks 'can accept LLM outputs while preserving all needed structure' is load-bearing for the reliability argument, but the manuscript does not demonstrate that the deliberately simplified IR can express parameterized widths, conditional generation, or non-static bindings that appear in realistic designs outside the RTLLM suite.
minor comments (2)
  1. [Abstract] Abstract: the statement that 'CIRCT optimization reduces post-synthesis AIG node counts' should be accompanied by the specific passes used and the magnitude of the reduction.
  2. [§3] Notation: the manuscript should clarify whether the JSON schema for CPPL IR is formally specified or only illustrated by examples.
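
For context on the first major comment: this page does not say which correctness metric the paper reports. A common choice in code-generation benchmarks is pass@k, using the unbiased estimator from the Codex evaluation paper (Chen et al., 2021). The sketch below computes it for one task, assuming `n` samples were drawn and `c` of them passed the testbench. Treating pass@k as the paper's metric is an assumption made here for illustration only.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one task.

    n: total samples generated for the task
    c: samples that passed the functional test
    k: sampling budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Probability that a random size-k subset contains no passing sample.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


# Hypothetical numbers: 10 samples for a task, 3 of which pass.
print(pass_at_k(10, 3, 1))
print(pass_at_k(10, 3, 5))
```

Reporting `n`, `k`, and the sampling temperature alongside such a metric is the kind of protocol detail the referee asks for.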

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating where we agree and the revisions we will make to improve clarity and completeness.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the abstract states that CPPL improves functional correctness on RTLLM, yet the manuscript supplies no description of the experimental protocol (prompt templates, number of LLM calls per task, temperature, decoding strategy, definition of 'functional correctness,' or statistical measures). Without these details the central empirical claim cannot be assessed.

    Authors: We agree that the current manuscript lacks sufficient detail on the experimental protocol, which is required to allow readers to assess and reproduce the functional correctness results. In the revised manuscript we will expand the Evaluation section with a full description of the prompt templates, the number of LLM calls per task, temperature and decoding settings, the precise definition of functional correctness used, and statistical measures including variance or confidence intervals across runs. revision: yes

  2. Referee: [§3] §3 (CPPL IR and compiler): the claim that the JSON IR plus width-inference and port-binding checks 'can accept LLM outputs while preserving all needed structure' is load-bearing for the reliability argument, but the manuscript does not demonstrate that the deliberately simplified IR can express parameterized widths, conditional generation, or non-static bindings that appear in realistic designs outside the RTLLM suite.

    Authors: We acknowledge that the manuscript does not provide explicit demonstrations or examples of parameterized widths, conditional generation, or non-static bindings beyond the RTLLM benchmark. The CPPL IR was intentionally kept minimal to target the structure present in the evaluated tasks while remaining LLM-friendly. In the revision we will add a dedicated limitations paragraph in §3 clarifying the current scope of the IR, noting these features as areas for future extension, and qualifying the reliability claims to the RTLLM setting rather than asserting general coverage of all realistic designs. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical evaluation on external benchmark

Full rationale

The paper introduces CPPL as a Python DSL and JSON IR frontend that compiles to CIRCT for Verilog generation. Its central claims are empirical improvements in functional correctness on the public RTLLM benchmark and AIG node count reductions via CIRCT passes. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The compiler's width inference, validation, and lowering steps are described as deterministic and external to the LLM outputs, with no reduction of results to self-defined inputs. The approach relies on independent infrastructure (CIRCT) and benchmark data rather than any self-referential derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on standard assumptions about hardware compilation and LLM capabilities rather than new fitted parameters or invented physical entities.

axioms (2)
  • domain assumption: LLMs can reliably generate structured JSON following the CPPL IR schema when prompted appropriately.
    Implicit in the claim that the compiler-mediated interface improves correctness.
  • standard math: CIRCT provides correct optimization and Verilog lowering passes for the generated IR.
    Relies on the correctness of the external CIRCT infrastructure.
invented entities (1)
  • CPPL IR (JSON-based circuit representation): no independent evidence
    purpose: To expose compiler-visible structure in a format accessible to LLMs while enabling validation and lowering.
    New intermediate representation introduced by the paper.

pith-pipeline@v0.9.0 · 5799 in / 1260 out tokens · 30717 ms · 2026-05-20T00:41:50.118229+00:00 · methodology


