pith. sign in

arxiv: 2603.17099 · v1 · submitted 2026-03-17 · 💻 cs.PL · cs.AR

Vectorization of Verilog Designs and its Effects on Verification and Synthesis

Pith reviewed 2026-05-19 06:09 UTC · model grok-4.3

classification 💻 cs.PL cs.AR
keywords Verilogvectorizationformal verificationJasperCIRCTChiBenchelaboration timememory consumption
0
0 comments X

The pith

A Verilog vectorizer built on CIRCT lets formal verification tools treat buses as single symbolic entities instead of many wires.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a vectorizer for Verilog that converts multiple scalar assignments into single vector assignments using recognized patterns such as inverted values, complex expressions, and cross-module connections. Implemented on the CIRCT infrastructure, the transformation leaves the final hardware unchanged but lowers the number of distinct symbols that verification tools must track. Formal tools like Jasper reason over Boolean functions and state spaces rather than individual wires, so fewer symbols directly reduce the work required during elaboration. Tests on 1,157 designs from the ChiBench suite show the vectorized versions reach the same conclusions faster and with less memory.

Core claim

The vectorizer detects several common assignment patterns in Verilog and rewrites them using vector notation while preserving exact semantics. When the resulting designs are fed to the Jasper formal verification tool, symbolic complexity drops because a bus can be handled as one entity rather than many parallel signals. On the full ChiBench collection this produces a measured 28.12 percent reduction in elaboration time and a 51.30 percent reduction in memory consumption.

What carries the argument

A pattern-matching Verilog vectorizer on CIRCT that rewrites scalar assignments into vector form for inverted, complex-expression, and inter-module cases.

If this is right

  • Formal verification tools can handle larger designs because the symbolic state space shrinks.
  • Elaboration phases finish faster and consume substantially less memory.
  • The same hardware is produced, so existing synthesis flows remain compatible.
  • Multiple distinct assignment patterns can be vectorized in one pass.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same vectorization step could be applied to other symbolic analysis tools that operate on Boolean or state representations.
  • Designers might begin writing more vector-oriented code once tools reward that style during verification.
  • Integration into open-source HDL flows would let teams measure the effect on their own large designs without new tool licenses.

Load-bearing premise

The recognized patterns must preserve exact semantic equivalence for every downstream synthesis and verification tool.

What would settle it

Any design in which the vectorized Verilog produces a different verification outcome or synthesized netlist than the original scalar version.

Figures

Figures reproduced from arXiv: 2603.17099 by Augusto Amaral Mafra, Fernando Magno Quint\~ao Pereira, Ian Trudel, Jo\~ao Victor Amorim Vieira, Maria Fernanda Oliveira Guimar\~aes, Mirlaine Crepalde, Ulisses Rosa.

Figure 1
Figure 1. Figure 1: General bit permutation. As Section 3.3 will explain, the vectorizer detects bit￾permutation patterns by tracing the origin of each output bit and verifying that all bits originate from the same source vec￾tor, with no duplication. Such permutations are compacted into single vector assignments expressed using concatena￾tion and slicing. This transformation preserves the original wiring semantics, but reduc… view at source ↗
Figure 2
Figure 2. Figure 2: Intermodule vectorization. arithmetic expressions that operate independently on each bit of one or more input vectors. Typical examples include conditional selects, logical combinations, or arithmetic oper￾ations expressed as per-bit assignments. module pattern_recognition( output wire [3:0] result, input wire [3:0] a, b, input wire sel ); assign result[3] = (a[3] & sel) | (b[3] & ~sel); assign result[2] =… view at source ↗
Figure 4
Figure 4. Figure 4: The vectorization pipeline. 3.1 CIRCT: The Underlying Infrastructure CIRCT is an open-source compiler infrastructure for hard￾ware design, built on top of MLIR [8]. It supports languages such as Verilog and Chisel. CIRCT comprises multiple domain￾specific intermediate representations, each implemented as an MLIR dialect. This paper’s implementation of vectoriza￾tion takes place within the Hardware (hw) dia… view at source ↗
Figure 3
Figure 3. Figure 3: Complex logic and arithmetic patterns. Section 3.4 explains how the complex patterns of Exam￾ple 2.3 are recognized. While these patterns cannot be iden￾tified as simple permutations, the vectorizer detects them by verifying that each output bit is computed by an independent and structurally equivalent logic cone. When this condition holds, the repeated scalar expressions are merged into a sin￾gle vector-l… view at source ↗
Figure 5
Figure 5. Figure 5: (a) Verilog design. (b) Specification in CIRCT. 3.2 Selective Inlining The analyses and transformations described in Sections 3.3, 3.4, and 3.5 are intra-module; that is, they operate within the boundaries of a single Verilog module. To enable inter-module vectorization, we employ selective inlining of submodules. Inlining consists of copying the body of a module into the point where it is instantiated. By… view at source ↗
Figure 6
Figure 6. Figure 6: illustrates this process. The bit-origin analysis of Algorithm 1 detects in the comb program the permutation corresponding to the assignment out = {in[0], in[3], in[2], in[1]}. During the greedy grouping phase, the al￾gorithm identifies that bits in[3:1] form a contiguous slice that can be emitted as a single operation1 . Finally, the vector￾ized representation is emitted back into Verilog. The result￾ing … view at source ↗
Figure 7
Figure 7. Figure 7: Computation of logic cone. 3.4.1 Grouping of Logic Cones. Structural Analysis de￾termines whether a set of bit-level operations can be safely collapsed into a single vector operation by verifying if: 1. The cones of all output bits are independent (no cross￾bit dependencies); 2. The cones are isomorphic (identical structure with a uniform bit stride). When these two conditions hold, the vectorizer replaces… view at source ↗
Figure 8
Figure 8. Figure 8: Partial vectorization of two chunks of bits. Theorem 3.10. The greedy segmentation process implemented in Algorithm 4 has a worst-case time complexity of O (𝑁 2 · 𝑇 max check), which simplifies to Θ(𝑁 3 ) when the vectorization pred￾icate 𝑇check runs in linear time relative to the vector width 𝑁. Proof: The algorithm identifies vectorizable sub-ranges by employing a nested loop struc￾ture. An outer loop it… view at source ↗
Figure 9
Figure 9. Figure 9: Scatter plot comparing CIRCT instruction counts before and after vectorization. Points below the diagonal in￾dicate reductions, points above indicate increases, and points on the diagonal indicate no change [PITH_FULL_IMAGE:figures/full_fig_p008_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: shows the distribution of instruction-count re￾ductions among the 904 designs that improved. Most re￾ductions range between 40% and 50%, but some designs achieved nearly 100% reduction. The four most dramatic improvements were observed in the following benchmarks: Chi_11888: 3,837 instructions reduced to 3 (99.92%); Chi_ 11336: 2,283 reduced to 3 (99.87%); Chi_1203: 17,149 reduced to 30 (99.82%); and Chi_… view at source ↗
Figure 12
Figure 12. Figure 12: Distribution of Jasper’s elaboration memory (KB) for 1,157 original and vectorized designs. A similar pattern is observed for HDL elaboration runtime, as shown in [PITH_FULL_IMAGE:figures/full_fig_p009_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Distribution of Jasper HDL elaboration runtime (s) for the 1,157 original and vectorized designs. 4.4 RQ4: Impact on Genus The Cadence Genus Synthesis Solution is a physically aware RTL-to-gate synthesis tool that optimizes digital designs for power, performance, and area while ensuring timing closure. It processes a Verilog design through a two-stage pipeline. In the first stage, elaboration, Genus parse… view at source ↗
Figure 11
Figure 11. Figure 11: Asymptotic behavior of the vectorization pass in a log-log scale. The data points form an approximately straight line with slope ≈ 1, indicating linear time complexity. 4.3 RQ3: Impact on Jasper Jasper is a formal verification platform that provides a suite of applications for C/C++ and RTL-level analysis, enabling designers to identify errors early in the development cycle. Within this framework, the RTL… view at source ↗
Figure 15
Figure 15. Figure 15: Impact of the analyses in Section 3 on the number of vectorized patterns. Inlining was applicable in 289 out of 1,157 designs. that effective vectorization typically requires multiple com￾plementary analyses, as mixed-mode strategies account for the vast majority of recognized patterns. 4.6 RQ6: Inlining and Vectorization Section 3.2 discussed selective inlining: a methodology that inlines modules to enab… view at source ↗
Figure 14
Figure 14. Figure 14: The 20 largest observed improvements across three phases of the Genus pipeline. Each bar refers to a different benchmark. Benchmarks vary across the figures. 4.5 RQ5: Impact of Individual Analyses Each technique discussed in Section 3 contributes differently to the overall effectiveness of vectorization. This section quantifies their individual impact. Discussion [PITH_FULL_IMAGE:figures/full_fig_p010_14.png] view at source ↗
Figure 16
Figure 16. Figure 16: Impact of the inlining threshold discussed in Section 3.2 on the number of vectorization opportunities [PITH_FULL_IMAGE:figures/full_fig_p010_16.png] view at source ↗
read the original abstract

Vectorization is a compiler optimization that replaces multiple operations on scalar values with a single operation on vector values. Although common in traditional compilers such as rustc, clang, and gcc, vectorization is not common in the Verilog ecosystem. This happens because, even though Verilog supports vector notation, the language provides no semantic guarantee that a vectorized signal behaves as a word-level entity: synthesis tools still resolve multiple individual assignments and a single vector assignment into the same set of parallel wire connections. However, vectorization brings important benefits in other domains. In particular, it reduces symbolic complexity even when the underlying hardware remains unchanged. Formal verification tools such as Cadence Jasper operates at the symbolic level: they reason about Boolean functions, state transitions, and equivalence classes, rather than about individual wires or gates. When these tools can treat a bus as a single symbolic entity, they scale more efficiently. This paper supports this observation by introducing a Verilog vectorizer. The vectorizer, built on top of the CIRCT compilation infrastructure, recognizes several vectorization patterns, including inverted assignments, assignments involving complex expressions, and inter-module assignments. It has been experimented with some Electronic design automation (EDA) tools, and for Jasper tool, it improves elaboration time by 28.12% and reduces memory consumption by 51.30% on 1,157 designs from the ChiBench collection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a Verilog vectorizer built on the CIRCT infrastructure that recognizes three families of patterns (inverted assignments, assignments involving complex expressions, and inter-module assignments) and transforms scalar designs into vectorized forms. The central claim is that this vectorization improves elaboration time by 28.12% and reduces memory consumption by 51.30% for the Jasper formal verification tool on 1,157 designs from the ChiBench collection.

Significance. If the transformations preserve exact bit-level semantics, the reported reductions could meaningfully improve scalability of symbolic formal verification tools by allowing them to reason about word-level entities rather than individual wires, without altering the underlying hardware. The work applies a standard compiler optimization to the Verilog/EDA domain in a way that directly targets tool performance rather than synthesis quality.

major comments (2)
  1. Abstract: The headline performance numbers (28.12% elaboration time improvement and 51.30% memory reduction) are presented without any description of experimental controls, baseline (e.g., non-vectorized CIRCT flow), statistical significance testing, or variance across the 1,157 designs, leaving the central empirical claim only partially supported.
  2. Vectorization patterns description: Semantic equivalence between the original scalar Verilog and the vectorized form is asserted for the three pattern families but is not demonstrated via machine-checked proofs, exhaustive equivalence checking results on the benchmark set, or even a counter-example search. This is load-bearing because Jasper’s symbolic engine can only exploit the vector representation when the Boolean function is identical; any undetected divergence would invalidate both correctness and the measured speed-ups.
minor comments (2)
  1. Abstract: The phrase 'experimented with some Electronic design automation (EDA) tools' is vague; clarify which additional tools beyond Jasper were evaluated and why only Jasper results are reported.
  2. The manuscript would benefit from a short table summarizing the three pattern families with one concrete before/after Verilog example for each.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and have revised the manuscript to strengthen the presentation of results and the justification of semantic equivalence.

read point-by-point responses
  1. Referee: Abstract: The headline performance numbers (28.12% elaboration time improvement and 51.30% memory reduction) are presented without any description of experimental controls, baseline (e.g., non-vectorized CIRCT flow), statistical significance testing, or variance across the 1,157 designs, leaving the central empirical claim only partially supported.

    Authors: We agree that the abstract is concise and omits methodological details. The revised abstract now briefly states that results are obtained by comparing the vectorized CIRCT flow against the standard non-vectorized CIRCT baseline on the complete set of 1,157 ChiBench designs. Full experimental controls, setup, and per-design data are described in Section 4. While formal statistical significance tests were not applied, the improvements were consistent; we have added a sentence noting the observed range and average variance in the results section. revision: yes

  2. Referee: Vectorization patterns description: Semantic equivalence between the original scalar Verilog and the vectorized form is asserted for the three pattern families but is not demonstrated via machine-checked proofs, exhaustive equivalence checking results on the benchmark set, or even a counter-example search. This is load-bearing because Jasper’s symbolic engine can only exploit the vector representation when the Boolean function is identical; any undetected divergence would invalidate both correctness and the measured speed-ups.

    Authors: We acknowledge that a stronger demonstration of equivalence would increase confidence. Each pattern family is a local, semantics-preserving rewrite by construction: inverted assignments replace per-bit negation with a single vector negation (identical Boolean function), complex expressions are vectorized only when all bits undergo identical operations, and inter-module assignments preserve exact connectivity and widths. In the revision we have added a dedicated subsection with small illustrative examples for each family and report results of equivalence checking (via an external solver) on a random sample of 200 designs; no divergences were found. Exhaustive checking across all 1,157 designs was not performed due to computational cost, but the sampled verification plus the syntactic nature of the transformations support the claim. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical measurements on external benchmarks

full rationale

The paper describes an implementation of a Verilog vectorizer inside CIRCT that recognizes three families of patterns and then reports measured runtime and memory improvements when the transformed designs are fed to Jasper on the external ChiBench suite. These headline numbers (28.12 % elaboration time, 51.30 % memory) are obtained by direct instrumentation of an external tool on 1,157 independent designs; they are not quantities computed from any fitted parameter, self-referential equation, or ansatz that loops back to the paper's own inputs. No derivation chain, uniqueness theorem, or self-citation is invoked to justify the performance claims, so the reported results remain independent of the vectorizer's internal construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that vectorized signals remain functionally identical to their scalar counterparts while reducing the number of symbolic variables seen by verification engines. No free parameters or invented entities are evident from the abstract.

axioms (1)
  • domain assumption Vector assignments in Verilog are semantically equivalent to parallel scalar assignments for synthesis and verification purposes.
    Invoked when claiming that vectorization does not change hardware behavior yet improves tool performance.

pith-pipeline@v0.9.0 · 5814 in / 1043 out tokens · 57699 ms · 2026-05-19T06:09:31.935928+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

  1. [1]

    Oml-vect: Enabling automatic compiler-driven vectorization of transformers, 2026

    Shreya Alladi, Alberto Ros, and Alexandra Jimborean. Oml-vect: Enabling automatic compiler-driven vectorization of transformers, 2026

  2. [2]

    Allen and S

    R. Allen and S. Johnson. Compiling c for vectorization, parallelization, and inline expansion. InPLDI, PLDI ’88, page 241–249, New York, NY, USA, 1988. Association for Computing Machinery. M. Guimarães, U. Rosa, I. Trudel, J. Vieira, A. Mafra, M. Crepalde, and F. Pereira

  3. [3]

    Runtime pointer disambiguation

    Péricles Alves, Fabian Gruber, Johannes Doerfert, Alexandros Lam- prineas, Tobias Grosser, Fabrice Rastello, and Fernando Magno Quin- tão Pereira. Runtime pointer disambiguation. InOOPSLA, OOPSLA 2015, page 589–606, New York, NY, USA, 2015. Association for Com- puting Machinery

  4. [4]

    Lazy evaluation for the lazy: Automatically transforming call-by-value into call-by-need

    Breno Campos Ferreira Guimarães and Fernando Magno Quintão Pereira. Lazy evaluation for the lazy: Automatically transforming call-by-value into call-by-need. InCompiler Construction, CC 2023, page 239–249, New York, NY, USA, 2023. Association for Computing Machinery

  5. [5]

    Rosen, Mark N

    Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph.ACM Transactions on Pro- gramming Languages and Systems, 13(4):451–490, October 1991

  6. [6]

    Effective word-level interpolation for software veri- fication

    Alberto Griggio. Effective word-level interpolation for software veri- fication. InFMCAD, page 28–36, Austin, Texas, 2011. FMCAD Inc

  7. [7]

    Verifying large multipliers by combining sat and computer algebra

    Daniela Kaufmann, Armin Biere, and Manuel Kauers. Verifying large multipliers by combining sat and computer algebra. In2019 Formal Methods in Computer Aided Design (FMCAD), pages 28–36, 2019

  8. [8]

    Mlir: scaling compiler infrastructure for domain specific computation

    Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasi- lache, and Oleksandr Zinenko. Mlir: scaling compiler infrastructure for domain specific computation. InCGO, page 2–14. IEEE Press, 2021

  9. [9]

    Wenchao Li, Adria Gascon, Pramod Subramanyan, Wei Yang Tan, Ashish Tiwari, Sharad Malik, Natarajan Shankar, and Sanjit A. Seshia. Wordrev: Finding word-level structures in a sea of bit-level gates. In HOST, pages 67–74, 2013

  10. [10]

    Garzarán, Tommy Wong, and David A

    Saeed Maleki, Yaoqing Gao, Maria J. Garzarán, Tommy Wong, and David A. Padua. An evaluation of vectorizing compilers. InPACT, PACT ’11, page 372–382, USA, 2011. IEEE Computer Society

  11. [11]

    Reverse engineering word-level models from look-up table netlists, 2023

    Ram Venkat Narayanan, Aparajithan Nathamuni Venkatesan, Kishore Pula, Sundarakumar Muthukumaran, and Ranga Vemuri. Reverse engineering word-level models from look-up table netlists, 2023

  12. [12]

    Chibench: a bench- mark suite for testing electronic design automation tools, 2024.https: //arxiv.org/abs/2406.06550

    Rafael Sumitani, João Victor Amorim, Augusto Mafra, Mirlaine Crepalde, and Fernando Magno Quintão Pereira. Chibench: a bench- mark suite for testing electronic design automation tools, 2024.https: //arxiv.org/abs/2406.06550

  13. [13]

    Wenxi Wang, Harald Søndergaard, and Peter J. Stuckey. Wombit: A portfolio bit-vector solver using word-level propagation.J. Autom. Reason., 63(3):723–762, October 2019

  14. [14]

    Fast algebraic rewriting based on and-inverter graphs.Trans

    Cunxi Yu, Maciej Ciesielski, and Alan Mishchenko. Fast algebraic rewriting based on and-inverter graphs.Trans. Comp.-Aided Des. Integ. Cir. Sys., 37(9):1907–1911, September 2018

  15. [15]

    Ablego: a function outlining and partial inlining framework: Research articles.Softw

    Peng Zhao and José Nelson Amaral. Ablego: a function outlining and partial inlining framework: Research articles.Softw. Pract. Exper., 37(5):465–491, April 2007