
arxiv: 2604.03079 · v1 · submitted 2026-04-03 · 💻 cs.AR

EEspice: A Modular Circuit Simulation Platform with Parallel Device Model Evaluation via Graph Coloring

Pith reviewed 2026-05-13 19:11 UTC · model grok-4.3

classification 💻 cs.AR
keywords circuit simulation, SPICE, parallel computing, graph coloring, MOSFET modeling, matrix stamping, multi-core speedup, modular architecture

The pith

Graph coloring partitions MOSFETs into independent groups that can be evaluated and stamped in parallel, removing the serial matrix-update bottleneck in SPICE simulators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Repeated SPICE runs are now required inside AI-driven sizing loops, yet the stamping phase, where device contributions are added to the shared system matrix, remains serial because multiple cores would otherwise write to the same entries. EEspice decouples model evaluation into replaceable kernels and applies graph coloring to the MOSFET instances so that devices whose matrix contributions do not overlap share a color class. The devices within one class are processed concurrently with no synchronization inside the stamping step. Measurements on a 64-core workstation show up to 45x speedup over single-thread execution when the circuit produces large independent color groups. Speedup grows with core count and depends on circuit topology.

Core claim

By constructing a conflict graph whose vertices are MOSFET instances and whose edges connect devices that share Jacobian-matrix locations, graph coloring yields a partition into independent color groups. Each group can be model-evaluated and stamped without inter-core contention, so the model-evaluation and stamping phases of each Newton iteration can scale with core count.
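The construction can be sketched on a toy netlist. This is a minimal illustration, not the paper's implementation: it assumes that two devices conflict whenever they share a non-ground terminal node (they would then both write the diagonal entry of that node), that node 0 is ground and has no matrix row, and that a largest-degree-first greedy heuristic is used. The function names and the inverter-chain netlist are hypothetical.

```python
from collections import defaultdict

GROUND = 0  # assumed convention: node 0 is ground and has no matrix row/column


def build_conflict_graph(devices):
    """Return adjacency sets: devices conflict if they share a non-ground node."""
    by_node = defaultdict(list)
    for dev, nodes in enumerate(devices):
        for n in set(nodes):
            if n != GROUND:
                by_node[n].append(dev)
    adj = [set() for _ in devices]
    for members in by_node.values():
        for a in members:
            for b in members:
                if a != b:
                    adj[a].add(b)
    return adj


def greedy_color(adj):
    """Give each vertex the smallest color not used by a colored neighbor."""
    colors = [-1] * len(adj)
    # Visit high-degree vertices first (a common greedy ordering heuristic).
    for v in sorted(range(len(adj)), key=lambda v: -len(adj[v])):
        used = {colors[u] for u in adj[v] if colors[u] != -1}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


def color_classes(colors):
    """Group device indices by color, ordered by color index."""
    classes = defaultdict(list)
    for dev, c in enumerate(colors):
        classes[c].append(dev)
    return [classes[c] for c in sorted(classes)]


# Toy netlist: three CMOS inverters in a chain, as (drain, gate, source, bulk).
# Nodes: 0 = ground, 1 = VDD, 2 = input, 3/4/5 = stage outputs.
devices = [
    (3, 2, 1, 1),  # M0: PMOS, stage 1
    (3, 2, 0, 0),  # M1: NMOS, stage 1
    (4, 3, 1, 1),  # M2: PMOS, stage 2
    (4, 3, 0, 0),  # M3: NMOS, stage 2
    (5, 4, 1, 1),  # M4: PMOS, stage 3
    (5, 4, 0, 0),  # M5: NMOS, stage 3
]
adj = build_conflict_graph(devices)
colors = greedy_color(adj)
classes = color_classes(colors)
```

Under these assumptions the four devices attached to node 3 are mutually in conflict, so at least four colors are needed for only six devices. A conflict rule that tracks the exact matrix entries each model writes could be less conservative.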

What carries the argument

Graph coloring of the device-conflict graph to produce independent color classes that execute model evaluation and matrix stamping in parallel.
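A hedged sketch of how color classes might drive the stamping loop: classes run one after another, and the devices inside a class are handed to worker threads without locks. The placeholder stamp, the hard-coded classes, and the use of a Python thread pool are all assumptions made for illustration; the page does not describe the paper's threading runtime, and CPython threads will not show a real speedup here.

```python
from concurrent.futures import ThreadPoolExecutor

N_NODES = 6  # indices 1..5 are unknowns; index 0 (ground) is never written

# Toy inverter chain, (drain, gate, source, bulk); node 1 is VDD.
devices = [
    (3, 2, 1, 1), (3, 2, 0, 0),
    (4, 3, 1, 1), (4, 3, 0, 0),
    (5, 4, 1, 1), (5, 4, 0, 0),
]
# Hypothetical coloring: devices in the same class share no non-ground node.
classes = [[2], [3], [0, 5], [1, 4]]


def stamp(matrix, nodes):
    """Placeholder stamp: touch every entry among the non-ground terminals.

    A real compact model would add conductances and currents; only the
    write pattern matters for this sketch.
    """
    terminals = {n for n in nodes if n != 0}
    for r in terminals:
        for c in terminals:
            matrix[r][c] += 1.0


def stamp_serial(devices, n):
    m = [[0.0] * n for _ in range(n)]
    for nodes in devices:
        stamp(m, nodes)
    return m


def stamp_by_color(devices, classes, n, workers=4):
    m = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for group in classes:
            # Devices in one class write disjoint entries, so no lock is taken.
            # list(...) waits for the whole class: a barrier between colors.
            list(pool.map(lambda i: stamp(m, devices[i]), group))
    return m


parallel_result = stamp_by_color(devices, classes, N_NODES)
serial_result = stamp_serial(devices, N_NODES)
```

The only synchronization in this sketch is the wait between classes, which is why the number of classes and their sizes matter for the achievable speedup.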

If this is right

  • Matrix stamping no longer forces serial execution inside each Newton iteration.
  • Speedup grows with available cores provided color classes remain balanced.
  • Device-model kernels can be swapped without touching the parallel stamping layer.
  • Simulation time becomes sensitive to the chromatic number and class sizes of the device graph.
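The sensitivity to class sizes can be made concrete with an idealized cost model. This is an assumption of this note, not a formula from the paper: every device costs one time unit, a class of size s takes ceil(s / cores) units, classes run back to back, and coloring and scheduling overhead are ignored.

```python
import math


def ideal_stamping_speedup(class_sizes, cores):
    """Idealized speedup of color-by-color stamping over a serial loop."""
    serial = sum(class_sizes)
    parallel = sum(math.ceil(size / cores) for size in class_sizes)
    return serial / parallel


# 1024 devices on 64 cores, three hypothetical colorings:
balanced = ideal_stamping_speedup([256] * 4, cores=64)    # 1024 / 16 = 64.0
fragmented = ideal_stamping_speedup([16] * 64, cores=64)  # 1024 / 64 = 16.0
clique = ideal_stamping_speedup([1] * 1024, cores=64)     # 1024 / 1024 = 1.0
```

In this model the speedup is capped by the core count only when classes are large relative to it; many small classes leave most cores idle during each class.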

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same coloring approach could reduce assembly time in any sparse-matrix code that assembles contributions from independent objects.
  • For optimization loops that call the simulator thousands of times, the reported speedups would translate directly into shorter design cycles.
  • Dynamic recoloring at each bias point might further improve performance on circuits whose operating regions change markedly during simulation.

Load-bearing premise

Real circuit topologies produce enough devices with non-overlapping matrix contributions that the parallel stamping gains exceed the cost of coloring and any modular-kernel overhead.

What would settle it

Run the simulator on a small, densely connected netlist in which every MOSFET shares at least one matrix row with every other MOSFET; if wall-clock time shows no improvement or becomes slower than the single-thread baseline, the parallel-stamping claim fails.
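Before running such a test, a quick structural check can confirm that a netlist really is a worst case. The sketch below assumes, as a simplification, that any two devices attached to the same non-ground node conflict; the count of devices on the busiest node is then a lower bound on the number of colors. The function name and both netlists are hypothetical.

```python
from collections import Counter


def min_colors_lower_bound(devices, ground=0):
    """Devices on one non-ground node are pairwise in conflict, so the
    busiest node gives a lower bound on the number of color classes."""
    load = Counter()
    for nodes in devices:
        for n in set(nodes):
            if n != ground:
                load[n] += 1
    return max(load.values(), default=0)


n_dev = 32
# Worst case: every MOSFET (drain, gate, source, bulk) has its source on node 1.
dense = [(2 + i, 2 + n_dev + i, 1, 0) for i in range(n_dev)]
# Best case: no two devices share any non-ground node.
sparse = [(3 * i + 1, 3 * i + 2, 3 * i + 3, 0) for i in range(n_dev)]

dense_bound = min_colors_lower_bound(dense)
sparse_bound = min_colors_lower_bound(sparse)
```

When the bound equals the device count, as it does for `dense`, every device needs its own color and the color-by-color schedule degenerates to a serial loop.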

Figures

Figures reproduced from arXiv: 2604.03079 by Danial Chitnis, Xuanhao Bao.

Figure 3: Example of mapping of a 5-transistor Operational Transconductance [...]
Figure 2: Accuracy comparison versus Ngspice: (a) BSIM4 FET DC [...]
Figure 5: BSIM4 evaluation time breakdown for thread count [...]
Figure 6: Scaling of coloring-based parallel stamping with thread count and [...]
Figure 7: Transient comparison for bit 0 of the 64-bit ripple-carry adder. Node [...]
Original abstract

As modern analogue/mixed-signal design increasingly relies on optimization-in-the-loop flows, such as AI and LLM-based sizing agents that repeatedly invoke SPICE, efficient, accurate, high-performance simulators have become an indispensable foundation for modern integrated circuit (IC) design. However, the computational cost of evaluating nonlinear models, particularly for BSIM models, remains a significant bottleneck. In standard parallelization approaches, devices such as transistors are easily distributed across processors. The subsequent stamping phase, where each device's contributions are added to the shared system matrix, often creates a bottleneck. Because multiple processor cores compete to update the same matrix elements simultaneously, the system is forced to process tasks one at a time to avoid errors. This paper introduces EEspice, an open-source circuit simulation framework whose modular architecture decouples device model evaluation into independently replaceable kernels, enabling a parallel stamping strategy that overcomes this bottleneck. It partitions MOSFET instances into independent color groups, which can be processed in parallel. Our results show that on a 64-core workstation, the proposed approach achieves up to 45x speedup over single-thread performance when conflicts are low. Our analysis also explores how performance depends on circuit topology.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces EEspice, an open-source modular circuit simulation framework that decouples device model evaluation into independently replaceable kernels and employs graph coloring to partition MOSFET instances into independent color groups for parallel stamping of contributions into the shared system matrix. It reports concrete wall-clock speedups of up to 45x over single-thread performance on a 64-core workstation when conflicts are low, together with an analysis of how performance depends on circuit topology.

Significance. If the parallel stamping strategy can be shown to deliver net gains on representative analog and mixed-signal netlists without offsetting costs from coloring or modular dispatch, the work would provide a practical foundation for accelerating repeated SPICE evaluations inside optimization-in-the-loop flows.

major comments (2)
  1. [Abstract] The headline claim of up to 45x speedup is conditioned on low conflicts, yet the manuscript supplies no quantitative data on observed conflict rates, color-class sizes, or stamping overhead for any concrete benchmark circuits, leaving the practical scope of the result unanchored.
  2. [Results] No benchmark netlists, error metrics, baseline comparisons, or scaling measurements of the graph-coloring overhead are reported, so the central performance claim cannot be verified beyond the single stated number.
minor comments (1)
  1. [Architecture] The modular kernel interface is mentioned but never illustrated; a short diagram or pseudocode listing the expected device-model API would clarify how kernels are swapped without affecting the parallel stamping path.
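For orientation, one shape such an interface could take is sketched below. It is entirely hypothetical and is not EEspice's API: `Stamp`, `DeviceKernel`, `LinearConductance`, and `apply_stamp` are names invented here. The point is only that a kernel returns its contributions as data and never touches the shared matrix, so the stamping layer stays unchanged when the model is replaced.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass
class Stamp:
    """Hypothetical container for one device's contributions."""
    jacobian: dict[tuple[int, int], float]  # (row, col) -> conductance term
    residual: dict[int, float]              # row -> current leaving that node


class DeviceKernel(Protocol):
    """Hypothetical kernel interface: pure evaluation, no shared-matrix access."""

    def evaluate(self, nodes: Sequence[int], voltages: Sequence[float]) -> Stamp: ...


class LinearConductance:
    """Toy two-terminal kernel standing in for a compact model such as BSIM4."""

    def __init__(self, g: float) -> None:
        self.g = g

    def evaluate(self, nodes, voltages):
        a, b = nodes
        i = self.g * (voltages[a] - voltages[b])  # current flowing from a to b
        return Stamp(
            jacobian={(a, a): self.g, (b, b): self.g,
                      (a, b): -self.g, (b, a): -self.g},
            residual={a: i, b: -i},
        )


def apply_stamp(matrix, residual, stamp):
    """The stamping layer only sees Stamp objects, never kernel internals."""
    for (r, c), v in stamp.jacobian.items():
        matrix[r][c] += v
    for r, v in stamp.residual.items():
        residual[r] += v


# Usage: indices 0 and 1 are plain matrix rows in this toy (no ground handling).
kernel: DeviceKernel = LinearConductance(g=2.0)
J = [[0.0, 0.0], [0.0, 0.0]]
F = [0.0, 0.0]
apply_stamp(J, F, kernel.evaluate((0, 1), [1.0, 0.0]))
```

Any object with a matching `evaluate` method satisfies the protocol, so swapping models would not require changes to `apply_stamp` in this sketch.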

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We agree that the current manuscript would benefit from additional quantitative data on conflict rates, color-class statistics, benchmark netlists, error metrics, and overhead scaling to better anchor the performance claims. We will revise the manuscript to address these points directly.

Point-by-point responses
  1. Referee: [Abstract] The headline claim of up to 45x speedup is conditioned on low conflicts, yet the manuscript supplies no quantitative data on observed conflict rates, color-class sizes, or stamping overhead for any concrete benchmark circuits, leaving the practical scope of the result unanchored.

    Authors: We agree that quantitative metrics on conflict rates, color-class sizes, and stamping overhead are necessary to substantiate the practical scope of the 45x claim. In the revised manuscript we will add a dedicated subsection (and accompanying table) in the Results section that reports, for each evaluated circuit, the number of colors, average and maximum color-class sizes, measured conflict rate, and the wall-clock overhead of the graph-coloring and parallel-stamping phases relative to the total simulation time.
    Revision: yes

  2. Referee: [Results] No benchmark netlists, error metrics, baseline comparisons, or scaling measurements of the graph-coloring overhead are reported, so the central performance claim cannot be verified beyond the single stated number.

    Authors: We acknowledge that the original Results section presented only a single headline speedup figure and a high-level topology dependence statement without the supporting details requested. We will expand the section to (1) list the concrete benchmark netlists (including transistor counts and topology characteristics), (2) report error metrics (e.g., voltage and current differences versus a reference serial simulator), (3) include baseline comparisons against both a standard single-thread SPICE run and at least one other parallelization approach, and (4) add scaling plots and tables that isolate the graph-coloring overhead as a function of circuit size and core count.
    Revision: yes
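The per-circuit statistics promised in the first response are cheap to compute once a coloring exists. The sketch below is a guess at what such a table row might contain; in particular, "conflict rate" is not defined on this page, so it is taken here to mean the fraction of device pairs that conflict. The function name and the sample input are hypothetical.

```python
from collections import Counter


def coloring_stats(colors, n_edges):
    """Summarize a coloring; colors[i] is the color index of device i."""
    sizes = Counter(colors)
    n = len(colors)
    pairs = n * (n - 1) // 2
    return {
        "num_colors": len(sizes),
        "max_class": max(sizes.values()),
        "mean_class": n / len(sizes),
        # Assumed definition: conflicting pairs divided by all device pairs.
        "conflict_density": n_edges / pairs if pairs else 0.0,
    }


# Sample input: six devices, four colors, twelve conflict edges.
stats = coloring_stats(colors=[2, 3, 0, 1, 3, 2], n_edges=12)
```

Reporting the full list of class sizes alongside these summaries would let a reader estimate the attainable speedup for a given core count.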

Circularity Check

0 steps flagged

No circularity in derivation or performance claims


The paper describes an implemented modular simulator using graph coloring to partition device evaluations for parallel stamping. Its central result is an empirical wall-clock speedup (up to 45x on 64 cores under low-conflict conditions) obtained from direct measurements on the running code. No equations, fitted parameters, or first-principles derivations appear in the provided text; the speedup is not obtained by renaming a fitted quantity or by reducing to a self-citation chain. The topology dependence is explicitly stated as a measured observation rather than a self-referential definition, leaving the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

No free parameters, invented entities, or ad-hoc axioms are described; the approach relies on standard graph-coloring algorithms and multi-threaded matrix updates.

axioms (1)
  • [standard math] Standard assumptions of multi-core shared-memory systems with atomic or lock-free updates when color groups are independent
    Invoked implicitly when claiming conflict-free parallel stamping


