EEspice: A Modular Circuit Simulation Platform with Parallel Device Model Evaluation via Graph Coloring

Danial Chitnis; Xuanhao Bao

arxiv: 2604.03079 · v1 · submitted 2026-04-03 · 💻 cs.AR

EEspice: A Modular Circuit Simulation Platform with Parallel Device Model Evaluation via Graph Coloring

Xuanhao Bao , Danial Chitnis This is my paper

Pith reviewed 2026-05-13 19:11 UTC · model grok-4.3

classification 💻 cs.AR

keywords circuit simulationSPICEparallel computinggraph coloringMOSFET modelingmatrix stampingmulti-core speedupmodular architecture

0 comments

The pith

Graph coloring partitions MOSFETs into independent groups that can be evaluated and stamped in parallel, removing the serial matrix-update bottleneck in SPICE simulators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Repeated SPICE runs are now required inside AI-driven sizing loops, yet the stamping phase where device contributions are added to the shared system matrix remains serial because multiple cores would otherwise write to the same entries. EEspice decouples model evaluation into replaceable kernels and applies graph coloring to the MOSFET instances so that devices whose matrix contributions do not overlap form separate color classes. These classes are processed concurrently with no synchronization inside the stamping step. Measurements on a 64-core workstation show up to 45 times speedup over single-thread execution when the circuit produces large independent color groups. Speedup scales with core count and varies directly with circuit topology.

Core claim

By constructing a conflict graph whose vertices are MOSFET instances and whose edges connect devices that share Jacobian-matrix locations, graph coloring yields a partition into independent color groups; each group can be model-evaluated and stamped without inter-core contention, allowing the entire nonlinear iteration to run with near-linear scaling on many cores.

What carries the argument

Graph coloring of the device-conflict graph to produce independent color classes that execute model evaluation and matrix stamping in parallel.

If this is right

Matrix stamping no longer forces serial execution inside each Newton iteration.
Speedup grows with available cores provided color classes remain balanced.
Device-model kernels can be swapped without touching the parallel stamping layer.
Simulation time becomes sensitive to the chromatic number and class sizes of the device graph.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same coloring approach could reduce assembly time in any sparse-matrix code that assembles contributions from independent objects.
For optimization loops that call the simulator thousands of times, the reported speedups would translate directly into shorter design cycles.
Dynamic recoloring at each bias point might further improve performance on circuits whose operating regions change markedly during simulation.

Load-bearing premise

Real circuit topologies produce enough devices with non-overlapping matrix contributions that the parallel stamping gains exceed the cost of coloring and any modular-kernel overhead.

What would settle it

Run the simulator on a small, densely connected netlist in which every MOSFET shares at least one matrix row with every other MOSFET; if wall-clock time shows no improvement or becomes slower than the single-thread baseline, the parallel-stamping claim fails.

Figures

Figures reproduced from arXiv: 2604.03079 by Danial Chitnis, Xuanhao Bao.

**Figure 2.** Figure 2: Accuracy comparison versus Ngspice: (a) BSIM4 FET DC [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 5.** Figure 5: BSIM4 evaluation time breakdown for thread count [PITH_FULL_IMAGE:figures/full_fig_p003_5.png] view at source ↗

**Figure 6.** Figure 6: Scaling of coloring-based parallel stamping with thread count and [PITH_FULL_IMAGE:figures/full_fig_p003_6.png] view at source ↗

**Figure 7.** Figure 7: Transient comparison for bit 0 of the 64-bit ripple-carry adder. Node [PITH_FULL_IMAGE:figures/full_fig_p004_7.png] view at source ↗

read the original abstract

As modern analogue/mixed-signal design increasingly relies on optimization-in-the-loop flows, such as AI and LLM-based sizing agents that repeatedly invoke SPICE-efficient, accurate high-performance simulators have become an indispensable foundation for modern integrated circuit (IC) design. However, the computational cost of evaluating nonlinear models, particularly for BSIM models, remains a significant bottleneck. In standard parallelization approaches, devices such as transistors are easily distributed across processors. The subsequent stamping phase, where each device's contributions are added to the shared system matrix, often creates a bottleneck. Because multiple processor cores compete to update the same matrix elements simultaneously, the system is forced to process tasks one at a time to avoid errors. This paper introduces EEspice, an open-source circuit simulation framework whose modular architecture decouples device model evaluation into independently replaceable kernels, enabling a parallel stamping strategy that overcomes this bottleneck. It partitions MOSFET instances into independent color groups, which can be processed in parallel. Our results show that on a 64-core workstation, the proposed approach achieves up to 45x speedup over single-thread performance when conflicts are low. Our analysis also explores how performance depends on circuit topology.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

EEspice adds a modular open-source SPICE platform that parallelizes matrix stamping with graph coloring, claiming up to 45x speedup on 64 cores for low-conflict cases.

read the letter

EEspice introduces a modular circuit simulator that decouples device model evaluation into replaceable kernels and uses graph coloring to partition MOSFET instances for parallel stamping. The main result is the reported 45x wall-clock speedup over single-thread execution on a 64-core machine when conflicts stay low, plus an analysis showing that gains track circuit topology. The modular split is a practical step: it keeps model code independent of the solver, which should make it easier to plug in updated BSIM variants or custom models without touching the matrix assembly path. The graph-coloring strategy directly targets the write-contention bottleneck that usually forces serial stamping even when device evaluations run in parallel. That engineering choice is straightforward and grounded in an actual implementation rather than abstract theory. The paper is honest that performance depends on how many independent color groups a netlist produces. Still, the abstract supplies no benchmark circuits, no quantitative conflict rates for typical analog blocks, no error metrics against a reference simulator, and no breakdown of coloring or dispatch overhead. Without those numbers it is hard to judge how often the low-conflict regime occurs in real mixed-signal designs or whether net gains survive on denser topologies. The basic approach contains no circular fitting or invented parameters. This work is aimed at EDA tool developers and researchers who run repeated high-fidelity SPICE calls inside optimization or AI sizing loops. It deserves peer review so referees can inspect the code, request representative netlist results, and check whether the scaling holds beyond the favorable cases shown.

Referee Report

2 major / 1 minor

Summary. The paper introduces EEspice, an open-source modular circuit simulation framework that decouples device model evaluation into independently replaceable kernels and employs graph coloring to partition MOSFET instances into independent color groups for parallel stamping of contributions into the shared system matrix. It reports concrete wall-clock speedups of up to 45x over single-thread performance on a 64-core workstation when conflicts are low, together with an analysis of how performance depends on circuit topology.

Significance. If the parallel stamping strategy can be shown to deliver net gains on representative analog and mixed-signal netlists without offsetting costs from coloring or modular dispatch, the work would provide a practical foundation for accelerating repeated SPICE evaluations inside optimization-in-the-loop flows.

major comments (2)

[Abstract] Abstract: The headline claim of up to 45x speedup is conditioned on low conflicts, yet the manuscript supplies no quantitative data on observed conflict rates, color-class sizes, or stamping overhead for any concrete benchmark circuits, leaving the practical scope of the result unanchored.
[Results] Results section: No benchmark netlists, error metrics, baseline comparisons, or scaling measurements of the graph-coloring overhead are reported, so the central performance claim cannot be verified beyond the single stated number.

minor comments (1)

[Architecture] The modular kernel interface is mentioned but never illustrated; a short diagram or pseudocode listing the expected device-model API would clarify how kernels are swapped without affecting the parallel stamping path.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We agree that the current manuscript would benefit from additional quantitative data on conflict rates, color-class statistics, benchmark netlists, error metrics, and overhead scaling to better anchor the performance claims. We will revise the manuscript to address these points directly.

read point-by-point responses

Referee: [Abstract] Abstract: The headline claim of up to 45x speedup is conditioned on low conflicts, yet the manuscript supplies no quantitative data on observed conflict rates, color-class sizes, or stamping overhead for any concrete benchmark circuits, leaving the practical scope of the result unanchored.

Authors: We agree that quantitative metrics on conflict rates, color-class sizes, and stamping overhead are necessary to substantiate the practical scope of the 45x claim. In the revised manuscript we will add a dedicated subsection (and accompanying table) in the Results section that reports, for each evaluated circuit, the number of colors, average and maximum color-class sizes, measured conflict rate, and the wall-clock overhead of the graph-coloring and parallel-stamping phases relative to the total simulation time. revision: yes
Referee: [Results] Results section: No benchmark netlists, error metrics, baseline comparisons, or scaling measurements of the graph-coloring overhead are reported, so the central performance claim cannot be verified beyond the single stated number.

Authors: We acknowledge that the original Results section presented only a single headline speedup figure and a high-level topology dependence statement without the supporting details requested. We will expand the section to (1) list the concrete benchmark netlists (including transistor counts and topology characteristics), (2) report error metrics (e.g., voltage and current differences versus a reference serial simulator), (3) include baseline comparisons against both a standard single-thread SPICE run and at least one other parallelization approach, and (4) add scaling plots and tables that isolate the graph-coloring overhead as a function of circuit size and core count. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation or performance claims

full rationale

The paper describes an implemented modular simulator using graph coloring to partition device evaluations for parallel stamping. Its central result is an empirical wall-clock speedup (up to 45x on 64 cores under low-conflict conditions) obtained from direct measurements on the running code. No equations, fitted parameters, or first-principles derivations appear in the provided text; the speedup is not obtained by renaming a fitted quantity or by reducing to a self-citation chain. The topology dependence is explicitly stated as a measured observation rather than a self-referential definition, leaving the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

No free parameters, invented entities, or ad-hoc axioms are described; the approach relies on standard graph-coloring algorithms and multi-threaded matrix updates.

axioms (1)

standard math Standard assumptions of multi-core shared-memory systems with atomic or lock-free updates when color groups are independent
Invoked implicitly when claiming conflict-free parallel stamping

pith-pipeline@v0.9.0 · 5501 in / 1138 out tokens · 45960 ms · 2026-05-13T19:11:18.336264+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

[1]

Spice2: A computer program to simulate semiconductor circuits,

L. W. Nagel, “Spice2: A computer program to simulate semiconductor circuits,”College of Engineering, University of California, Berkeley, 1975

work page 1975
[2]

EEsizer: LLM-Based AI Agent for Sizing of Analog and Mixed Signal Circuit,

C. Liu and D. Chitnis, “EEsizer: LLM-Based AI Agent for Sizing of Analog and Mixed Signal Circuit,”IEEE Transactions on Circuits and Systems I: Regular Papers, pp. 1–10, 2025

work page 2025
[3]

Solutions for mixed-signal soc verification using real number models,

S. Balasubramanian and P. Hardee, “Solutions for mixed-signal soc verification using real number models,”Cadence Design Systems, pp. 1–4, 2013

work page 2013
[4]

The modified nodal approach to network analysis,

C.-W. Ho, A. Ruehli, and P. Brennan, “The modified nodal approach to network analysis,”IEEE Transactions on circuits and systems, vol. 22, no. 6, pp. 504–509, 1975

work page 1975
[5]

Performance comparison of single-precision spice model-evaluation on fpga, gpu, cell, and multi-core processors,

N. Kapre and A. DeHon, “Performance comparison of single-precision spice model-evaluation on fpga, gpu, cell, and multi-core processors,” in2009 International Conference on Field Programmable Logic and Applications, 2009, pp. 65–72

work page 2009
[6]

Massive parallelization of spice de- vice model evaluation on gpu-based simd architectures,

A. M. Bayoumi and Y . Y . Hanafy, “Massive parallelization of spice de- vice model evaluation on gpu-based simd architectures,” inProceedings of the 1st international forum on Next-generation multicore/manycore technologies, 2008, pp. 1–5

work page 2008
[7]

Tinyspice: A parallel spice simulator on gpu for massively repeated small circuit simulations,

L. Han, X. Zhao, and Z. Feng, “Tinyspice: A parallel spice simulator on gpu for massively repeated small circuit simulations,” inProceedings of the 50th Annual Design Automation Conference, 2013, pp. 1–8

work page 2013
[8]

Xyce: Open source simulation for large-scale circuits

J. Verley, E. R. Keiter, and H. K. Thornquist, “Xyce: Open source simulation for large-scale circuits.” Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), Tech. Rep., 2018

work page 2018
[9]

Ngspice user’s manual version 34 (ngspice release version),

H. V ogt, G. Atkinson, P. Nenzi, and D. Warning, “Ngspice user’s manual version 34 (ngspice release version),” 2021

work page 2021
[10]

A parallel graph coloring heuristic,

M. T. Jones and P. E. Plassmann, “A parallel graph coloring heuristic,” SIAM Journal on Scientific Computing, vol. 14, no. 3, pp. 654–669, 1993

work page 1993
[11]

Greedy algorithms: a review and open problems,

A. Garc ´ıa, “Greedy algorithms: a review and open problems,”Journal of Inequalities and Applications, vol. 2025, no. 1, p. 11, 2025

work page 2025
[12]

Estimation of sparse jacobian matrices and graph coloring problems,

T. F. Coleman and J. J. Mor ´e, “Estimation of sparse jacobian matrices and graph coloring problems,”SIAM journal on Numerical Analysis, vol. 20, no. 1, pp. 187–209, 1983

work page 1983
[13]

Graph coloring algorithms for multi-core and massively multithreaded architectures,

¨U. V . C ¸ ataly¨urek, J. Feo, A. H. Gebremedhin, M. Halappanavar, and A. Pothen, “Graph coloring algorithms for multi-core and massively multithreaded architectures,”Parallel Computing, vol. 38, no. 10-11, pp. 576–594, 2012

work page 2012
[14]

Parallel assembly of finite element matrices on multicore computers,

P. Krysl, “Parallel assembly of finite element matrices on multicore computers,”Computer Methods in Applied Mechanics and Engineering, vol. 428, p. 117076, 2024

work page 2024
[15]

Bsim-cmg: Standard finfet compact model for advanced circuit design,

J. P. Duarte, S. Khandelwal, A. Medury, C. Hu, P. Kushwaha, H. Agar- wal, A. Dasgupta, and Y . S. Chauhan, “Bsim-cmg: Standard finfet compact model for advanced circuit design,” inESSCIRC Conference 2015-41st European Solid-State Circuits Conference (ESSCIRC). IEEE, 2015, pp. 196–201

work page 2015
[16]

Low-power pass-transistor logic-based full adder and 8-bit multiplier,

N. Yin, W. Pan, Y . Yu, C. Tang, and Z. Yu, “Low-power pass-transistor logic-based full adder and 8-bit multiplier,”Electronics, vol. 12, no. 15, p. 3209, 2023

work page 2023

[1] [1]

Spice2: A computer program to simulate semiconductor circuits,

L. W. Nagel, “Spice2: A computer program to simulate semiconductor circuits,”College of Engineering, University of California, Berkeley, 1975

work page 1975

[2] [2]

EEsizer: LLM-Based AI Agent for Sizing of Analog and Mixed Signal Circuit,

C. Liu and D. Chitnis, “EEsizer: LLM-Based AI Agent for Sizing of Analog and Mixed Signal Circuit,”IEEE Transactions on Circuits and Systems I: Regular Papers, pp. 1–10, 2025

work page 2025

[3] [3]

Solutions for mixed-signal soc verification using real number models,

S. Balasubramanian and P. Hardee, “Solutions for mixed-signal soc verification using real number models,”Cadence Design Systems, pp. 1–4, 2013

work page 2013

[4] [4]

The modified nodal approach to network analysis,

C.-W. Ho, A. Ruehli, and P. Brennan, “The modified nodal approach to network analysis,”IEEE Transactions on circuits and systems, vol. 22, no. 6, pp. 504–509, 1975

work page 1975

[5] [5]

Performance comparison of single-precision spice model-evaluation on fpga, gpu, cell, and multi-core processors,

N. Kapre and A. DeHon, “Performance comparison of single-precision spice model-evaluation on fpga, gpu, cell, and multi-core processors,” in2009 International Conference on Field Programmable Logic and Applications, 2009, pp. 65–72

work page 2009

[6] [6]

Massive parallelization of spice de- vice model evaluation on gpu-based simd architectures,

A. M. Bayoumi and Y . Y . Hanafy, “Massive parallelization of spice de- vice model evaluation on gpu-based simd architectures,” inProceedings of the 1st international forum on Next-generation multicore/manycore technologies, 2008, pp. 1–5

work page 2008

[7] [7]

Tinyspice: A parallel spice simulator on gpu for massively repeated small circuit simulations,

L. Han, X. Zhao, and Z. Feng, “Tinyspice: A parallel spice simulator on gpu for massively repeated small circuit simulations,” inProceedings of the 50th Annual Design Automation Conference, 2013, pp. 1–8

work page 2013

[8] [8]

Xyce: Open source simulation for large-scale circuits

J. Verley, E. R. Keiter, and H. K. Thornquist, “Xyce: Open source simulation for large-scale circuits.” Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), Tech. Rep., 2018

work page 2018

[9] [9]

Ngspice user’s manual version 34 (ngspice release version),

H. V ogt, G. Atkinson, P. Nenzi, and D. Warning, “Ngspice user’s manual version 34 (ngspice release version),” 2021

work page 2021

[10] [10]

A parallel graph coloring heuristic,

M. T. Jones and P. E. Plassmann, “A parallel graph coloring heuristic,” SIAM Journal on Scientific Computing, vol. 14, no. 3, pp. 654–669, 1993

work page 1993

[11] [11]

Greedy algorithms: a review and open problems,

A. Garc ´ıa, “Greedy algorithms: a review and open problems,”Journal of Inequalities and Applications, vol. 2025, no. 1, p. 11, 2025

work page 2025

[12] [12]

Estimation of sparse jacobian matrices and graph coloring problems,

T. F. Coleman and J. J. Mor ´e, “Estimation of sparse jacobian matrices and graph coloring problems,”SIAM journal on Numerical Analysis, vol. 20, no. 1, pp. 187–209, 1983

work page 1983

[13] [13]

Graph coloring algorithms for multi-core and massively multithreaded architectures,

¨U. V . C ¸ ataly¨urek, J. Feo, A. H. Gebremedhin, M. Halappanavar, and A. Pothen, “Graph coloring algorithms for multi-core and massively multithreaded architectures,”Parallel Computing, vol. 38, no. 10-11, pp. 576–594, 2012

work page 2012

[14] [14]

Parallel assembly of finite element matrices on multicore computers,

P. Krysl, “Parallel assembly of finite element matrices on multicore computers,”Computer Methods in Applied Mechanics and Engineering, vol. 428, p. 117076, 2024

work page 2024

[15] [15]

Bsim-cmg: Standard finfet compact model for advanced circuit design,

J. P. Duarte, S. Khandelwal, A. Medury, C. Hu, P. Kushwaha, H. Agar- wal, A. Dasgupta, and Y . S. Chauhan, “Bsim-cmg: Standard finfet compact model for advanced circuit design,” inESSCIRC Conference 2015-41st European Solid-State Circuits Conference (ESSCIRC). IEEE, 2015, pp. 196–201

work page 2015

[16] [16]

Low-power pass-transistor logic-based full adder and 8-bit multiplier,

N. Yin, W. Pan, Y . Yu, C. Tang, and Z. Yu, “Low-power pass-transistor logic-based full adder and 8-bit multiplier,”Electronics, vol. 12, no. 15, p. 3209, 2023

work page 2023