EEspice: A Modular Circuit Simulation Platform with Parallel Device Model Evaluation via Graph Coloring
Pith reviewed 2026-05-13 19:11 UTC · model grok-4.3
The pith
Graph coloring partitions MOSFETs into independent groups that can be evaluated and stamped in parallel, removing the serial matrix-update bottleneck in SPICE simulators.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By constructing a conflict graph whose vertices are MOSFET instances and whose edges connect devices that share Jacobian-matrix locations, graph coloring partitions the devices into independent color groups. Devices within one group can be model-evaluated and stamped without inter-core contention, so the model-evaluation and stamping phase of each nonlinear iteration can scale with the number of cores; the paper reports up to 45x speedup on 64 cores when conflicts are low.
What carries the argument
Graph coloring of the device-conflict graph to produce independent color classes that execute model evaluation and matrix stamping in parallel.
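A minimal sketch of the idea, assuming that two devices conflict whenever they share a non-ground node (both would then write the same diagonal Jacobian entry). The helper names `build_conflict_graph`, `greedy_coloring`, and `color_classes`, the example netlist, and the largest-degree-first ordering are illustrative assumptions, not taken from EEspice.

```python
from collections import defaultdict
from itertools import combinations

GROUND = "0"  # assumption: the ground node has no row or column in the matrix


def build_conflict_graph(devices):
    """devices: dict mapping a device name to its terminal nodes (d, g, s, b)."""
    by_node = defaultdict(set)
    for name, terminals in devices.items():
        for node in terminals:
            if node != GROUND:
                by_node[node].add(name)
    adjacency = {name: set() for name in devices}
    # Devices touching the same node both write the entry (node, node).
    for sharers in by_node.values():
        for a, b in combinations(sorted(sharers), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def greedy_coloring(adjacency):
    """Give each device the smallest color not used by an already-colored neighbour."""
    colors = {}
    # Largest-degree-first is a common heuristic, not necessarily the paper's choice.
    for name in sorted(adjacency, key=lambda n: (-len(adjacency[n]), n)):
        taken = {colors[nb] for nb in adjacency[name] if nb in colors}
        color = 0
        while color in taken:
            color += 1
        colors[name] = color
    return colors


def color_classes(colors):
    """Group device names by color, ordered by color index."""
    classes = defaultdict(list)
    for name, color in colors.items():
        classes[color].append(name)
    return [sorted(classes[c]) for c in sorted(classes)]


# Hypothetical netlist: two CMOS inverters that share only the supply rail.
devices = {
    "M1": ("out1", "in1", "0", "0"),      # NMOS of inverter 1
    "M2": ("out1", "in1", "vdd", "vdd"),  # PMOS of inverter 1
    "M3": ("out2", "in2", "0", "0"),      # NMOS of inverter 2
    "M4": ("out2", "in2", "vdd", "vdd"),  # PMOS of inverter 2
}
adjacency = build_conflict_graph(devices)
colors = greedy_coloring(adjacency)
classes = color_classes(colors)
print(classes)
```

In this toy case the four devices fall into two color classes of two devices each, so each class could be stamped by two cores at once. Note that a shared supply node such as `vdd` creates conflicts between otherwise unrelated devices under this assumption.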
If this is right
- Matrix stamping no longer forces serial execution inside each Newton iteration.
- Speedup grows with available cores provided color classes remain balanced.
- Device-model kernels can be swapped without touching the parallel stamping layer.
- Simulation time becomes sensitive to the chromatic number and class sizes of the device graph.
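The dependence on class balance and color count can be made concrete with a back-of-envelope model. The sketch below assumes every device costs one time unit, that color classes run one after another with a barrier between them, and that coloring cost, memory traffic, and the linear solve are ignored. The function name `estimated_speedup` and the example class sizes are hypothetical.

```python
import math


def estimated_speedup(class_sizes, cores):
    """Idealized stamping speedup: serial steps divided by parallel steps."""
    serial_steps = sum(class_sizes)
    # Each color class needs ceil(size / cores) rounds before the next can start.
    parallel_steps = sum(math.ceil(size / cores) for size in class_sizes)
    return serial_steps / parallel_steps


balanced = [64, 64, 64, 64]   # 256 devices in 4 equal classes
skewed = [250, 2, 2, 2]       # 256 devices, one dominant class
many_colors = [1] * 256       # 256 devices, each in its own class

for label, sizes in [("balanced", balanced), ("skewed", skewed), ("many colors", many_colors)]:
    print(label, round(estimated_speedup(sizes, cores=64), 2))
```

Under these assumptions the balanced case reaches the ideal 64x, the skewed case drops to about 36.6x, and the one-device-per-color case gives no speedup at all.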
Where Pith is reading between the lines
- The same coloring approach could reduce assembly time in any sparse-matrix code that assembles contributions from independent objects.
- For optimization loops that call the simulator thousands of times, the reported speedups would translate directly into shorter design cycles.
- Dynamic recoloring at each bias point might further improve performance on circuits whose operating regions change markedly during simulation.
Load-bearing premise
Real circuit topologies produce enough devices with non-overlapping matrix contributions that the parallel stamping gains exceed the cost of coloring and any modular-kernel overhead.
What would settle it
Run the simulator on a small, densely connected netlist in which every MOSFET shares at least one matrix row with every other MOSFET; if wall-clock time shows no improvement or becomes slower than the single-thread baseline, the parallel-stamping claim fails.
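Such a worst-case netlist is easy to generate. The sketch below assumes that sharing any non-ground node counts as a conflict, and ties every source to one common node named `tail`; the node names and helper functions are hypothetical. It only confirms that the conflict graph is complete and needs one color per device, not how EEspice would actually behave on it.

```python
from itertools import combinations


def conflict_edges(devices, ground="0"):
    """Return the pairs of devices that share at least one non-ground node."""
    edges = set()
    for (a, ta), (b, tb) in combinations(sorted(devices.items()), 2):
        if (set(ta) & set(tb)) - {ground}:
            edges.add((a, b))
    return edges


def greedy_num_colors(names, edges):
    """Count the colors used by a simple first-fit greedy coloring."""
    neighbours = {n: set() for n in names}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    colors = {}
    for n in names:
        taken = {colors[m] for m in neighbours[n] if m in colors}
        colors[n] = next(c for c in range(len(names)) if c not in taken)
    return len(set(colors.values()))


n = 8
# Every MOSFET has its own drain and gate but shares the source node "tail".
dense = {f"M{i}": (f"d{i}", f"g{i}", "tail", "0") for i in range(n)}
edges = conflict_edges(dense)
num_colors = greedy_num_colors(sorted(dense), edges)
print(len(edges), num_colors)
```

With eight devices this yields all 28 possible conflict pairs and eight colors, so every color class holds a single device and the stamping phase has no parallelism to exploit.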
Original abstract
As modern analogue/mixed-signal design increasingly relies on optimization-in-the-loop flows, such as AI and LLM-based sizing agents that repeatedly invoke SPICE, efficient and accurate high-performance simulators have become an indispensable foundation for modern integrated circuit (IC) design. However, the computational cost of evaluating nonlinear models, particularly for BSIM models, remains a significant bottleneck. In standard parallelization approaches, devices such as transistors are easily distributed across processors. The subsequent stamping phase, where each device's contributions are added to the shared system matrix, often creates a bottleneck. Because multiple processor cores compete to update the same matrix elements simultaneously, the system is forced to process tasks one at a time to avoid errors. This paper introduces EEspice, an open-source circuit simulation framework whose modular architecture decouples device model evaluation into independently replaceable kernels, enabling a parallel stamping strategy that overcomes this bottleneck. It partitions MOSFET instances into independent color groups, which can be processed in parallel. Our results show that on a 64-core workstation, the proposed approach achieves up to 45x speedup over single-thread performance when conflicts are low. Our analysis also explores how performance depends on circuit topology.
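The stamping strategy described in the abstract can be sketched as a loop over color classes with a barrier between them. The example below is a structural illustration only: the toy `stamp_device` model, the device table, and the hand-written coloring are assumptions, and CPython threads will not show a real speedup because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def stamp_device(matrix, device):
    """Toy model: add 1.0 to the diagonal entry of every node the device touches."""
    for node in device:
        matrix[node][node] += 1.0


def stamp_serial(devices, size):
    """Reference result: stamp every device one after another."""
    matrix = [[0.0] * size for _ in range(size)]
    for device in devices.values():
        stamp_device(matrix, device)
    return matrix


def stamp_by_color(devices, classes, size, workers=4):
    """Stamp one color class at a time; devices inside a class run concurrently."""
    matrix = [[0.0] * size for _ in range(size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for group in classes:
            # Devices in one class touch disjoint entries, so no lock is taken.
            # list(...) waits for the whole class, acting as a barrier.
            list(pool.map(lambda name: stamp_device(matrix, devices[name]), group))
    return matrix


# Hypothetical chain of two-terminal devices over nodes 0..4.
devices = {"M1": (0, 1), "M2": (1, 2), "M3": (2, 3), "M4": (3, 4)}
classes = [["M1", "M3"], ["M2", "M4"]]  # a valid 2-coloring of the chain

parallel = stamp_by_color(devices, classes, size=5)
serial = stamp_serial(devices, size=5)
print(parallel == serial)
```

The design point is that correctness comes from the coloring rather than from locks: if two devices in the same class shared a node, the unsynchronized `+=` could lose an update.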
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces EEspice, an open-source modular circuit simulation framework that decouples device model evaluation into independently replaceable kernels and employs graph coloring to partition MOSFET instances into independent color groups for parallel stamping of contributions into the shared system matrix. It reports concrete wall-clock speedups of up to 45x over single-thread performance on a 64-core workstation when conflicts are low, together with an analysis of how performance depends on circuit topology.
Significance. If the parallel stamping strategy can be shown to deliver net gains on representative analog and mixed-signal netlists without offsetting costs from coloring or modular dispatch, the work would provide a practical foundation for accelerating repeated SPICE evaluations inside optimization-in-the-loop flows.
major comments (2)
- [Abstract] The headline claim of up to 45x speedup is conditioned on low conflicts, yet the manuscript supplies no quantitative data on observed conflict rates, color-class sizes, or stamping overhead for any concrete benchmark circuits, leaving the practical scope of the result unanchored.
- [Results] No benchmark netlists, error metrics, baseline comparisons, or scaling measurements of the graph-coloring overhead are reported, so the central performance claim cannot be verified beyond the single stated number.
minor comments (1)
- [Architecture] The modular kernel interface is mentioned but never illustrated; a short diagram or pseudocode listing the expected device-model API would clarify how kernels are swapped without affecting the parallel stamping path.
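As an illustration of the kind of listing the referee asks for, the sketch below shows one way a swappable device-model interface could look. Everything here is hypothetical: `DeviceKernel`, `Conductance`, `SquareLawNMOS`, and `stamp_all` are invented names, the NMOS is a toy saturation-only square-law model standing in for a BSIM kernel, and none of it is claimed to match the EEspice API.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple

Stamp = Tuple[int, int, float]  # (row, column, value) to add to the Jacobian


class DeviceKernel(Protocol):
    """Hypothetical contract: the stamping layer relies only on these members."""
    nodes: Tuple[int, ...]

    def evaluate(self, v: Dict[int, float]) -> List[Stamp]: ...


@dataclass
class Conductance:
    """Linear conductance g between two nodes."""
    nodes: Tuple[int, int]
    g: float

    def evaluate(self, v):
        a, b = self.nodes
        return [(a, a, self.g), (b, b, self.g), (a, b, -self.g), (b, a, -self.g)]


@dataclass
class SquareLawNMOS:
    """Toy model with Id = 0.5 * k * (Vgs - vth)**2 and no output conductance."""
    nodes: Tuple[int, int, int]  # drain, gate, source
    k: float = 1e-3
    vth: float = 0.5

    def evaluate(self, v):
        d, g, s = self.nodes
        vov = max(v[g] - v[s] - self.vth, 0.0)
        gm = self.k * vov  # derivative of Id with respect to Vgs
        return [(d, g, gm), (d, s, -gm), (s, g, -gm), (s, s, gm)]


def stamp_all(kernels, v, size):
    """Model-agnostic stamping loop: it never inspects the kernel type."""
    jac = [[0.0] * size for _ in range(size)]
    for kernel in kernels:
        for row, col, value in kernel.evaluate(v):
            jac[row][col] += value
    return jac


voltages = {0: 1.0, 1: 1.5, 2: 0.0}  # node index -> voltage, ground omitted
kernels = [Conductance((0, 2), 1e-3), SquareLawNMOS((0, 1, 2))]
jac = stamp_all(kernels, voltages, size=3)
```

Because `stamp_all` only consumes `(row, column, value)` triples, replacing `SquareLawNMOS` with another model leaves the stamping code untouched, which is the decoupling the paper describes.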
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We agree that the current manuscript would benefit from additional quantitative data on conflict rates, color-class statistics, benchmark netlists, error metrics, and overhead scaling to better anchor the performance claims. We will revise the manuscript to address these points directly.
- Referee: [Abstract] The headline claim of up to 45x speedup is conditioned on low conflicts, yet the manuscript supplies no quantitative data on observed conflict rates, color-class sizes, or stamping overhead for any concrete benchmark circuits, leaving the practical scope of the result unanchored.
  Authors: We agree that quantitative metrics on conflict rates, color-class sizes, and stamping overhead are necessary to substantiate the practical scope of the 45x claim. In the revised manuscript we will add a dedicated subsection (and accompanying table) in the Results section that reports, for each evaluated circuit, the number of colors, average and maximum color-class sizes, measured conflict rate, and the wall-clock overhead of the graph-coloring and parallel-stamping phases relative to the total simulation time.
  Revision: yes
- Referee: [Results] No benchmark netlists, error metrics, baseline comparisons, or scaling measurements of the graph-coloring overhead are reported, so the central performance claim cannot be verified beyond the single stated number.
  Authors: We acknowledge that the original Results section presented only a single headline speedup figure and a high-level topology dependence statement without the supporting details requested. We will expand the section to (1) list the concrete benchmark netlists (including transistor counts and topology characteristics), (2) report error metrics (e.g., voltage and current differences versus a reference serial simulator), (3) include baseline comparisons against both a standard single-thread SPICE run and at least one other parallelization approach, and (4) add scaling plots and tables that isolate the graph-coloring overhead as a function of circuit size and core count.
  Revision: yes
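The per-circuit statistics promised in the rebuttal are cheap to compute once a coloring exists. The sketch below is one possible shape for such a report. The function names are hypothetical, "conflict density" (conflict edges divided by all device pairs) is an assumed stand-in for the unspecified "conflict rate", and the input numbers are invented for illustration rather than taken from the paper.

```python
def coloring_report(class_sizes, num_edges):
    """Summarize a coloring: class_sizes lists the device count per color."""
    n = sum(class_sizes)
    pairs = n * (n - 1) // 2
    return {
        "devices": n,
        "colors": len(class_sizes),
        "avg_class": n / len(class_sizes),
        "max_class": max(class_sizes),
        # Assumed definition: fraction of device pairs that conflict.
        "conflict_density": num_edges / pairs if pairs else 0.0,
    }


def max_abs_error(reference, candidate):
    """Largest pointwise difference between two equally sampled waveforms."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Invented example: 4 devices in 2 classes with 3 conflict edges.
report = coloring_report([2, 2], num_edges=3)
error = max_abs_error([1.0, 0.5, 0.0], [1.0, 0.5000001, 0.0])
print(report, error)
```

A table of these rows, one per benchmark netlist, together with the error against a serial reference run, would cover most of what the referee requests.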
Circularity Check
No circularity in derivation or performance claims
The paper describes an implemented modular simulator using graph coloring to partition device evaluations for parallel stamping. Its central result is an empirical wall-clock speedup (up to 45x on 64 cores under low-conflict conditions) obtained from direct measurements on the running code. No equations, fitted parameters, or first-principles derivations appear in the provided text; the speedup is not obtained by renaming a fitted quantity or by reducing to a self-citation chain. The topology dependence is explicitly stated as a measured observation rather than a self-referential definition, leaving the derivation chain self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Standard assumptions of multi-core shared-memory systems with atomic or lock-free updates when color groups are independent