Noisy Quantum Simulation Using Tracking, Uncomputation and Sampling
Pith reviewed 2026-05-21 23:03 UTC · model grok-4.3
The pith
TUSQ speeds up noisy quantum circuit simulation by tracking equivalent errors and reusing states via uncomputation in a tree.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TUSQ reduces redundant work in noisy simulation by combining an Error Characterization Module that identifies equivalent error configurations via Error Tallying and Error Commutation and then prunes with importance sampling, together with Depth-First Tree Traversal that builds a tree of the surviving circuits and traverses it using compute and uncompute steps to share intermediate states without extra memory, thereby preserving the statistical correctness of the final averaged noisy outcomes.
What carries the argument
The Error Characterization Module (ECM) that finds equivalent error configurations through tallying and commutation followed by importance sampling for pruning, together with Depth-First Tree Traversal (DFTT) that organizes circuits into a tree and reuses states via compute/uncompute operations.
If this is right
- Many noise realizations become redundant after error tallying and commutation, so far fewer full circuit simulations are required.
- Tree-structured traversal with uncomputation reuses partial state computations across similar noisy circuits without storing extra intermediate data.
- Importance sampling during pruning keeps the final averaged measurement statistics correct while cutting the total number of evaluations.
- The same speedups hold across 198 benchmarks when each uses one million shots and when both compute and memory are constrained.
Where Pith is reading between the lines
- The same tracking and uncomputation pattern could be adapted to other stochastic sampling tasks that exhibit repeated configurations.
- Faster noisy simulation feedback loops could let researchers iterate more quickly on error-mitigation techniques for mid-size circuits.
- Extending the tree traversal to include dynamic circuit rewriting might further reduce the number of unique error patterns that must be simulated.
Load-bearing premise
Grouping equivalent error configurations via tallying and commutation and then applying importance sampling during pruning leaves the statistical distribution of the averaged noisy simulation results unbiased.
What would settle it
Run TUSQ and a full brute-force sampler on a small test circuit with a known exact noisy expectation value and verify that the two sets of averaged results agree to within the expected statistical fluctuation for one million shots.
Figures
read the original abstract
Quantum computers have rapidly improved in scale and fidelity, yet access to large systems remains limited for most researchers. This makes accurate and scalable noisy quantum simulation essential. While density matrix simulation provides the most faithful representation of noisy quantum systems, its exponential memory overhead severely limits scalability. Consequently, noisy simulations are commonly performed by: (a) sampling multiple circuit instances with fixed noise realizations from stochastic noise channels, and (b) executing simulations of these sampled circuits and averaging the results. However, this introduces significant computational overhead due to the large number of circuit evaluations required. Existing approaches reduce this overhead by caching intermediate states for reuse, but such methods become impractical when simulations are both compute and memory constrained. To address this challenge, we propose TUSQ - Tracking, Uncomputation, and Sampling for Noisy Quantum Simulation. TUSQ consists of two components: the Error Characterization Module (ECM) and Depth-First Tree Traversal (DFTT). ECM reduces redundant simulation by identifying equivalent error configurations through Error Tallying and Error Commutation, followed by importance sampling during pruning to reduce the number of circuit instances requiring simulation. DFTT then exploits structural similarity across the remaining circuits by organizing them into a tree and traversing it using compute/uncompute operations to efficiently reuse intermediate computation without additional memory overhead. We evaluate TUSQ across 198 benchmarks with 1 million shots each. TUSQ achieves average(maximum) speedups of 59.06x(7878.03x) over Qiskit and 13.38x(439.38x) over CUDA-Q. Compared to TQSim in compute and memory constrained settings, TUSQ achieves average and maximum speedups of 39.32x and 3134.31x, respectively.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents TUSQ for scalable noisy quantum simulation. It introduces the Error Characterization Module (ECM) that identifies equivalent error configurations via Error Tallying and Error Commutation and applies importance sampling during pruning to reduce the number of circuits simulated, paired with Depth-First Tree Traversal (DFTT) that structures remaining circuits as a tree and reuses intermediate states through compute/uncompute operations without extra memory. Evaluation on 198 benchmarks with 1 million shots each reports average (maximum) speedups of 59.06x (7878.03x) versus Qiskit, 13.38x (439.38x) versus CUDA-Q, and 39.32x (3134.31x) versus TQSim under compute/memory constraints.
Significance. If the pruning and sampling steps produce an unbiased estimator of the noisy expectation value, the approach would meaningfully extend the reach of stochastic noisy simulation to larger circuits under realistic resource limits. The combination of equivalence reduction with memory-free reuse via uncomputation is technically interesting, and the scale of the benchmark suite (198 instances) supplies a reasonable empirical foundation. The central performance claims rest on the correctness of the importance-sampling re-weighting, which is not yet demonstrated.
major comments (2)
- [ECM / abstract] ECM description (abstract and §3): the manuscript states that Error Tallying, Error Commutation, and importance sampling during pruning reduce the set of circuits while preserving statistical correctness, yet supplies neither the explicit re-weighting formula (e.g., scaling by class multiplicity or original probability mass) nor a proof that the resulting Monte Carlo average remains unbiased. This is load-bearing for every speedup claim.
- [§5] Benchmark section (§5): reported average and maximum speedups are given without error bars on timing, without variance across runs, and without quantitative verification that output distributions match a reference simulator within statistical error for the pruned ensemble. The absence of these checks leaves the quantitative claims difficult to interpret.
minor comments (2)
- [ECM] Notation for error classes and commutation rules would benefit from a small worked example early in the ECM section.
- [DFTT] Figure captions for the DFTT tree traversal could more explicitly label the compute and uncompute arrows.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. We address each major comment below and will incorporate clarifications to strengthen the statistical foundation and empirical validation of the work.
read point-by-point responses
-
Referee: [ECM / abstract] ECM description (abstract and §3): the manuscript states that Error Tallying, Error Commutation, and importance sampling during pruning reduce the set of circuits while preserving statistical correctness, yet supplies neither the explicit re-weighting formula (e.g., scaling by class multiplicity or original probability mass) nor a proof that the resulting Monte Carlo average remains unbiased. This is load-bearing for every speedup claim.
Authors: We thank the referee for identifying this critical omission. In the revised manuscript we will add to §3 the explicit re-weighting formula: for an equivalence class of multiplicity m with aggregate original probability mass P, we draw samples according to an importance distribution q and multiply each observed outcome by the factor (m · P / q). We will also include a short proof that the resulting Monte Carlo estimator is unbiased, relying on the linearity of expectation and the standard change-of-measure argument for importance sampling. These additions directly substantiate the correctness of all reported speedups. revision: yes
-
Referee: [§5] Benchmark section (§5): reported average and maximum speedups are given without error bars on timing, without variance across runs, and without quantitative verification that output distributions match a reference simulator within statistical error for the pruned ensemble. The absence of these checks leaves the quantitative claims difficult to interpret.
Authors: We agree that additional statistical reporting is necessary for interpretability. In the revision we will augment §5 with (i) error bars on all timing measurements obtained from repeated runs, (ii) the observed variance of speedups across the 198-benchmark suite, and (iii) a new verification subsection that compares noisy expectation values and output distributions produced by TUSQ against an unpruned reference simulator on a representative subset of circuits, confirming agreement within statistical error. revision: yes
Circularity Check
No significant circularity; performance claims are empirical
full rationale
The paper reports measured speedups of TUSQ over Qiskit, CUDA-Q, and TQSim across 198 concrete benchmarks with 1 million shots each. These results are obtained by direct timing of the implemented ECM (Error Tallying, Error Commutation, importance sampling) and DFTT (tree traversal with compute/uncompute) components rather than by deriving them from fitted parameters, self-referential definitions, or a self-citation chain. The statistical-correctness assumption for the pruned estimator is stated as a premise but is not used to compute the reported speedups; the speedups themselves are externally verifiable by re-running the same benchmark suite. No load-bearing step reduces to its own input by construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
“Hamming weight,” https://en.wikipedia.org/wiki/Hamming weight, ac- cessed: 2025-03-27
work page 2025
-
[2]
“Ibm quantum experience,” https://quantum-computing.ibm.com, ac- cessed: 2025-4-9
work page 2025
-
[3]
Qiskit statevector simulator - gpu,
“Qiskit statevector simulator - gpu,” https://qiskit.github.io/qiskit-aer/ stubs/qiskit aer.StatevectorSimulator.html
-
[4]
“Quantinuum cloud access,” https://www.quantinuum.com/products- solutions/quantinuum-systems, accessed: 2025-4-9
work page 2025
- [5]
-
[6]
cuquantum sdk: A high-performance library for accelerating quantum science,
H. Bayraktar, A. Charara, D. Clark, S. Cohen, T. Costa, Y .-L. L. Fang, Y . Gao, J. Guan, J. Gunnels, A. Haidar et al. , “cuquantum sdk: A high-performance library for accelerating quantum science,” in 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), vol. 1. IEEE, 2023, pp. 1050–1061
work page 2023
-
[7]
Low-rank density-matrix evolution for noisy quantum circuits,
Y .-T. Chen, C. Farquhar, and R. M. Parrish, “Low-rank density-matrix evolution for noisy quantum circuits,” npj Quantum Information, vol. 7, no. 1, p. 61, 2021
work page 2021
-
[8]
A new quantum ripple-carry addition circuit
S. A. Cuccaro, T. G. Draper, S. A. Kutin, and D. P. Moulton, “A new quantum ripple-carry addition circuit,”arXiv preprint quant-ph/0410184, 2004
work page internal anchor Pith review Pith/arXiv arXiv 2004
-
[9]
Wave-function approach to dissipative processes in quantum optics,
J. Dalibard, Y . Castin, and K. Mølmer, “Wave-function approach to dissipative processes in quantum optics,”Physical review letters, vol. 68, no. 5, p. 580, 1992
work page 1992
-
[10]
Afs: Accurate, fast, and scalable error- decoding for fault-tolerant quantum computers,
P. Das, C. A. Pattison, S. Manne, D. M. Carmean, K. M. Svore, M. Qureshi, and N. Delfosse, “Afs: Accurate, fast, and scalable error- decoding for fault-tolerant quantum computers,” in 2022 IEEE Interna- tional Symposium on High-Performance Computer Architecture (HPCA). IEEE, 2022, pp. 259–273
work page 2022
-
[11]
A Quantum Approximate Optimization Algorithm,
E. Farhi, J. Goldstone, and S. Gutmann, “A Quantum Approximate Optimization Algorithm,” Nov. 2014
work page 2014
-
[12]
Effects of noise on the overparametrization of quantum neural networks,
D. Garc ´ıa-Mart´ın, M. Larocca, and M. Cerezo, “Effects of noise on the overparametrization of quantum neural networks,” Physical Review Research, vol. 6, no. 1, p. 013295, 2024
work page 2024
-
[13]
Surface code with deco- herence: An analysis of three superconducting architectures,
J. Ghosh, A. G. Fowler, and M. R. Geller, “Surface code with deco- herence: An analysis of three superconducting architectures,” Physical Review A—Atomic, Molecular, and Optical Physics , vol. 86, no. 6, p. 062318, 2012
work page 2012
-
[14]
Stim: a fast stabilizer circuit simulator,
C. Gidney, “Stim: a fast stabilizer circuit simulator,” Quantum, vol. 5, p. 497, Jul. 2021. [Online]. Available: https://doi.org/10.22331/q-2021- 07-06-497
-
[15]
J. Gince, “Fermionic machine learning,” 2023. [Online]. Available: https://github.com/MatchCake/MatchCake
work page 2023
-
[16]
J. Gince, J.-M. Pag ´e, M. Armenta, A. Sarkar, and S. Kourtis, “Fermionic machine learning,” 2024
work page 2024
-
[17]
A fast quantum mechanical algorithm for database search,
L. K. Grover, “A fast quantum mechanical algorithm for database search,” in Proceedings of the twenty-eighth annual ACM symposium on Theory of computing , 1996, pp. 212–219
work page 1996
-
[18]
Nisq+: Boosting quantum computing power by approximating quantum error correction,
A. Holmes, M. R. Jokar, G. Pasandi, Y . Ding, M. Pedram, and F. T. Chong, “Nisq+: Boosting quantum computing power by approximating quantum error correction,” in 2020 ACM/IEEE 47th annual international symposium on computer architecture (ISCA). IEEE, 2020, pp. 556–569
work page 2020
-
[19]
Logical abstractions for noisy variational quantum algorithm simulation,
Y . Huang, S. Holtzen, T. Millstein, G. Van den Broeck, and M. Martonosi, “Logical abstractions for noisy variational quantum algorithm simulation,” in Proceedings of the 26th ACM international conference on architectural support for programming languages and operating systems, 2021, pp. 456–472
work page 2021
-
[20]
Bqsim: Gpu-accelerated batch quantum circuit simulation using decision di- agram,
S. Jiang, Y .-H. Chung, C.-C. Chang, T.-Y . Ho, and T.-W. Huang, “Bqsim: Gpu-accelerated batch quantum circuit simulation using decision di- agram,” in Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 2025, pp. 79–94
work page 2025
-
[21]
Flatdd: A high-performance quantum circuit simulator using decision diagram and flat array,
S. Jiang, R. Fu, L. Burgholzer, R. Wille, T.-Y . Ho, and T.-W. Huang, “Flatdd: A high-performance quantum circuit simulator using decision diagram and flat array,” in Proceedings of the 53rd International Conference on Parallel Processing, 2024, pp. 388–399
work page 2024
-
[22]
Quest and high per- formance simulation of quantum computers,
T. Jones, A. Brown, I. Bush, and S. C. Benjamin, “Quest and high per- formance simulation of quantum computers,” Scientific reports, vol. 9, no. 1, p. 10736, 2019
work page 2019
-
[23]
Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets,
A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, “Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets,” Nature, vol. 549, no. 7671, pp. 242–246, 2017. 12
work page 2017
-
[24]
Checkpointing and rollback-recovery for dis- tributed systems,
R. Koo and S. Toueg, “Checkpointing and rollback-recovery for dis- tributed systems,” IEEE Transactions on software Engineering , no. 1, pp. 23–31, 1987
work page 1987
-
[25]
Sv-sim: scalable pgas-based state vector simulation of quantum circuits,
A. Li, B. Fang, C. Granade, G. Prawiroatmodjo, B. Heim, M. Roet- teler, and S. Krishnamoorthy, “Sv-sim: scalable pgas-based state vector simulation of quantum circuits,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021, pp. 1–14
work page 2021
-
[26]
Density matrix quantum circuit simulation via the bsp machine on modern gpu clusters,
A. Li, O. Subasi, X. Yang, and S. Krishnamoorthy, “Density matrix quantum circuit simulation via the bsp machine on modern gpu clusters,” in Sc20: international conference for high performance computing, networking, storage and analysis . IEEE, 2020, pp. 1–15
work page 2020
-
[27]
Eliminating redundant computation in noisy quantum computing simulation,
G. Li, Y . Ding, and Y . Xie, “Eliminating redundant computation in noisy quantum computing simulation,” in 2020 57th ACM/IEEE Design Automation Conference (DAC). IEEE, 2020, pp. 1–6
work page 2020
-
[28]
Efficient variational quantum simulator incorporating active error minimization,
Y . Li and S. C. Benjamin, “Efficient variational quantum simulator incorporating active error minimization,” Phys. Rev. X , vol. 7, p. 021050, Jun 2017. [Online]. Available: https://link.aps.org/doi/10.1103/ PhysRevX.7.021050
work page 2017
-
[29]
Codesign of quantum error-correcting codes and modular chiplets in the presence of defects,
S. F. Lin, J. Viszlai, K. N. Smith, G. S. Ravi, C. Yuan, F. T. Chong, and B. J. Brown, “Codesign of quantum error-correcting codes and modular chiplets in the presence of defects,” in Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 , 2024, pp. 216–231
work page 2024
-
[30]
Fast simulation of high-depth qaoa circuits,
D. Lykov, R. Shaydulin, Y . Sun, Y . Alexeev, and M. Pistoia, “Fast simulation of high-depth qaoa circuits,” in Proceedings of the SC’23 Workshops of The International Conference on High Performance Com- puting, Network, Storage, and Analysis , 2023, pp. 1443–1451
work page 2023
-
[31]
Monte carlo wavefunctions in quantum optics,
K. Mølmer and Y . Castin, “Monte carlo wavefunctions in quantum optics,” Quantum and Semiclassical Optics: Journal of the European Optical Society Part B , vol. 8, no. 1, p. 49, 1996
work page 1996
-
[32]
Quantum computation and quantum information,
M. A. Nielsen and I. L. Chuang, “Quantum computation and quantum information,” Phys. Today, vol. 54, no. 2, p. 60, 2001
work page 2001
-
[33]
T. L. Patti, T. Nguyen, J. G. Lietz, A. J. McCaskey, and B. Khailany, “Augmenting simulated noisy quantum data collection by orders of magnitude using pre-trajectory sampling with batched execution,” arXiv preprint arXiv:2504.16297, 2025
-
[34]
A variational eigenvalue solver on a photonic quantum processor,
A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’brien, “A variational eigenvalue solver on a photonic quantum processor,”Nature communications, vol. 5, no. 1, p. 4213, 2014
work page 2014
-
[35]
The quantum-jump approach to dissipative dynamics in quantum optics,
M. B. Plenio and P. L. Knight, “The quantum-jump approach to dissipative dynamics in quantum optics,” Reviews of Modern Physics , vol. 70, no. 1, p. 101, 1998
work page 1998
-
[36]
Quantum computing in the cloud: Analyzing job and machine characteristics,
G. S. Ravi, K. N. Smith, P. Gokhale, and F. T. Chong, “Quantum computing in the cloud: Analyzing job and machine characteristics,” in 2021 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 2021, pp. 39–50
work page 2021
-
[37]
P. W. Shor, “Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer,” SIAM Journal on Computing, vol. 26, no. 5, p. 1484–1509, Oct 1997. [Online]. Available: http://dx.doi.org/10.1137/S0097539795293172
-
[38]
qHiPSTER: The Quantum High Performance Software Testing Environment
M. Smelyanskiy, N. P. Sawaya, and A. Aspuru-Guzik, “qhipster: The quantum high performance software testing environment,”arXiv preprint arXiv:1601.07195, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[39]
Supermarq: A scalable quantum benchmark suite,
T. Tomesh, P. Gokhale, V . Omole, G. S. Ravi, K. N. Smith, J. Viszlai, X.- C. Wu, N. Hardavellas, M. R. Martonosi, and F. T. Chong, “Supermarq: A scalable quantum benchmark suite,” in IEEE International Symposium on High-Performance Computer Architecture (HPCA) , 2022
work page 2022
-
[40]
Enabling scalable vqe simulation on leading hpc systems,
M. Wang, F. Hua, C. Liu, N. Bauman, K. Kowalski, D. Claudino, T. Humble, P. Nair, and A. Li, “Enabling scalable vqe simulation on leading hpc systems,” in Proceedings of the SC’23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis , 2023, pp. 1460–1467
work page 2023
-
[41]
Tqsim: a case for reuse-focused tree-based quantum circuit simulation,
M. Wang, R. Huang, S. Tannu, and P. Nair, “Tqsim: a case for reuse-focused tree-based quantum circuit simulation,” arXiv preprint arXiv:2203.13892, 2022
-
[42]
Accelerating simulation of quantum circuits under noise via computational reuse,
M. Wang, S. Tannu, and P. J. Nair, “Accelerating simulation of quantum circuits under noise via computational reuse,” inProceedings of the 52nd Annual International Symposium on Computer Architecture , 2025, pp. 1539–1553
work page 2025
-
[43]
Noise-induced barren plateaus in variational quantum algorithms,
S. Wang, E. Fontana, M. Cerezo, K. Sharma, A. Sone, L. Cincio, and P. J. Coles, “Noise-induced barren plateaus in variational quantum algorithms,” Nature communications, vol. 12, no. 1, p. 6961, 2021
work page 2021
-
[44]
Special session: Noise characterization and error mitigation in near-term quantum computers,
C. J. Wood, “Special session: Noise characterization and error mitigation in near-term quantum computers,” in 2020 IEEE 38th International Conference on Computer Design (ICCD) . IEEE, 2020, pp. 13–16
work page 2020
-
[45]
Full-state quantum circuit simulation by using data compression,
X.-C. Wu, S. Di, E. M. Dasgupta, F. Cappello, H. Finkel, Y . Alexeev, and F. T. Chong, “Full-state quantum circuit simulation by using data compression,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis , 2019, pp. 1–24
work page 2019
-
[46]
Hyquas: hybrid partitioner based quantum circuit simulation system on gpu,
C. Zhang, Z. Song, H. Wang, K. Rong, and J. Zhai, “Hyquas: hybrid partitioner based quantum circuit simulation system on gpu,” in Pro- ceedings of the 35th ACM International Conference on Supercomputing , 2021, pp. 443–454
work page 2021
-
[47]
Q-gpu: A recipe of optimizations for quantum circuit simulation using gpus,
Y . Zhao, Y . Guo, Y . Yao, A. Dumi, D. M. Mulvey, S. Upadhyay, Y . Zhang, K. D. Jordan, J. Yang, and X. Tang, “Q-gpu: A recipe of optimizations for quantum circuit simulation using gpus,” in 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 2022, pp. 726–740
work page 2022
-
[48]
Advanced simulation of quantum compu- tations,
A. Zulehner and R. Wille, “Advanced simulation of quantum compu- tations,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , vol. 38, no. 5, pp. 848–859, 2018. 13
work page 2018
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.