Recognition: unknown
Orkan: Cache-friendly simulation of quantum operations on hermitian operators
Pith reviewed 2026-05-10 08:58 UTC · model grok-4.3
The pith
Orkan simulates quantum operations on Hermitian operators using a cache-friendly tiled lower-triangle layout, halving memory and achieving 2-4x speedups over Qiskit Aer, QuEST, and Qulacs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
I introduce Orkan, a simulation library that uses a tiled memory layout storing only the lower triangle of the hermitian matrix at tile granularity, roughly halving both the memory footprint and the wall time to simulate the evolution of quantum states under generic quantum operations.
Load-bearing premise
That the overhead of managing the tiled layout and the k-local update algorithms remains negligible compared with the memory savings across the full range of n-qubit systems and operation types.
Figures
read the original abstract
Classical simulation of quantum operations is essential for algorithm design, noise characterisation, and benchmarking of quantum hardware. The most general physically realisable operation can be described by a positive linear map acting on a hermitian operator, representing either a density matrix or an observable. Established simulators vectorise the density matrix on an $n$-qubit Hilbert space and reuse state-vector kernels, storing all $2^{2n}$ elements and forgoing the benefits of hermitian symmetry. In this work, I introduce \emph{Orkan}, a simulation library that uses a tiled memory layout storing only the lower triangle of the hermitian matrix at tile granularity, roughly halving both the memory footprint and the wall time to simulate the evolution of quantum states under generic quantum operations. The implementation treats any hermitian operator uniformly and is agnostic to whether the Schr\"{o}dinger or Heisenberg picture is used. Dedicated $k$-local conjugation algorithms update all entries of the hermitian matrix in a single pass. Benchmarks against Qiskit Aer, QuEST, and Qulacs show consistent wall-clock speedups of $2$-$4{\times}$ partly attributable to the reduced memory footprint.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Orkan, a simulation library for quantum operations acting on Hermitian operators (density matrices or observables). It employs a tiled memory layout that stores only the lower triangle of the Hermitian matrix at tile granularity, claiming this roughly halves both memory footprint and wall-clock time for evolution under generic quantum operations. The library is agnostic to Schrödinger or Heisenberg picture and uses dedicated k-local conjugation algorithms to update all matrix entries in a single pass. Benchmarks against Qiskit Aer, QuEST, and Qulacs report consistent 2–4× speedups.
Significance. If the performance claims and generality hold, the tiled lower-triangle approach offers a practical advance in memory-efficient classical simulation of quantum systems by exploiting Hermitian symmetry without vectorization. This could benefit algorithm design, noise characterization, and hardware benchmarking. The single-pass k-local update strategy and uniform treatment of operators are notable strengths, as is the provision of reproducible benchmarks against established simulators.
major comments (2)
- [Abstract] Abstract: The central claim states that Orkan supports 'generic quantum operations' via 'positive linear maps' on Hermitian operators while storing only the lower triangle at tile granularity and using k-local conjugation algorithms that 'update all entries of the hermitian matrix in a single pass.' General positive maps (including arbitrary CPTP maps) cannot always be reduced to k-local conjugations; they typically require vectorization or full superoperator action whose cost is not shown to be compatible with the tiled layout. This is load-bearing for the 'roughly halving ... wall time' claim for generic operations.
- [Benchmarks] Benchmarks (throughout): The reported 2–4× speedups and 'roughly halving' memory/time claims are asserted without error bars, detailed methodology, data-exclusion rules, specific n-qubit sizes, or operation types tested. This prevents independent verification of whether the tiled layout overhead remains negligible across the claimed range of systems and maps.
minor comments (2)
- [Abstract] Abstract: The LaTeX markup (e.g., 'Schrö{dinger}') should be rendered consistently in the final version.
- [Introduction] The manuscript would benefit from an explicit statement of the supported class of maps (e.g., whether only unitary conjugations or a broader set of k-local channels are implemented) to avoid ambiguity with the 'generic' phrasing.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below and indicate the revisions we will make to the manuscript.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim states that Orkan supports 'generic quantum operations' via 'positive linear maps' on Hermitian operators while storing only the lower triangle at tile granularity and using k-local conjugation algorithms that 'update all entries of the hermitian matrix in a single pass.' General positive maps (including arbitrary CPTP maps) cannot always be reduced to k-local conjugations; they typically require vectorization or full superoperator action whose cost is not shown to be compatible with the tiled layout. This is load-bearing for the 'roughly halving ... wall time' claim for generic operations.
Authors: We appreciate the referee pointing out this scope clarification. The Orkan implementation and all reported benchmarks are restricted to operations that admit a k-local conjugation representation (unitary evolution under k-local Hamiltonians and certain local noise models), for which the single-pass tiled update is valid and yields the claimed memory and time reductions. While the underlying tiled lower-triangle storage can in principle be used with general positive maps via superoperator action, the manuscript does not claim or benchmark single-pass performance for arbitrary CPTP maps. We will revise the abstract, introduction, and methods sections to explicitly delimit the supported operation class and note that general maps may incur additional overhead not covered by the current single-pass algorithms. revision: yes
-
Referee: [Benchmarks] Benchmarks (throughout): The reported 2–4× speedups and 'roughly halving' memory/time claims are asserted without error bars, detailed methodology, data-exclusion rules, specific n-qubit sizes, or operation types tested. This prevents independent verification of whether the tiled layout overhead remains negligible across the claimed range of systems and maps.
Authors: We fully agree that the current benchmark presentation lacks the necessary detail for independent verification. In the revised manuscript we will expand the benchmarks section to report: (i) error bars from at least ten independent runs per data point, (ii) complete methodology including hardware specifications, compiler flags, timing methodology, and warm-up procedures, (iii) explicit data-exclusion rules, (iv) the precise qubit counts tested (4–12 qubits), and (v) the exact operation families benchmarked (random k-local unitaries, Pauli-string conjugations, and local depolarizing channels). We will also release the benchmark scripts and raw timing data as supplementary material. revision: yes
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Hermitian matrices satisfy M = M†, so only the lower triangle needs to be stored.
invented entities (1)
-
Tiled lower-triangle memory layout
no independent evidence
Forward citations
Cited by 1 Pith paper
-
Exhaustive and feasible parametrisation with applications to the travelling salesperson problem
Exhaustively parametrised feasibility-respecting quantum circuits can reach every feasible solution to problems like TSP with certainty using fixed parameters by leveraging group actions and generating sequences.
Reference graph
Works this paper leans on
-
[1]
From barren plateaus through fertile valleys: Conic exten- sions of parameterised quantum circuits,
L. Binkowski, G. Koßmann, T. J. Osborne, R. Schwonnek, and T. Ziegler, “From barren plateaus through fertile valleys: Conic exten- sions of parameterised quantum circuits,” in2025 IEEE International Conference on Quantum Computing and Engineering (QCE), vol. 01, 2025, pp. 111–118
2025
-
[2]
One for all: Universal quantum conic programming framework for hard-constrained combinatorial optimization problems,
L. Binkowski, T. J. Osborne, M. Schwiering, R. Schwonnek, and T. Ziegler, “One for all: Universal quantum conic programming framework for hard-constrained combinatorial optimization problems,”
-
[3]
Available: https://arxiv.org/abs/2411.00435
[Online]. Available: https://arxiv.org/abs/2411.00435
-
[4]
A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, “Surface codes: Towards practical large-scale quantum computation,” Phys. Rev. A, vol. 86, p. 032324, Sep 2012. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.86.032324
-
[5]
Wave-function approach to dissipative processes in quantum optics,
J. Dalibard, Y . Castin, and K. Mølmer, “Wave-function approach to dissipative processes in quantum optics,”Phys. Rev. Lett., vol. 68, pp. 580–583, Feb 1992. [Online]. Available: https://link.aps.org/doi/10. 1103/PhysRevLett.68.580
1992
-
[6]
Monte carlo wave-function method in quantum optics,
K. Mølmer, Y . Castin, and J. Dalibard, “Monte carlo wave-function method in quantum optics,”J. Opt. Soc. Am. B, vol. 10, no. 3, pp. 524–538, Mar 1993. [Online]. Available: https://opg.optica.org/josab/ abstract.cfm?URI=josab-10-3-524
1993
-
[7]
The heisenberg representation of quantum computers,
D. Gottesman, “The heisenberg representation of quantum computers,”
-
[8]
The Heisenberg Representation of Quantum Computers
[Online]. Available: https://arxiv.org/abs/quant-ph/9807006
work page internal anchor Pith review arXiv
-
[9]
Improved simulation of stabilizer circuits.Physical Review A, 70(5), 2004
S. Aaronson and D. Gottesman, “Improved simulation of stabilizer circuits,”Phys. Rev. A, vol. 70, p. 052328, Nov 2004. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.70.052328
-
[10]
Improving gate-level simulation of quantum circuits,
G. F. Viamontes, I. L. Markov, and J. P. Hayes, “Improving gate-level simulation of quantum circuits,” 2003. [Online]. Available: https://arxiv.org/abs/quant-ph/0309060
-
[11]
Advanced simulation of quantum computations,
A. Zulehner and R. Wille, “Advanced simulation of quantum computations,” 2018. [Online]. Available: https://arxiv.org/abs/1707. 00865
2018
-
[12]
Efficient classical simulation of slightly entangled quantum computations,
G. Vidal, “Efficient classical simulation of slightly entangled quantum computations,”Phys. Rev. Lett., vol. 91, no. 14, p. 147902, 2003
2003
-
[13]
Tensor network circuit simulation at exascale,
J. Brennan, M. Allalen, H.-J. Bungartz, C. Denniston, M. Galgon, M. Ganahl, A. Hawkins, D. Lykov, R. Orus, and Y . Alexeev, “Tensor network circuit simulation at exascale,” inProc. IEEE/ACM Second Int. Workshop on Quantum Computing Software (QCS), 2021, pp. 26–31
2021
-
[14]
Simulating quantum computa- tions on classical machines: A survey,
K. Young, M. Scese, and A. Ebnenasir, “Simulating quantum computations on classical machines: A survey,” 2023. [Online]. Available: https://arxiv.org/abs/2311.16505
-
[15]
QuEST and high performance simulation of quantum computers,
T. Jones, A. Brown, I. Bush, and S. C. Benjamin, “QuEST and high performance simulation of quantum computers,”Scientific Reports, vol. 9, no. 1, p. 10736, 2019
2019
-
[16]
Y . Suzuki, Y . Kawase, Y . Masumura, Y . Hiraga, M. Nakadai, J. Chen, K. M. Nakanishi, K. Mitarai, R. Imai, S. Tamiya, T. Yamamoto, T. Yan, T. Kawakubo, Y . O. Nakagawa, Y . Ibe, Y . Zhang, H. Yamashita, H. Yoshimura, A. Hayashi, and K. Fujii, “Qulacs: a fast and versatile quantum circuit simulator for research purpose,”Quantum, vol. 5, p. 559, October 2...
-
[17]
Watrous,The Theory of Quantum Information
J. Watrous,The Theory of Quantum Information. Cambridge University Press, 2018
2018
-
[18]
A parallel quantum computer simulator,
K. M. Obenland and A. M. Despain, “A parallel quantum computer simulator,” 1998. [Online]. Available: https://arxiv.org/abs/quant-ph/ 9804039
1998
-
[19]
Massively parallel quantum computer simulator,
K. De Raedt, K. Michielsen, H. De Raedt, B. Trieu, G. Arnold, M. Richter, T. Lippert, H. Watanabe, and N. Ito, “Massively parallel quantum computer simulator,”Computer Physics Communications, vol. 176, no. 2, p. 121–136, jan 2007. [Online]. Available: http://dx.doi.org/10.1016/j.cpc.2006.08.007
-
[20]
mpiqulacs: A distributed quantum computer simulator for a64fx-based cluster systems,
S. Imamura, M. Yamazaki, T. Honda, A. Kasagi, A. Tabuchi, H. Nakao, N. Fukumoto, and K. Nakashima, “mpiqulacs: A distributed quantum computer simulator for a64fx-based cluster systems,” 2022. [Online]. Available: https://arxiv.org/abs/2203.16044
-
[21]
Nemirovsky and D
M. Nemirovsky and D. M. Tullsen,Multithreading Architecture, ser. Synthesis Lectures on Computer Architecture. Morgan & Claypool Publishers, 2013. [Online]. Available: https://doi.org/10.2200/ S00458ED1V01Y201212CAC021
2013
-
[22]
Understanding caching,
J. Bottomly, “Understanding caching,”LINUX journal, jan 2004. [Online]. Available: https://www.linuxjournal.com/article/7105
2004
-
[23]
Understanding the Impact of Memory Access Patterns in Intel Processors ,
M. A. H. Monil, S. Lee, J. S. Vetter, and A. D. Malony, “ Understanding the Impact of Memory Access Patterns in Intel Processors ,” in2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC). Los Alamitos, CA, USA: IEEE Computer Society, Nov 2020, pp. 52–61. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/MCHPC51950.2020.00012
-
[24]
False sharing and its effect on shared memory performance,
W. J. Bolosky and M. L. Scott, “False sharing and its effect on shared memory performance,” inUSENIX Experiences with Distributed and Multiprocessor Systems (SEDMS IV). San Diego, CA: USENIX Association, Sep
-
[25]
Available: https://www.usenix.org/conference/sedms-iv/ false-sharing-and-its-effect-shared-memory-performance
[Online]. Available: https://www.usenix.org/conference/sedms-iv/ false-sharing-and-its-effect-shared-memory-performance
-
[26]
Exploiting superword level parallelism with multimedia instruction sets,
S. Larsen and S. Amarasinghe, “Exploiting superword level parallelism with multimedia instruction sets,” inProceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). New York, NY , USA: Association for Computing Machinery,
-
[27]
Available: https://doi.org/10.1145/349299.349320
[Online]. Available: https://doi.org/10.1145/349299.349320
-
[28]
Superword-level parallelism in the presence of control flow,
J. Shin, M. Hall, and J. Chame, “Superword-level parallelism in the presence of control flow,” inInternational Symposium on Code Generation and Optimization (CGO). USA: IEEE Computer Society,
-
[29]
Available: https://doi.org/10.1109/CGO.2005.33
[Online]. Available: https://doi.org/10.1109/CGO.2005.33
-
[30]
Density matrix quantum circuit simulation via the bsp machine on modern gpu clusters,
A. Li, O. Subasi, X. Yang, and S. Krishnamoorthy, “Density matrix quantum circuit simulation via the bsp machine on modern gpu clusters,” inSC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020, pp. 1–15
2020
-
[31]
Tiling, block data layout, and memory hierarchy performance,
N. Park, B. Hong, and V . K. Prasanna, “Tiling, block data layout, and memory hierarchy performance,”IEEE Trans. Parallel Distrib. Syst., vol. 14, no. 7, p. 640–654, jul 2003. [Online]. Available: https://doi.org/10.1109/TPDS.2003.1214317
-
[32]
Algorithm 865: Fortran 95 subroutines for cholesky factorization in block hybrid format,
F. G. Gustavson, J. K. Reid, and J. Wa ´sniewski, “Algorithm 865: Fortran 95 subroutines for cholesky factorization in block hybrid format,”ACM Trans. Math. Softw., vol. 33, no. 1, p. 8–es, mar 2007. [Online]. Available: https://doi.org/10.1145/1206040.1206048
-
[33]
Slate: design of a modern distributed and accelerated linear algebra library,
M. Gates, J. Kurzak, A. Charara, A. YarKhan, and J. Dongarra, “Slate: design of a modern distributed and accelerated linear algebra library,” inProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC ’19. New York, NY , USA: Association for Computing Machinery, 2019. [Online]. Available: https:...
-
[34]
A. Javadi-Abhari, M. Treinish, K. Krsulich, C. J. Wood, J. Lishman, J. Gacon, S. Martiel, P. D. Nation, L. S. Bishop, A. W. Cross, B. R. Johnson, and J. M. Gambetta, “Quantum computing with qiskit,” 2024. [Online]. Available: https://arxiv.org/abs/2405.08810
work page internal anchor Pith review arXiv 2024
-
[35]
Qiskit aer: High-performance quantum computing simulators,
Qiskit Contributors, “Qiskit aer: High-performance quantum computing simulators,” 2024, software, version 0.17.2. [Online]. Available: https://github.com/Qiskit/qiskit-aer
2024
-
[36]
QuEST: The quantum exact simulation toolkit,
T. Jones, K. Sherbert, S. Brown, and QuEST-Kit Contributors, “QuEST: The quantum exact simulation toolkit,” 2024, software, version 4.2.0. [Online]. Available: https://github.com/QuEST-Kit/QuEST
2024
-
[37]
Qulacs: Variational quantum circuit simulator,
Y . Suzuki, Y . Kawase, Y . Masumura, and Qulacs Contributors, “Qulacs: Variational quantum circuit simulator,” 2021, software, version 0.6.12. [Online]. Available: https://github.com/qulacs/qulacs
2021
-
[38]
Openmp: an industry standard api for shared- memory programming,
L. Dagum and R. Menon, “Openmp: an industry standard api for shared- memory programming,”IEEE Computational Science and Engineering, vol. 5, no. 1, pp. 46–55, 1998. APPENDIX A. Cross-tile subroutines This appendix details the three subroutines invoked by Al- gorithm 4. Unlike the two-pass approach( U⊗I) (I⊗U) vec(ρ) used by established simulators (Section...
1998
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.