arXiv: 2511.03359 · v3 · pith:WVXYEL47new · submitted 2025-11-05 · 🪐 quant-ph · cs.DC · physics.comp-ph

Universal Quantum Computer Simulation of 50 Qubits on Europe`s First Exascale Supercomputer Harnessing Its Heterogeneous CPU-GPU Architecture

Pith reviewed 2026-05-21 20:22 UTC · model grok-4.3

classification 🪐 quant-ph · cs.DC · physics.comp-ph
keywords quantum simulation · exascale computing · GPU · qubits · high-performance computing · universal quantum computer · memory optimization

The pith

JUQCS-50 simulates 50-qubit universal quantum computers on the JUPITER exascale supercomputer for the first time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an updated Jülich universal quantum computer simulator called JUQCS-50 that runs on the heterogeneous CPU-GPU architecture of the JUPITER supercomputer. It reaches 50 qubits by extending memory access through high-bandwidth CPU-GPU links and LPDDR5 memory, applying adaptive data encoding to shrink the memory footprint, and adding an on-the-fly network traffic optimizer. These changes deliver a 16.6-fold speedup compared with the prior 48-qubit record on the K computer. A reader would care because full simulation of 50-qubit circuits brings the behavior of near-term quantum hardware into reach without requiring physical devices.

Core claim

The authors have developed JUQCS-50, which combines three techniques: extending usable memory beyond GPU limits via high-bandwidth CPU-GPU interconnects and LPDDR5 memory; adaptive data encoding that reduces the memory footprint with acceptable trade-offs in precision and compute effort; and an on-the-fly network traffic optimizer. Together these enable simulations of a 50-qubit universal quantum computer for the first time and produce a 16.6-fold speedup over the previous 48-qubit record on the K computer.
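
A rough sizing shows why memory is the binding constraint. The sketch below is not taken from the paper: it assumes a full state vector of 2^n complex amplitudes, and the bytes-per-amplitude values (16 for double-precision complex, plus smaller encoded widths) are illustrative assumptions rather than the widths JUQCS-50 actually uses.

```python
# Rough state-vector sizing. The byte widths are illustrative assumptions,
# not figures taken from the paper.

def state_vector_bytes(n_qubits: int, bytes_per_amplitude: int) -> int:
    """Memory needed for a full state vector of 2**n_qubits amplitudes."""
    return (2 ** n_qubits) * bytes_per_amplitude


def to_pib(n_bytes: int) -> float:
    """Convert bytes to pebibytes (1 PiB = 2**50 bytes)."""
    return n_bytes / 2 ** 50


if __name__ == "__main__":
    for width in (16, 8, 2, 1):  # hypothetical bytes per amplitude
        size = to_pib(state_vector_bytes(50, width))
        print(f"50 qubits at {width:2d} B/amplitude: {size:g} PiB")
```

Each additional qubit doubles the footprint, so a narrower encoding of each amplitude buys a fixed number of extra qubits for the same memory.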

What carries the argument

The JUQCS-50 simulator, which extends memory via CPU-GPU interconnects, applies adaptive data encoding to cut memory use, and optimizes network traffic on the fly.
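
To illustrate why network traffic matters at this scale, the following sketch uses a common distributed state-vector layout. This layout is an assumption, not one documented in this summary: the 2^n amplitudes are split evenly across 2^m ranks, and the top m bits of the amplitude index select the rank. All function names and the rank count are hypothetical.

```python
# Sketch of a common distributed state-vector layout (an assumption here,
# not the layout documented for JUQCS-50).

def partner_index(index: int, qubit: int) -> int:
    """Index paired with `index` by a single-qubit gate on `qubit`."""
    return index ^ (1 << qubit)


def owner_rank(index: int, n_qubits: int, log2_ranks: int) -> int:
    """Rank holding `index` when the top log2_ranks bits select the rank."""
    return index >> (n_qubits - log2_ranks)


def needs_communication(qubit: int, n_qubits: int, log2_ranks: int) -> bool:
    """True if a gate on `qubit` pairs amplitudes stored on different ranks."""
    return qubit >= n_qubits - log2_ranks


if __name__ == "__main__":
    n_qubits, log2_ranks = 50, 12  # hypothetical: 4096 ranks
    global_qubits = [q for q in range(n_qubits)
                     if needs_communication(q, n_qubits, log2_ranks)]
    print("qubits whose gates cross rank boundaries:", global_qubits)
```

Under this layout, gates on the low qubits touch only local memory, while gates on the top m qubits require exchanging amplitudes between ranks. That exchange is the traffic a network optimizer would aim to reduce.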

If this is right

  • Larger quantum circuits can now be simulated directly on exascale machines without fitting entirely in GPU memory.
  • Researchers gain the ability to test 50-qubit algorithms and error-correction schemes on classical hardware before physical devices reach that scale.
  • The same memory-extension and encoding techniques can be applied to other CPU-GPU supercomputers for quantum simulations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method opens a path to routine simulation of noisy 50-qubit devices, allowing direct comparison with experimental data from current quantum processors.
  • If the encoding scheme generalizes, similar gains may appear in classical simulations of other high-dimensional quantum systems such as many-body spin chains.
  • Performance data from JUPITER could guide hardware designers on the minimum interconnect bandwidth needed for future quantum simulators.

Load-bearing premise

The adaptive data encoding maintains sufficient numerical precision and produces acceptable compute trade-offs for the quantum simulations that the authors consider scientifically relevant.

What would settle it

Compare the output of JUQCS-50 on a 20-qubit circuit whose exact results are already known from smaller-scale runs or analytic formulas; any systematic deviation beyond expected floating-point error would indicate that the precision trade-off has failed.
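
A check of this kind can be sketched on a small scale. The example below is a toy under stated assumptions, not the paper's method: `quantize` is a hypothetical uniform-grid encoding standing in for the adaptive encoding, the circuit is a product of single-qubit Y rotations chosen because its final state has a closed form, and 8 qubits are used instead of 20 so that pure Python stays fast.

```python
import math


def apply_gate(state, qubit, gate):
    """Apply a 2x2 gate to one qubit of a state vector (list of complex)."""
    (a, b), (c, d) = gate
    out = list(state)
    stride = 1 << qubit
    for i in range(len(state)):
        if i & stride == 0:          # i has bit `qubit` clear, j has it set
            j = i | stride
            out[i] = a * state[i] + b * state[j]
            out[j] = c * state[i] + d * state[j]
    return out


def ry(theta):
    """Rotation about the Y axis; maps |0> to cos(t/2)|0> + sin(t/2)|1>."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return ((c, -s), (s, c))


def quantize(state, bits):
    """Hypothetical stand-in encoding: round real and imaginary parts onto a
    uniform grid scaled to the largest component of the vector."""
    levels = 2 ** (bits - 1) - 1
    scale = max(max(abs(z.real), abs(z.imag)) for z in state)
    if scale == 0:
        return list(state)

    def snap(x):
        return round(x / scale * levels) / levels * scale

    return [complex(snap(z.real), snap(z.imag)) for z in state]


def fidelity(ref, test):
    """Normalized overlap |<ref|test>|^2 / (<ref|ref> <test|test>)."""
    overlap = sum(r.conjugate() * t for r, t in zip(ref, test))
    norm = sum(abs(r) ** 2 for r in ref) * sum(abs(t) ** 2 for t in test)
    return abs(overlap) ** 2 / norm


def analytic_product_state(thetas):
    """Closed-form result of applying ry(thetas[q]) to qubit q of |0...0>."""
    state = []
    for index in range(2 ** len(thetas)):
        amp = 1.0
        for q, theta in enumerate(thetas):
            bit = (index >> q) & 1
            amp *= math.sin(theta / 2) if bit else math.cos(theta / 2)
        state.append(complex(amp))
    return state


def simulate(thetas, bits=None):
    """Run the rotation circuit, re-encoding after each gate if bits is set."""
    state = [0j] * (2 ** len(thetas))
    state[0] = 1 + 0j
    for q, theta in enumerate(thetas):
        state = apply_gate(state, q, ry(theta))
        if bits is not None:
            state = quantize(state, bits)
    return state


if __name__ == "__main__":
    thetas = [0.3 + 0.1 * q for q in range(8)]
    reference = analytic_product_state(thetas)
    for bits in (None, 16, 8):
        print("bits:", bits, "fidelity:", fidelity(reference, simulate(thetas, bits)))
```

A product state is a deliberately easy case. A meaningful validation would also need entangling circuits and greater depth, where encoding errors can accumulate differently.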

Figures

Figures reproduced from arXiv: 2511.03359 by Andreas Herten, Hans De Raedt, Jiri Kraus, Kristel Michielsen, Markus Hrywniak, Mathis Bode, Thomas Lippert, Vrinda Mehta.

Figure 1: Overview of JUPITER’s quad GH200 node design with included technology and bandwidths. Each node contains four GH200 superchips, ...
Figure 2: Graphical representation of the benchmark circuit ...
Figure 3: Total and compute elapsed times per gate operation for ...
Figure 4: Total elapsed and compute times per gate operation for the ...
Figure 5: Measured communication bandwidth (in GiB/s) as a func...
Figure 7: Structure of quantum circuit to add three ...
Figure 8: Realization of the quantum adder circuit shown in Fig. ...
Original abstract

We have developed a new version of the high-performance Jülich universal quantum computer simulator (JUQCS-50) that leverages key features of the GH200 superchips as used in the JUPITER supercomputer, enabling simulations of a 50-qubit universal quantum computer for the first time. JUQCS-50 achieves this through three key innovations: (1) extending usable memory beyond GPU limits via high-bandwidth CPU-GPU interconnects and LPDDR5 memory; (2) adaptive data encoding to reduce memory footprint with acceptable trade-offs in precision and compute effort; and (3) an on-the-fly network traffic optimizer. These advances result in a 16.6-fold speedup over the previous 48-qubit record on the K computer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents JUQCS-50, an updated Jülich universal quantum computer simulator that achieves the first simulation of a 50-qubit universal quantum computer on the JUPITER exascale supercomputer. It reports three innovations—extending usable memory via high-bandwidth CPU-GPU interconnects and LPDDR5, adaptive data encoding to shrink the state-vector footprint, and an on-the-fly network traffic optimizer—yielding a 16.6-fold speedup over the prior 48-qubit record on the K computer.

Significance. If the adaptive encoding preserves sufficient fidelity for arbitrary circuits, the result would mark a clear advance in the scale of universal quantum simulation on heterogeneous CPU-GPU systems, extending the practical reach of full-state-vector methods beyond current GPU-memory limits.

major comments (2)
  1. Abstract and Methods: the central claim of a verified 50-qubit universal simulation rests on the adaptive data encoding, yet no quantitative bounds on precision loss, no fidelity-versus-depth curves against a double-precision reference, and no statement of the maximum validated circuit depth or gate set are provided; without these the 50-qubit result cannot be assessed for scientific usability.
  2. Results section (performance comparison): the 16.6-fold speedup is reported against the external K-computer record, but no internal baseline (e.g., full double-precision run on the same JUPITER hardware or a non-adaptive variant) is shown, leaving open whether the speedup is attributable to the new encoding or to architectural differences.
minor comments (2)
  1. Title: the apostrophe in “Europe`s” is typographically incorrect and should be “Europe’s”.
  2. Abstract: the phrase “acceptable trade-offs in precision and compute effort” is used without a definition or metric; a brief parenthetical (e.g., “fidelity > 0.99 for circuits up to depth D”) would clarify the claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: Abstract and Methods: the central claim of a verified 50-qubit universal simulation rests on the adaptive data encoding, yet no quantitative bounds on precision loss, no fidelity-versus-depth curves against a double-precision reference, and no statement of the maximum validated circuit depth or gate set are provided; without these the 50-qubit result cannot be assessed for scientific usability.

    Authors: We agree that explicit quantitative validation of the adaptive encoding is necessary to substantiate the 50-qubit claim. In the revised manuscript we will expand the Methods section with measured bounds on precision loss, fidelity-versus-depth curves for representative circuits against double-precision references, and explicit statements of the maximum validated circuit depth together with the supported gate set.

    Revision: yes

  2. Referee: Results section (performance comparison): the 16.6-fold speedup is reported against the external K-computer record, but no internal baseline (e.g., full double-precision run on the same JUPITER hardware or a non-adaptive variant) is shown, leaving open whether the speedup is attributable to the new encoding or to architectural differences.

    Authors: The reported speedup is measured against the published 48-qubit record obtained on different hardware. To isolate the contribution of our techniques we will add internal benchmarks on JUPITER, comparing adaptive versus non-adaptive runs for feasible qubit counts. A full double-precision 50-qubit simulation exceeds available memory, so we will clarify this limitation and supply the strongest feasible internal baselines.

    Revision: partial

Circularity Check

0 steps flagged

No circularity: performance claims benchmarked against external prior record with independent implementation details

Full rationale

The paper's central claims rest on engineering innovations for memory extension, adaptive encoding, and network optimization, with the 16.6-fold speedup explicitly measured against an external 48-qubit record on the K computer rather than any self-referential fit or redefinition. No equations or derivations reduce the reported results to parameters defined by the same data; the adaptive encoding is presented as an implementation choice whose precision trade-offs are asserted but not derived from the target simulation outcomes. The work is self-contained against external benchmarks and does not invoke self-citations as load-bearing uniqueness theorems or smuggle ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an engineering implementation paper; it introduces no mathematical axioms, free parameters fitted to data, or new postulated entities. The central claim rests on standard assumptions about supercomputer interconnect performance and the acceptability of reduced-precision arithmetic for quantum state vectors.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Bowtie VarQTE: A Resource-Efficient Quantum State Preparation Primitive

    quant-ph 2026-05 unverdicted novelty 6.0

    Bowtie VarQTE is a hybrid classical-quantum variational time evolution method that exploits causal light-cones to reduce quantum resource use for state preparation while achieving fidelities comparable to approximate ...

  2. Large-Scale Quantum Circuit Simulation on an Exascale System for QPU Benchmarking

    quant-ph 2026-04 unverdicted novelty 5.0

    Exascale classical simulation validates noise-tolerant performance of a 98-qubit QPU up to 48 qubits for LR-QAOA, with statistical analysis showing coherent regime up to 93 qubits before outputs become indistinguishab...

  3. The Impact of Qubit Connectivity on Quantum Advantage in Noisy IQP Circuits

    quant-ph 2026-04 unverdicted novelty 5.0

    Sparse qubit connectivity raises compiled depth in noisy IQP circuits, requiring lower effective noise to remain outside the classically simulatable regime compared to fully connected layouts.

  4. Compton Form Factor Extraction using Quantum Deep Neural Networks

    cs.LG 2025-04 unverdicted novelty 4.0

    Quantum-inspired deep neural networks extract Compton form factors from JLab data with higher predictive accuracy and tighter uncertainties than classical DNNs on pseudodata benchmarks, then applied to real measurements.
