End-to-end performance of quantum-accelerated large-scale linear algebra workflows
Pith reviewed 2026-05-15 09:55 UTC · model grok-4.3
The pith
A quantum graph partitioner inside LS-DYNA cuts amortized wall-clock time by 5.9 to 14.6 percent on large FEA meshes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By routing the graph-partitioning subproblem of LS-DYNA through Iterative-QAOA on up to 150 qubits, the hybrid framework lowers amortized wall-clock time for vibrational analysis of a sedan and a jet engine and for transient simulation of a drill and an impeller. The observed reductions are at least 5.9 percent on every model and reach 14.6 percent on some. They are reported after accounting for MPI-distributed execution, quantum hardware calls, and classical overhead on meshes of up to 35 million elements.
What carries the argument
Iterative-QAOA applied to the graph-partitioning problem that minimizes fill-in during sparse-matrix factorization inside the LS-DYNA finite-element solver.
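A minimal sketch of how a balanced two-way graph partition can be written as a QUBO, the kind of cost function a QAOA-style solver takes as input. The function names, the penalty weight, and the brute-force check are illustrative assumptions and are not taken from the paper, which may use a different encoding.

```python
import itertools

def bisection_qubo(n, edges, penalty=2.0):
    """Build (Q, offset) with cost(x) = sum_ij Q[i][j]*x[i]*x[j] + offset, x in {0,1}^n.

    cost(x) = (number of cut edges) + penalty * (sum(x) - n/2)**2
    The penalty weight is an illustrative choice, not a value from the paper.
    """
    Q = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        # An edge is cut when x_i != x_j: x_i + x_j - 2*x_i*x_j (using x^2 = x).
        Q[i][i] += 1.0
        Q[j][j] += 1.0
        Q[i][j] -= 1.0
        Q[j][i] -= 1.0
    # Balance term: (sum x)^2 - n*sum(x) + n^2/4, i.e. +1 off-diagonal, (1 - n) on it.
    for i in range(n):
        for j in range(n):
            Q[i][j] += penalty * ((1.0 - n) if i == j else 1.0)
    offset = penalty * n * n / 4.0
    return Q, offset

def qubo_cost(Q, offset, x):
    """Evaluate the QUBO cost of a 0/1 assignment x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n)) + offset

def brute_force_min(Q, offset):
    """Exhaustively minimise the QUBO; only feasible for very small n."""
    best_x, best_cost = None, float("inf")
    for bits in itertools.product([0, 1], repeat=len(Q)):
        cost = qubo_cost(Q, offset, bits)
        if cost < best_cost:
            best_x, best_cost = bits, cost
    return best_x, best_cost

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
Q, OFFSET = bisection_qubo(6, EDGES)
BEST_X, BEST_COST = brute_force_min(Q, OFFSET)
print(BEST_X, BEST_COST)
```

On this toy graph the minimum separates the two triangles and cuts only the bridging edge. A quantum solver would replace the exhaustive search, which is included here only to check the encoding.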
If this is right
- The same quantum partitioner can be dropped into other sparse-direct or iterative solvers that rely on graph ordering.
- As qubit count and fidelity increase, the same workflow extends without code changes to meshes larger than 35 million elements.
- End-to-end timing already includes data-transfer overhead, so further reductions require only faster quantum hardware rather than new classical interfaces.
- The measured gains are independent of the particular FEA physics (vibration or transient), suggesting broad applicability inside multiphysics packages.
Where Pith is reading between the lines
- Engineering codes that already expose a modular partitioner step can adopt quantum acceleration with minimal refactoring.
- If quantum-classical latency drops by another factor of two, the same technique would become attractive for real-time design iterations rather than overnight batch jobs.
- The current results set a baseline against which future fault-tolerant quantum partitioners can be compared without changing the classical outer loop.
Load-bearing premise
The time saved by the quantum partitioner exceeds the added cost of moving mesh data to and from the quantum device plus any classical post-processing for the mesh sizes and solver settings that were tested.
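The premise can be stated as simple arithmetic. The sketch below assumes that "amortized" means a one-time partitioning overhead spread over repeated solves that reuse the same partition; the paper's exact definition is not given here, and the function names and the numbers in the usage lines are placeholders, not measurements.

```python
import math

def amortized_time(solve_time, partition_overhead, n_solves):
    """Per-solve wall-clock time when one partitioning is reused n_solves times."""
    if n_solves < 1:
        raise ValueError("n_solves must be at least 1")
    return solve_time + partition_overhead / n_solves

def break_even_solves(baseline_solve, quantum_solve, extra_overhead):
    """Smallest number of reuses at which the quantum-partitioned run is strictly faster.

    extra_overhead is the additional one-time cost of the quantum route
    (encoding, data movement, solver runtime, post-processing).
    Returns None when the per-solve time is not reduced at all.
    """
    saving = baseline_solve - quantum_solve
    if saving <= 0:
        return None
    return math.floor(extra_overhead / saving) + 1

# Placeholder values in arbitrary time units (not from the paper).
print(break_even_solves(100.0, 90.0, 25.0))
print(amortized_time(90.0, 25.0, 5))
```

If the break-even count exceeds the number of solves a workflow actually performs with one partition, the premise fails for that workflow.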
What would settle it
Re-running the identical LS-DYNA jobs on the same hardware with the classical partitioner restored and finding that total wall-clock time is equal or longer on every model would falsify the reported speed-up.
Original abstract
Solving large-scale sparse linear systems is a challenging computational task due to the introduction of non-zero elements, or "fill-in." The Graph Partitioning Problem (GPP) arises naturally when minimizing fill-in and accelerating solvers. In this paper, we measure the end-to-end performance of a hybrid quantum-classical framework designed to accelerate Finite Element Analysis (FEA) by integrating a quantum solver for GPP into Synopsys/Ansys' LS-DYNA multiphysics simulation software. The quantum solver we use is based on Iterative-QAOA, a scalable, non-variational quantum approach for optimization. We focus on two specific classes of FEA problems, namely vibrational (eigenmode) analysis and transient simulation. We report numerical simulations on up to 150 qubits done on NVIDIA's CUDA-Q/cuTensorNet and implementation on IonQ's Forte quantum hardware. The potential impact on LS-DYNA workflows is quantified by measuring the wall-clock time-to-solution for complex problem instances, including vibrational analysis of large finite element models of a sedan car and a Rolls-Royce jet engine, as well as transient simulations of a drill and an impeller. We performed end-to-end performance measurements on meshes comprising up to 35 million elements. Measurements were conducted using LS-DYNA in distributed-memory mode via Message Passing Interface (MPI) on AWS and Synopsys compute clusters. Our findings indicate that with a quantum computer in the loop, amortized LS-DYNA wall-clock time can be improved by up to 14.6% for specific cases and by at least 5.9% for all models considered. These results highlight the significant potential of quantum computing to reduce time-to-solution for large-scale FEA simulations within the Noisy Intermediate-Scale Quantum (NISQ) era, offering an approach that is scalable and extendable into the fault-tolerant quantum computing regime.
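To make the notion of fill-in concrete, the following sketch counts the fill edges created by symbolic elimination on a small graph under two different vertex orderings. The function name and the star-graph example are illustrative assumptions and do not come from the paper.

```python
from itertools import combinations

def fill_in(n, edges, order):
    """Count fill edges created by eliminating vertices in the given order.

    Eliminating a vertex connects all of its not-yet-eliminated neighbours
    into a clique; every edge added this way is one unit of fill-in.
    """
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    fill = 0
    for v in order:
        nbrs = adj.pop(v)
        for u in nbrs:
            adj[u].discard(v)
        for a, b in combinations(sorted(nbrs), 2):
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                fill += 1
    return fill

# Star graph: vertex 0 is connected to leaves 1..4 (an "arrow" sparsity pattern).
STAR = [(0, 1), (0, 2), (0, 3), (0, 4)]
CENTER_FIRST = fill_in(5, STAR, [0, 1, 2, 3, 4])
CENTER_LAST = fill_in(5, STAR, [1, 2, 3, 4, 0])
print(CENTER_FIRST, CENTER_LAST)
```

Eliminating the hub first turns the four leaves into a clique, while eliminating it last creates no fill at all. Nested dissection applies the same idea at scale by ordering separator vertices last, which is why the quality of the graph partition affects factorization cost.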
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a hybrid quantum-classical framework that embeds Iterative-QAOA graph partitioning into LS-DYNA to reduce fill-in during large-scale finite-element solves. It reports end-to-end wall-clock time reductions of 5.9–14.6% on vibrational and transient problems with meshes of up to 35 million elements, obtained from CUDA-Q simulations on up to 150 qubits and from runs on IonQ Forte hardware.
Significance. If the net speedups survive full overhead accounting, the work supplies one of the first concrete demonstrations that a NISQ optimizer can measurably accelerate an industrial multiphysics code on production-scale meshes, thereby linking quantum combinatorial solvers to real engineering workflows.
Major comments (3)
- [Abstract and performance-evaluation section] The headline 5.9–14.6% amortized improvements are stated without any breakdown of quantum versus classical wall-clock components, without error bars, and without the number of QAOA iterations or shots. Consequently it is impossible to confirm that the reported savings exceed the combined costs of mesh encoding, quantum-classical data movement, Iterative-QAOA runtime, and classical post-processing for the 35 M-element cases.
- [Performance-evaluation section] No timing or quality comparison is supplied against the classical partitioner METIS (or any other standard baseline) under identical LS-DYNA MPI configurations. Without this control it cannot be established that the quantum partitioner, rather than simply a different partitioning heuristic, is responsible for the observed solver-time reduction.
- [Methods and results] The mapping from measured partition quality to actual fill-in reduction and solver runtime is not quantified; the manuscript therefore provides no direct evidence that the Iterative-QAOA partitions improve the linear-algebra kernels enough to offset interface overhead on the tested meshes.
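The partition-quality measures that a baseline comparison of this kind would typically report can be computed in a few lines. This is a sketch under the assumption that quality means edge cut and size imbalance, as is conventional for partitioners such as METIS; the function names and the toy graph are hypothetical.

```python
from collections import Counter

def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for i, j in edges if part[i] != part[j])

def imbalance(part, k):
    """Largest part size divided by the ideal size n/k; 1.0 means perfectly balanced."""
    sizes = Counter(part)
    return max(sizes.values()) * k / len(part)

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
GOOD = [0, 0, 0, 1, 1, 1]   # splits along the bridge
POOR = [0, 0, 1, 1, 1, 1]   # cuts through a triangle and is unbalanced

print(edge_cut(EDGES, GOOD), imbalance(GOOD, 2))
print(edge_cut(EDGES, POOR), imbalance(POOR, 2))
```

Reporting both numbers for each partitioner, alongside solver wall-clock time under the same MPI configuration, is what would separate the effect of the quantum solver from the effect of merely using a different heuristic.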
Minor comments (2)
- [Abstract] The phrase “amortized LS-DYNA wall-clock time” is used without defining the amortization window or the number of repeated solves over which the quantum overhead is spread.
- [Figures and text] Qubit counts, circuit depths, and hardware versus simulation distinctions for each model should be tabulated in the figure captions and text for reproducibility.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive review. The comments highlight important aspects of clarity and validation that we address below. We have revised the manuscript to incorporate additional details, comparisons, and quantifications as described in the point-by-point responses.
Point-by-point responses
- Referee: [Abstract and performance-evaluation section] The headline 5.9–14.6% amortized improvements are stated without any breakdown of quantum versus classical wall-clock components, without error bars, and without the number of QAOA iterations or shots. Consequently it is impossible to confirm that the reported savings exceed the combined costs of mesh encoding, quantum-classical data movement, Iterative-QAOA runtime, and classical post-processing for the 35 M-element cases.
  Authors: We agree that the original presentation lacked sufficient granularity. In the revised manuscript we have expanded both the abstract and the performance-evaluation section with a new table that decomposes amortized wall-clock time into quantum (encoding, Iterative-QAOA runtime on CUDA-Q and IonQ Forte, shots, and iterations) and classical (data movement, post-processing, and LS-DYNA solver) components. Error bars are now reported from repeated runs, and the number of QAOA iterations and shots is stated explicitly for each mesh size. The updated data confirm that net savings for the 35 M-element cases remain positive after all overheads.
  Revision: yes
- Referee: [Performance-evaluation section] No timing or quality comparison is supplied against the classical partitioner METIS (or any other standard baseline) under identical LS-DYNA MPI configurations. Without this control it cannot be established that the quantum partitioner, rather than simply a different partitioning heuristic, is responsible for the observed solver-time reduction.
  Authors: We accept that a direct baseline is necessary to isolate the contribution of the quantum solver. The revised performance-evaluation section now includes a side-by-side comparison of Iterative-QAOA partitions against METIS (and Scotch) under identical LS-DYNA MPI configurations on the same AWS and Synopsys clusters. We report both partition quality metrics (edge cut, balance) and the resulting LS-DYNA wall-clock times, demonstrating that the observed solver-time reductions are attributable to the Iterative-QAOA partitions rather than to a generic change in heuristic.
  Revision: yes
- Referee: [Methods and results] The mapping from measured partition quality to actual fill-in reduction and solver runtime is not quantified; the manuscript therefore provides no direct evidence that the Iterative-QAOA partitions improve the linear-algebra kernels enough to offset interface overhead on the tested meshes.
  Authors: We have added a new quantitative subsection in Methods that explicitly maps partition quality metrics (edge cut and vertex balance) to measured fill-in reduction and LS-DYNA solver runtime for each mesh. Using data from the vibrational and transient test cases, we show the correlation coefficients and the net offset of interface overhead, thereby providing direct evidence that the linear-algebra improvements exceed the hybrid overhead on the 35 M-element meshes.
  Revision: yes
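The rebuttal does not say which correlation coefficient is reported. The sketch below assumes Pearson's r between a partition-quality metric (for example edge cut) and a measured quantity (for example fill-in or solver runtime). The function name is hypothetical, and the usage lines run on a synthetic, exactly linear series purely to check the arithmetic; they are not data from the paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Synthetic sanity check: y = 2x + 1 is perfectly correlated with x.
R_UP = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
R_DOWN = pearson_r([1, 2, 3], [3, 2, 1])
print(R_UP, R_DOWN)
```

With only four models in the study, any such coefficient would rest on very few points, so the per-mesh values behind it matter more than the coefficient itself.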
Circularity Check
No circularity: end-to-end claims rest on direct wall-clock measurements
Full rationale
The paper reports empirical wall-clock time reductions (5.9–14.6%) obtained by running LS-DYNA with Iterative-QAOA graph partitions on concrete meshes up to 35 M elements, using CUDA-Q simulations and IonQ hardware. These numbers are external observables measured on AWS/Synopsys clusters; they are not outputs of any fitted parameter, self-referential equation, or derivation that reduces to the paper’s own inputs by construction. No load-bearing step invokes a uniqueness theorem, ansatz smuggled via self-citation, or renaming of a known result. The central claim therefore remains independent of the paper’s internal formalism.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Iterative-QAOA ... linear-ramp schedule ... Boltzmann-weighted average ... spectral coarsening ... Fiduccia–Mattheyses refinement ... end-to-end wall-clock time improvements of 5.9–14.6 %
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Nested Dissection ... Graph Partitioning Problem ... QUBO cost Hamiltonian HC
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.