GPU Performance of an Entropy-Stable Discontinuous Galerkin Euler Solver with Non-Conservative Terms
Pith reviewed 2026-05-19 20:34 UTC · model grok-4.3
The pith
An entropy-stable discontinuous Galerkin solver for the Euler equations with buoyancy reaches nearly 70% of 64-bit floating-point peak performance on an NVIDIA A100 GPU for its volume-terms kernel.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The entropy-stable discontinuous Galerkin discretization of the compressible Euler equations with buoyancy is ported to GPU hardware with targeted modifications that reduce complex operations, lower memory traffic, improve load balance, and extend symmetry-based flux savings to the non-symmetric gravity term. With these changes the solver attains nearly 70% of 64-bit floating-point peak performance for the volume-terms kernel on NVIDIA A100 hardware. The resulting implementation is a factor of 10 faster and more than 13 times more energy efficient than a highly optimized CPU code, achieves the expected doubling of speed when run at 32-bit precision, and is reported to scale well in both strong and weak scaling tests.
What carries the argument
Symmetry-based flux savings extended to the non-symmetric gravity term within the discontinuous Galerkin volume kernel, which preserves nearly the full factor-of-two speedup for entropy-stable fluxes.
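The idea behind the savings can be illustrated with a small sketch. It is not the paper's kernel: the scalar flux `sym_flux`, the gravity-like term built from `sym_avg` times a potential difference, the factor of 2, and the 3-node differentiation matrix `D` are all illustrative assumptions, and the function names are hypothetical. The point shown is that a term which is not symmetric in (i, j) can still be evaluated once per unordered pair if it splits into a symmetric factor times an antisymmetric one, so that the second node reuses the same value with a flipped sign.

```python
def sym_flux(a, b):
    """Symmetric two-point flux; the Burgers entropy-conservative flux is a stand-in."""
    return (a * a + a * b + b * b) / 6.0


def sym_avg(a, b):
    """Symmetric average used as the symmetric factor of the gravity-like term."""
    return 0.5 * (a + b)


def volume_all_pairs(D, u, phi):
    """Reference version: evaluate both two-point terms for every ordered pair (i, j)."""
    n = len(u)
    rhs = [0.0] * n
    evals = 0
    for i in range(n):
        for j in range(n):
            f = sym_flux(u[i], u[j])
            g = sym_avg(u[i], u[j]) * (phi[j] - phi[i])  # changes sign when i and j swap
            evals += 1
            rhs[i] -= D[i][j] * (2.0 * f + g)
    return rhs, evals


def volume_half_pairs(D, u, phi):
    """Savings version: evaluate each unordered pair once and reuse it for both nodes."""
    n = len(u)
    rhs = [0.0] * n
    evals = 0
    for i in range(n):
        # Diagonal pair: the gravity-like term vanishes because phi[i] - phi[i] == 0.
        rhs[i] -= D[i][i] * 2.0 * sym_flux(u[i], u[i])
        evals += 1
        for j in range(i + 1, n):
            f = sym_flux(u[i], u[j])
            g = sym_avg(u[i], u[j]) * (phi[j] - phi[i])
            evals += 1
            rhs[i] -= D[i][j] * (2.0 * f + g)  # contribution of the pair to node i
            rhs[j] -= D[j][i] * (2.0 * f - g)  # same f, sign-flipped g for node j
    return rhs, evals


# Differentiation matrix for three equispaced nodes at -1, 0, 1 (illustrative only).
D = [[-1.5, 2.0, -0.5], [-0.5, 0.0, 0.5], [0.5, -2.0, 1.5]]
u = [1.0, 0.7, 0.4]
phi = [0.0, 0.5, 1.0]
rhs_ref, n_ref = volume_all_pairs(D, u, phi)
rhs_opt, n_opt = volume_half_pairs(D, u, phi)
print(n_ref, n_opt)  # prints "9 6"
```

With n nodes the all-pairs loop performs n² two-point evaluations and the half-pairs loop performs n(n+1)/2, so the ratio approaches two as n grows, which is consistent with the "nearly the full factor-of-two" wording above.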
Load-bearing premise
The sequence of GPU-specific modifications preserves both the formal entropy stability and the numerical accuracy of the original discontinuous Galerkin scheme when non-conservative buoyancy terms are present.
What would settle it
Direct side-by-side execution of the GPU and CPU solvers on the rising thermal bubble test case that shows divergence in computed entropy or solution accuracy would disprove preservation of the scheme properties.
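A minimal sketch of such a side-by-side check follows. It assumes an ideal gas with the commonly used mathematical entropy density η = -ρ s / (γ - 1), s = ln p - γ ln ρ; the page does not state which entropy function the paper uses, so that choice, the value of `GAMMA`, and every function name here are assumptions. Solutions are taken as flat lists of nodal values with matching quadrature weights.

```python
import math

GAMMA = 1.4  # assumed ratio of specific heats (dry air); not taken from the paper


def math_entropy(rho, p):
    """Mathematical entropy density for an ideal gas (assumed form)."""
    s = math.log(p) - GAMMA * math.log(rho)
    return -rho * s / (GAMMA - 1.0)


def total_entropy(rho, p, weights):
    """Quadrature-weighted sum of the entropy density over all nodes."""
    return sum(w * math_entropy(r, q) for w, r, q in zip(weights, rho, p))


def rel_l2_diff(candidate, reference, weights):
    """Weighted relative L2 difference between two nodal fields."""
    num = sum(w * (x - y) ** 2 for w, x, y in zip(weights, candidate, reference))
    den = sum(w * y ** 2 for w, y in zip(weights, reference))
    return math.sqrt(num / den)


def entropy_non_increasing(history, tol=1e-12):
    """True if total entropy never grows between consecutive outputs, up to tol."""
    return all(
        later <= earlier + tol * max(1.0, abs(earlier))
        for earlier, later in zip(history, history[1:])
    )
```

In use, one would call `rel_l2_diff` on each conserved field from the GPU and CPU runs at the same output time, and `entropy_non_increasing` on the `total_entropy` history of each run. Agreement to roughly round-off level, together with a non-increasing entropy history in both codes, would be evidence that the GPU modifications left the scheme's properties intact; a growing gap would point the other way.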
Original abstract
The entropy-stable discontinuous Galerkin method for the compressible Euler equations with buoyancy is implemented on graphics processing unit (GPU) hardware. We measure the performance of the solver on three-dimensional problems: the rising thermal bubble and the baroclinic instability in a channel. On NVIDIA A100 hardware, the solver achieves nearly 70% of 64-bit floating-point peak performance for the most computationally expensive kernel (volume terms) and significantly reduces the computational overhead typically incurred by two-point entropy-stable fluxes in the volume terms. We also present impressive strong and weak scaling performance of the solver and compare to a highly optimized central processing unit (CPU) code, showing that the GPU kernels are 10× faster and better than 13× more energy efficient than the CPU code. We also show that the solver achieves the expected 2× speedup when run in 32-bit floating-point precision. We discuss the different modifications that we implemented to reach the final form of the GPU implementation and measure the performance gain of each implementation strategy, ranging from reductions in complex operations and memory traffic to load balancing. We also extend symmetry-based flux savings to the non-symmetric gravity term, preserving nearly the full factor-of-two speedup achieved for the symmetric flux.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a GPU implementation of an entropy-stable discontinuous Galerkin solver for the compressible Euler equations with non-conservative buoyancy terms. It evaluates performance on NVIDIA A100 hardware for 3D test cases including the rising thermal bubble and the baroclinic instability. It reports nearly 70% of FP64 peak performance for the volume kernel, a 10× speedup and a better than 13× energy-efficiency improvement over a CPU reference, the expected 2× speedup at FP32, strong and weak scaling results, and a breakdown of gains from optimizations such as reduced complex operations, lower memory traffic, load balancing, and the extension of symmetry-based flux savings to the non-symmetric gravity term.
Significance. If the GPU kernels preserve the original scheme's entropy stability and accuracy, the work demonstrates that entropy-stable DG methods with non-conservative terms can achieve high fractions of peak GPU performance and substantial speedups over CPU implementations. The per-optimization performance quantification and scaling data would be useful for practitioners developing high-order CFD solvers on accelerators.
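The headline quantities in the summary are all simple ratios, and it can help to see how they are usually defined. The sketch below uses conventional definitions; whether the paper uses exactly these (for example, which hardware peak it divides by, or how power is measured) is not stated on this page, so the function names and conventions are assumptions.

```python
def fraction_of_peak(flop_count, seconds, peak_flops_per_s):
    """Achieved floating-point rate divided by the hardware peak rate."""
    return (flop_count / seconds) / peak_flops_per_s


def speedup(t_reference, t_new):
    """How many times faster the new run is than the reference run."""
    return t_reference / t_new


def energy_efficiency_gain(t_reference, watts_reference, t_new, watts_new):
    """Ratio of energy to solution (time x average power), reference over new."""
    return (t_reference * watts_reference) / (t_new * watts_new)


def strong_scaling_efficiency(t_one, t_n, n_devices):
    """Fixed total problem size: ideal time on n devices is t_one / n."""
    return t_one / (n_devices * t_n)


def weak_scaling_efficiency(t_one, t_n):
    """Fixed problem size per device: ideal time stays equal to t_one."""
    return t_one / t_n
```

For instance, with placeholder values that are not measurements from the paper, `fraction_of_peak(7.0e12, 1.0, 1.0e13)` evaluates to about 0.7, and `strong_scaling_efficiency(8.0, 1.25, 8)` evaluates to 0.8.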
major comments (2)
- [GPU implementation and flux optimizations] The extension of symmetry-based flux savings to the non-symmetric gravity term (described in the implementation section) is claimed to preserve nearly the full factor-of-two speedup, but the manuscript provides no verification—via discrete entropy inequality checks, comparison of solutions to the CPU reference, or reproduction of the entropy-consistency condition—that the modified two-point volume kernel retains the entropy stability property when buoyancy terms are present. This verification is load-bearing for the claim that the reported performance is achieved by an entropy-stable scheme.
- [Performance results and test cases] Performance results (e.g., 70% of peak for the volume kernel, 10× speedup) are presented without error bars, multiple timing runs, or statistical measures of variability. There is also no explicit confirmation that numerical accuracy (error norms or solution fidelity on the test cases) is unchanged after the sequence of GPU modifications including reduced complex operations and load balancing.
minor comments (2)
- [Abstract] The abstract states results on 'three-dimensional problems' with two named test cases; clarify whether additional cases were used or if this is a minor wording inconsistency.
- [CPU comparison] The CPU baseline is described as 'highly-optimized' but the manuscript does not detail the specific CPU implementation choices, compiler flags, or hardware used for the comparison; this would aid reproducibility of the speedup claims.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment below and agree that the suggested additions will strengthen the paper. Revisions will be incorporated in the next version.
- Referee: [GPU implementation and flux optimizations] The extension of symmetry-based flux savings to the non-symmetric gravity term (described in the implementation section) is claimed to preserve nearly the full factor-of-two speedup, but the manuscript provides no verification (via discrete entropy inequality checks, comparison of solutions to the CPU reference, or reproduction of the entropy-consistency condition) that the modified two-point volume kernel retains the entropy stability property when buoyancy terms are present. This verification is load-bearing for the claim that the reported performance is achieved by an entropy-stable scheme.
  Authors: We agree that explicit verification is necessary to support the claim. The extension of the symmetry-based savings to the gravity term is constructed to preserve the two-point entropy-conservative flux properties required by the underlying scheme, but we did not include numerical checks in the submitted manuscript. In the revision we will add discrete entropy inequality verification on the rising thermal bubble and baroclinic instability cases together with direct solution comparisons (including entropy values) against the CPU reference implementation.
  Revision: yes
- Referee: [Performance results and test cases] Performance results (e.g., 70% of peak for the volume kernel, 10× speedup) are presented without error bars, multiple timing runs, or statistical measures of variability. There is also no explicit confirmation that numerical accuracy (error norms or solution fidelity on the test cases) is unchanged after the sequence of GPU modifications including reduced complex operations and load balancing.
  Authors: We accept that statistical measures and accuracy confirmation should be provided. The original manuscript reported single-run timings focused on the achieved performance and scaling; we will revise the performance section to include multiple timing runs with error bars or standard deviations. We will also add a direct comparison of numerical accuracy (error norms and solution fidelity) between the CPU reference and the final GPU code after all optimizations to confirm that accuracy is preserved.
  Revision: yes
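The multiple-run timing the referee asks for can be sketched as below. This is a generic host-side harness, not the authors' benchmarking code; `time_kernel` and `summarize` are hypothetical names. For a real GPU kernel the callable would also need to wait for the device to finish (for example with a device synchronization call) before the clock is read, otherwise only the launch overhead is measured.

```python
import statistics
import time


def time_kernel(fn, repeats=10, warmup=2):
    """Run fn a few times untimed, then return a list of per-run wall-clock times."""
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-off costs such as compilation or cache fills
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Mean, sample standard deviation, and range of a list of timings."""
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return {
        "n": len(samples),
        "mean": statistics.mean(samples),
        "stdev": stdev,
        "min": min(samples),
        "max": max(samples),
    }
```

Reporting the mean with the standard deviation (or the minimum alongside the spread) makes it possible to judge whether a quoted speedup is larger than the run-to-run variability.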
Circularity Check
No circularity: performance results are direct empirical timings against external CPU baseline
The paper presents an implementation of a prior entropy-stable DG scheme on GPU hardware. All of its central claims are measured quantities: wall-clock times, achieved fractions of peak floating-point throughput, speedup ratios, and energy-efficiency comparisons on A100 hardware versus a CPU reference. These quantities are obtained by direct execution and instrumentation; they do not reduce to any fitted parameter, self-citation loop, or redefinition of the input scheme. The extension of symmetry-based flux savings to the gravity term is described as an implementation choice that preserves the original two-point flux form. It affects how fast the kernel runs, but the reported numbers are measured rather than derived from that choice, and they are backed by explicit timing breakdowns. No derivation chain collapses by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Symmetry-based flux savings extend to the non-symmetric gravity term while preserving entropy stability and nearly the full factor-of-two reduction in operations.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: entropy-stable discontinuous Galerkin method for compressible Euler equations with buoyancy
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.