Exploiting repeated matrix block structures for more efficient CFD on modern supercomputers

Àdel Alsalti-Baldellou; Assensi Oliva; F. Xavier Trias; Guillem Colomer; Josep Plana-Riu; Xavier Álvarez-Farré

arxiv: 2508.06710 · v5 · submitted 2025-08-08 · ⚛️ physics.flu-dyn · physics.comp-ph

Josep Plana-Riu, F. Xavier Trias, Àdel Alsalti-Baldellou, Xavier Álvarez-Farré, Guillem Colomer, Assensi Oliva

Pith reviewed 2026-05-18 23:34 UTC · model grok-4.3


keywords: CFD · SpMV · SpMM · mesh refinement · arithmetic intensity · HPC · turbulent flow · supercomputers


The pith

By grouping repeated matrix blocks, CFD codes can replace sparse matrix-vector multiplies with matrix-matrix multiplies to raise arithmetic intensity and cut run times.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

CFD simulations on supercomputers are often limited by memory bandwidth in sparse matrix operations rather than by arithmetic throughput. This work shows how repeated block structures common in CFD discretizations can be exploited to process multiple vectors simultaneously. The result is a shift to sparse matrix-matrix multiplication that reuses each matrix coefficient across all vectors and so raises arithmetic intensity. An inline mesh refinement that starts coarse and switches to fine further reduces the time needed to reach a statistically steady flow. The reported tests show speed-ups that range from modest in basic configurations to more than 50 percent when mesh refinement is included.

Core claim

The paper claims that repeated matrix block structures allow the reformulation of sparse matrix-vector products into sparse matrix-matrix products for CFD operators. This enables the handling of several right-hand sides at once, increasing arithmetic intensity by reusing matrix data. The approach is complemented by an inline mesh-refinement method that establishes statistically steady flow on a coarse mesh before refining to the target resolution, thereby reducing overall computational time while maintaining solution quality.

What carries the argument

The reformulation of SpMV as SpMM through exploitation of identical repeated matrix blocks across multiple right-hand sides in CFD operators.
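The reformulation can be illustrated with a minimal sketch. It assumes a plain CSR layout and uses a 1D three-point Laplacian as a stand-in operator; the function names (`csr_spmv`, `csr_spmm`, `laplacian_1d_csr`) are hypothetical and are not taken from the paper. The check at the end confirms that applying the block-diagonal operator with m identical blocks to m stacked vectors gives the same result as one SpMM with the single block.

```python
import numpy as np

def csr_spmv(indptr, indices, data, x):
    """y = A @ x for a CSR matrix: every coefficient is loaded to serve one vector."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def csr_spmm(indptr, indices, data, X):
    """Y = A @ X for X of shape (n_cols, m): each coefficient is loaded once
    and reused for all m right-hand sides."""
    n_rows = len(indptr) - 1
    Y = np.zeros((n_rows, X.shape[1]))
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            Y[i, :] += data[k] * X[indices[k], :]
    return Y

def laplacian_1d_csr(n):
    """CSR arrays of a 1D three-point Laplacian (illustrative operator only)."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, 1.0), (i, -2.0), (i + 1, 1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return np.array(indptr), np.array(indices), np.array(data)

n, m = 6, 4
indptr, indices, data = laplacian_1d_csr(n)
rng = np.random.default_rng(0)
X = rng.standard_normal((n, m))

# Dense copy of the block, used only to build the reference operator.
A_dense = np.zeros((n, n))
for i in range(n):
    A_dense[i, indices[indptr[i]:indptr[i + 1]]] = data[indptr[i]:indptr[i + 1]]

# Block-diagonal operator with m identical blocks, applied to the stacked columns of X.
big = np.kron(np.eye(m), A_dense)
x_stacked = X.T.reshape(-1)          # column j of X occupies entries j*n:(j+1)*n
y_stacked = big @ x_stacked

Y = csr_spmm(indptr, indices, data, X)
assert np.allclose(Y.T.reshape(-1), y_stacked)
```

The pure-Python loops are for readability only; a production kernel would be written in a compiled language, where the reuse of `data[k]` across the m columns is what reduces memory traffic per flop.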

If this is right

SpMM can be applied to all major operators including divergence, gradient and Laplacian for broad efficiency gains.
Substantial speed-ups are achieved, exceeding 50% in setups that include the mesh-refinement strategy.
The coarse-mesh initialization reaches transition to steady state faster at equivalent cost.
Theoretical bounds and test cases validate the performance shift on modern HPC hardware.
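To make the coarse-then-fine argument concrete, here is a toy wall-clock bookkeeping sketch. It is not the paper's cost model: the function `wall_clock`, its parameters, and all numbers are hypothetical, and the re-adaptation period `t_adapt` after refinement is an assumption added for illustration.

```python
def wall_clock(t_transient, t_average, cost_fine, cost_coarse=None, t_adapt=0.0):
    """Wall-clock cost of one run, in arbitrary units.

    t_transient : simulated time needed to reach a statistically steady state
    t_average   : simulated time spent collecting statistics
    cost_fine   : wall-clock cost per unit of simulated time on the target mesh
    cost_coarse : same on the coarse mesh; None means the whole run is on the fine mesh
    t_adapt     : assumed extra simulated time on the fine mesh after refinement
    """
    if cost_coarse is None:
        return (t_transient + t_average) * cost_fine
    return t_transient * cost_coarse + (t_adapt + t_average) * cost_fine

# Arbitrary illustrative numbers, not measurements from the paper.
all_fine = wall_clock(100.0, 100.0, cost_fine=8.0)
refined = wall_clock(100.0, 100.0, cost_fine=8.0, cost_coarse=1.0, t_adapt=10.0)
saving = 1.0 - refined / all_fine
print(f"relative wall-clock saving: {saving:.1%}")
```

In this sketch the saving grows with the share of the run spent in the transient and with the cost ratio between the meshes, and it shrinks as the assumed re-adaptation period gets longer.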

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could apply to other codes with block-repetitive structures in their linear operators.
Dynamic adaptation of block grouping during simulation might further optimize performance.
Energy savings on large clusters may result from the reduced wall-clock times.
Integration with adaptive refinement techniques could extend benefits to unsteady problems.

Load-bearing premise

CFD discrete operators feature enough identical matrix blocks for the SpMM reformulation to produce noticeable arithmetic intensity gains.
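A back-of-the-envelope traffic model shows why the premise matters. The sketch below assumes CSR storage with 8-byte values and 4-byte indices, and ideal cache reuse so that each input and output vector entry moves through memory exactly once. These are simplifying assumptions and the resulting numbers are not the paper's bounds.

```python
def csr_traffic_bytes(n_rows, nnz_per_row, m, val_bytes=8, idx_bytes=4):
    """Bytes moved by one SpMM with m right-hand sides under ideal reuse."""
    nnz = n_rows * nnz_per_row
    matrix = nnz * (val_bytes + idx_bytes) + (n_rows + 1) * idx_bytes
    vectors = 2 * n_rows * m * val_bytes   # read X once, write Y once
    return matrix + vectors

def arithmetic_intensity(n_rows, nnz_per_row, m):
    """Flops per byte: one multiply and one add per non-zero and per right-hand side."""
    flops = 2 * n_rows * nnz_per_row * m
    return flops / csr_traffic_bytes(n_rows, nnz_per_row, m)

def memory_bound_speedup(n_rows, nnz_per_row, m):
    """m * T_1 / T_m when run time is proportional to bytes moved."""
    return (m * csr_traffic_bytes(n_rows, nnz_per_row, 1)
            / csr_traffic_bytes(n_rows, nnz_per_row, m))

n_rows = 10**6
for nnz_per_row in (7, 13):
    for m in (1, 2, 4, 8, 32):
        ai = arithmetic_intensity(n_rows, nnz_per_row, m)
        sp = memory_bound_speedup(n_rows, nnz_per_row, m)
        print(f"nnz/row={nnz_per_row:2d}  m={m:2d}  AI={ai:.3f} flop/byte  speed-up<={sp:.2f}")
```

Under these assumptions the gain saturates as m grows, because vector traffic eventually dominates matrix traffic, and denser stencils benefit more because there are more coefficients to reuse.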

What would settle it

A benchmark comparing wall-clock time and performance metrics such as flops per byte for a CFD simulation before and after applying the SpMM transformation on the same mesh and hardware.
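A timing harness for such a comparison might look like the sketch below. It uses a dense NumPy matrix as a stand-in for the sparse operator, so it only illustrates the measurement protocol and the speed-up definition P_m = m T_1 / T_m. The helper names are hypothetical, and measuring flops per byte would additionally require hardware counters, which are not shown.

```python
import time
import numpy as np

def best_time(fn, repeats=5):
    """Smallest wall-clock time of fn() over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best

def speedup_per_rhs(t_single, t_batched, m):
    """P_m = m * T_1 / T_m: m separate products versus one batched product."""
    return m * t_single / t_batched

n, m = 1000, 8
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))   # dense stand-in for the operator
X = rng.standard_normal((n, m))

def one_by_one():
    # The matrix is traversed again for every right-hand side.
    return [A @ X[:, j] for j in range(m)]

def batched():
    # The matrix is traversed once and reused for all m columns.
    return A @ X

t_1 = best_time(one_by_one) / m   # average time of a single product
t_m = best_time(batched)
print(f"P_{m} = {speedup_per_rhs(t_1, t_m, m):.2f}")
```

The printed value depends on the machine, the BLAS library, and whether the matrix fits in cache, so it should be read as a protocol check rather than as a prediction for a sparse CFD kernel.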

Figures

Figures reproduced from arXiv: 2508.06710 by Àdel Alsalti-Baldellou, Assensi Oliva, F. Xavier Trias, Guillem Colomer, Josep Plana-Riu, Xavier Álvarez-Farré.

**Figure 1.** Simplified version of a roofline model in which the memory-bound (blue) and compute-bound (red) regions are depicted. The goal of the …

**Figure 2.** Speed-up bounds associated to a SpMM with a sparse square matrix with $n_c = n_r = 10^6$ with 7 and 13 non-zeros per row for $m \in [1, 128]$ (left) and a zoom up to 32 rhs (right).

… for this operation itself is also provided. Namely, by definition, the speed-up in the Poisson equation solution given $m$ RHS will be

$$P_{m,\mathrm{Poisson}} = \frac{m\,T_{\mathrm{Poisson},1}}{T_{\mathrm{Poisson},m}}, \qquad (4)$$

where $T_{\mathrm{Poisson},m}$ indicates the wall-clock time of the solutio…

**Figure 3.** Proposed ensemble averaging strategy in which the case is run until …

**Figure 4.** Figure 4 shows the behaviour of $\beta/\tilde{\beta}$ as a function of $\gamma$ and $\Pi$, which clearly shows that the presence of a refinement of the mesh after $T_D$ will ultimately increase the effective times ratio, which will imply bigger speed-ups in the whole simulation. [Plot: $\beta/\tilde{\beta}$ against $\Pi = \delta^{-1}$ on a logarithmic axis from $10^0$ to $10^3$, with curves for $\gamma = 1, 5, 10, 20$.]

**Figure 5.** Average velocity in wall units (left) and rms streamwise velocity (right) profiles for 1, 2, 4, and 8 rhs in a turbulent planar channel flow …

**Figure 6.** Speed-up values obtained in the numerical solution of the Poisson equation (left), a whole projection method iteration (center), and the …

**Figure 7.** Two-dimensional representations of the stencil for …

**Figure 8.** Figure 8 shows that the behaviour for all three cases run, 7p, 13p, and 27p, lies between the expected bounds of speed-up, yet the case using only the first neighbours has the closest-to-optimal behaviour, compared to the 13p and 27p cases, which are still close to the upper bound, yet further from it than the 7p case. With regards to the extension to the whole solution of the Poisson equation using the aAMG preconditioned …

**Figure 9.** Speed-up in the numerical solution of the Poisson equation in a turbulent planar channel flow of Re …

**Figure 10.** Performance evaluation of the SpMM operations within the CG+aAMG solver framework run in a single MN5-GPP node in the numerical simulation of a turbulent planar channel flow with $Re_\tau = 180$.

4.2. Rayleigh-Bénard convection. On the other hand, to test the robustness of the method when adding additional transport equations such as the energy equation, the method has been tested in a Rayleigh-Bénard convection…

**Figure 11.** Speed-up in a whole iteration in a turbulent planar channel flow with Re …

**Figure 12.** Speed-ups for the numerical solution of the Poisson equation using a CG …

**Figure 13.** Speed-ups in the execution of the SpMM operation in the Rayleigh-Bénard convection, compared to the theoretical upper and lower bounds (left), and estimation of the simulation speed-up given $\gamma = 4$ and $\beta = 8$ for both methods, calculating $\Phi$ or setting $\Phi = 1$ (right).

4.3. Industrial case: 30P30N. The method has been tested as well in an unstructured mesh to validate the performance of the method in an industr…

**Figure 14.** Close-up view of the mesh used for the simulation of the 30P30N airfoil.

**Figure 15.** Speed-up values obtained in the SpMM kernel (left), the numerical solution of the Poisson equation (center), and a whole projection method iteration (right), for the presented mesh and all simulated flow states. The first two figures show the speed-ups compared to the theoretical bounds (dashed lines).

… between efficiency and accuracy, which is eventually preserved, as the relevant part of the simulation, …

**Figure 16.** Data from the sum of the Top10 of the HPCG (left) and the performance-per-watt (right), and trendline for the sum of the Top10 values …

**Figure 17.** Evolution of wall-clock time (WCT) for the next decade assuming the hypotheses of ideal speed-up and constant cost and load, together …

Original abstract

Computational Fluid Dynamics (CFD) simulations are often constrained by the memory-bound nature of sparse matrix-vector operations, which eventually limits performance on modern high-performance computing (HPC) systems. This work introduces a novel approach to increase arithmetic intensity in CFD by leveraging repeated matrix block structures. The method transforms the conventional sparse matrix-vector product (SpMV) into a sparse matrix-matrix product (SpMM), enabling simultaneous processing of multiple right-hand sides. This shifts the computation towards a more compute-bound regime by reusing matrix coefficients. Additionally, an inline mesh-refinement strategy is proposed: simulations initially run on a coarse mesh to establish a statistically steady flow, then refine to the target mesh. This reduces the wall-clock time to reach transition, leading to faster convergence with equivalent computational cost. The methodology is evaluated using theoretical performance bounds and validated through three test cases: a turbulent channel flow, Rayleigh-Bénard convection, and an industrial airfoil simulation. Results demonstrate substantial speed-ups - from modest improvements in basic configurations to over 50% in the mesh-refinement setup - highlighting the benefits of integrating SpMM across all CFD operators, including divergence, gradient, and Laplacian.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper shows how to cut CFD wall-clock time by turning SpMV into SpMM across all operators and adding an inline coarse-to-fine mesh step, with reported gains of more than 50 percent when the refinement is used, but the statistical equivalence after refinement is not yet clearly demonstrated.


The main things to know are that this work takes the established SpMM trick for multiple right-hand sides and applies it uniformly to every discrete operator in a CFD code, then combines it with an inline mesh-refinement workflow that runs coarse until the flow settles before switching to the target resolution. The reported speed-ups range from modest in straightforward cases to more than 50 percent when the refinement is used, backed by theoretical bounds on arithmetic intensity and timings from three test problems including a channel flow, Rayleigh-Bénard convection, and an airfoil case. That integrated treatment and the concrete numbers are the actual contribution.

The paper does a reasonable job laying out why the change moves the kernels toward a more compute-bound regime and showing that the same idea works for divergence, gradient, and Laplacian operators rather than just one kernel. The test cases give a practical sense of the payoff on both academic and industrial configurations.

The soft spot is the refinement validation. The abstract states that the coarse phase establishes a statistically steady state and the refined run keeps equivalent statistics at lower total cost, yet it does not describe direct comparisons of mean profiles, Reynolds stresses, or integral quantities against a pure fine-mesh reference, nor does it give error bounds or convergence checks on those quantities. If those checks are absent or weak in the full text, part of the wall-clock saving could come from a slightly different attractor rather than from the SpMM change alone; that is a moderate rather than fatal issue, but it needs to be addressed for the 50 percent claim to land cleanly.

This is aimed at CFD practitioners who run memory-bound production codes on current HPC hardware and want incremental engineering improvements they can try in their own solvers. A reader maintaining or extending a finite-volume code would find the operator-by-operator breakdown useful. It deserves a serious referee because the performance numbers rest on direct timings rather than self-referential derivations, even if the refinement evidence needs tightening. I would send it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper proposes transforming sparse matrix-vector products (SpMV) into sparse matrix-matrix products (SpMM) in CFD codes by exploiting repeated block structures in discrete operators (divergence, gradient, Laplacian), thereby increasing arithmetic intensity. It further introduces an inline coarse-to-fine mesh-refinement strategy that runs an initial coarse-mesh phase to reach statistically steady flow before refining, claiming this reduces wall-clock time to transition with equivalent cost. Theoretical performance bounds and three test cases (turbulent channel flow, Rayleigh-Bénard convection, industrial airfoil) are presented, with reported speed-ups ranging from modest gains to over 50% in the refinement configuration.

Significance. If the performance claims hold after verification, the work offers a practical route to higher arithmetic intensity in memory-bound CFD kernels on modern HPC systems without altering the underlying discretization. The combination of SpMM reformulation across multiple operators and the inline refinement strategy could shorten time-to-solution for statistically steady turbulent simulations, provided the refinement preserves the target statistics.

major comments (2)

[§4 and results] §4 (Mesh-refinement strategy) and results section: the headline claim of >50% wall-clock reduction in the refinement setup rests on the assumption that the coarse-to-fine transition reaches an equivalent statistically steady state. No quantitative comparison (mean profiles, Reynolds stresses, integral quantities, or convergence metrics) is shown between a pure fine-mesh run and the inline-refined run; without such checks or error bounds on transient bias, the reported speed-up cannot be unambiguously attributed to the SpMM reformulation or the cheaper coarse phase.
[§3.2] §3.2 (SpMM implementation across operators): the theoretical bounds assume that all target operators contain sufficiently many identical matrix blocks to amortize the SpMM overhead. The manuscript should quantify the fraction of blocks that are actually repeated in each test case and demonstrate that the measured speed-ups scale with this fraction rather than with other implementation details.
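The repetition fraction requested in the second major comment could be measured along the following lines. This is a minimal sketch on a small dense matrix with a fixed block size; the function `repeated_block_fraction` and its definition of "repeated" (a non-zero block that coincides with at least one other block after rounding) are assumptions for illustration, not the paper's metric.

```python
import numpy as np

def repeated_block_fraction(A, block_size, decimals=12):
    """Fraction of non-zero block_size x block_size blocks of A that occur more than once."""
    n = A.shape[0]
    assert A.shape == (n, n) and n % block_size == 0
    s = n // block_size
    counts = {}
    for bi in range(s):
        for bj in range(s):
            blk = A[bi * block_size:(bi + 1) * block_size,
                    bj * block_size:(bj + 1) * block_size]
            if np.any(blk):
                # Round to absorb floating-point noise; "+ 0.0" turns -0.0 into 0.0
                # so that equal blocks always produce the same byte string.
                key = (np.round(blk, decimals) + 0.0).tobytes()
                counts[key] = counts.get(key, 0) + 1
    total = sum(counts.values())
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total if total else 0.0

block = np.array([[2.0, -1.0], [-1.0, 2.0]])
A_uniform = np.kron(np.eye(3), block)       # three identical diagonal blocks
A_perturbed = A_uniform.copy()
A_perturbed[0, 0] = 5.0                     # first block now differs from the others

print(repeated_block_fraction(A_uniform, 2))
print(repeated_block_fraction(A_perturbed, 2))
```

For a production-size operator the same counting would be done on the sparse storage directly, and the block partition would follow the mesh symmetry or ensemble structure rather than a fixed size.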

minor comments (2)

[Figures and §5] Figure captions and timing methodology: clarify whether the reported wall-clock times include or exclude the refinement overhead and whether statistical convergence criteria (e.g., running averages of drag or Nusselt number) are identical across all compared runs.
[§2] Notation: the definition of arithmetic intensity and the transition from SpMV to SpMM should be stated explicitly with the relevant matrix dimensions before the performance model is introduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and constructive feedback on our manuscript. We address each major comment in detail below and outline the revisions we will make to improve clarity and rigor.


Referee: [§4 and results] §4 (Mesh-refinement strategy) and results section: the headline claim of >50% wall-clock reduction in the refinement setup rests on the assumption that the coarse-to-fine transition reaches an equivalent statistically steady state. No quantitative comparison (mean profiles, Reynolds stresses, integral quantities, or convergence metrics) is shown between a pure fine-mesh run and the inline-refined run; without such checks or error bounds on transient bias, the reported speed-up cannot be unambiguously attributed to the SpMM reformulation or the cheaper coarse phase.

Authors: We agree that quantitative verification of statistical equivalence is necessary to support the claimed wall-clock reductions. In the revised manuscript we will add direct comparisons of mean velocity profiles, Reynolds stresses, skin-friction coefficients (channel flow), and Nusselt numbers (Rayleigh-Bénard) between the pure fine-mesh runs and the corresponding coarse-to-fine simulations. Convergence metrics and error bounds on transient bias will also be included for the two canonical cases; for the industrial airfoil we will report lift and drag coefficients. These additions will allow unambiguous attribution of the observed speed-ups. revision: yes
Referee: [§3.2] §3.2 (SpMM implementation across operators): the theoretical bounds assume that all target operators contain sufficiently many identical matrix blocks to amortize the SpMM overhead. The manuscript should quantify the fraction of blocks that are actually repeated in each test case and demonstrate that the measured speed-ups scale with this fraction rather than with other implementation details.

Authors: We accept the need for explicit quantification. The revised manuscript will contain a new table reporting the fraction of identical blocks for the divergence, gradient, and Laplacian operators in each of the three test cases. We will also add an analysis showing how the measured speed-ups correlate with these fractions, either by comparing operators that exhibit different repetition levels or by controlled numerical experiments that vary the repetition fraction while holding other factors fixed. revision: yes
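One simple way to express the promised profile comparisons as a single number is a relative L2 difference between the fine-mesh and coarse-to-fine profiles. The sketch below assumes both profiles are sampled at the same wall-normal points; the function name and the synthetic data are hypothetical, and the paper may use a different metric.

```python
import numpy as np

def relative_l2_difference(y, profile_ref, profile_test):
    """Relative L2 difference of two profiles on the same grid y (trapezoidal rule)."""
    y = np.asarray(y, dtype=float)
    ref = np.asarray(profile_ref, dtype=float)
    test = np.asarray(profile_test, dtype=float)

    def integrate(f):
        return 0.5 * np.sum((f[1:] + f[:-1]) * np.diff(y))

    return np.sqrt(integrate((test - ref) ** 2) / integrate(ref ** 2))

# Synthetic profiles for illustration only.
y = np.linspace(0.0, 1.0, 65)
u_fine = 1.0 - (2.0 * y - 1.0) ** 2
u_refined = u_fine * 1.01                 # pretend the refined run is 1 % off
print(f"{relative_l2_difference(y, u_fine, u_refined):.4f}")
```

The same function applies to Reynolds-stress or temperature profiles; integral quantities such as skin friction or Nusselt number would be compared directly with their statistical uncertainty.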

Circularity Check

0 steps flagged

No circularity: empirical timings and algorithmic reformulation stand independently


The paper introduces an SpMM reformulation for repeated matrix blocks in CFD operators and an inline coarse-to-fine mesh strategy, then reports wall-clock speed-ups measured directly on three concrete test cases (turbulent channel, Rayleigh-Bénard, airfoil). No derivation chain reduces a claimed result to a fitted parameter or self-citation by construction; performance bounds are theoretical and independent, while numerical gains come from external timing runs rather than any self-referential prediction. The central claims therefore remain falsifiable against standard SpMV baselines without internal circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard finite-volume or finite-element discretizations that produce repeated blocks and on the assumption that coarse-mesh statistics transfer to the fine mesh without bias; no new physical entities or ad-hoc constants are introduced in the abstract.

axioms (2)

domain assumption: CFD discrete operators contain sufficient identical matrix blocks to make SpMM advantageous.
Invoked when the paper states that repeated block structures can be exploited to convert SpMV into SpMM.
domain assumption: Coarse-mesh statistically steady state provides a valid initial condition for fine-mesh continuation.
Underlying the claim that inline refinement reduces time to transition without changing final statistics.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "transforms the conventional sparse matrix-vector product (SpMV) into a generalized sparse matrix-matrix product (SpMM), enabling simultaneous processing of multiple right-hand sides"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: "repeated matrix block structures ... I_s ⊗ Ã"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

[1]

Patterson

S. Williams, A. Waterman, D. Patterson, Roofline: An insightful visual performance model for multicore architectures, Communications of the ACM 52 (2009) 65–76. doi:10.1145/1498765.1498785

work page doi:10.1145/1498765.1498785 2009
[2]

J. L. Greathouse, M. Daga, Efficient Sparse Matrix-Vector Multiplication on GPUs Using the CSR Storage Format, in: International Conference for High Performance Computing, Networking, Storage and Analysis, SC, volume 2015-January, IEEE Computer Society, 2014, pp. 769–780. doi:10.1109/SC.2014.68, issue: January ISSN: 21674337

work page doi:10.1109/sc.2014.68 2015
[3]

Makarashvili, E

V . Makarashvili, E. Merzari, A. Obabko, A. Siegel, P. Fischer, A performance analysis of ensemble averaging for high fidelity turbulence simulations at the strong scaling limit, Computer Physics Communications 219 (2017) 236–245. doi:10.1016/j.cpc.2017.05.023, publisher: Elsevier B.V

work page doi:10.1016/j.cpc.2017.05.023 2017
[4]

Nastac, J

G. Nastac, J. W. Labahn, L. Magri, M. Ihme, Lyapunov exponent as a metric for assessing the dynamic content and predictability of large-eddy simulations, Physical Review Fluids 2 (2017). doi:10.1103/PhysRevFluids.2.094606, publisher: American Physical Society

work page doi:10.1103/physrevfluids.2.094606 2017
[5]

R. Tosi, M. Núñez, J. Pons-Prats, J. Principe, R. Rossi, On the use of ensemble averaging techniques to accelerate the Uncer- tainty Quantification of CFD predictions in wind engineering, Journal of Wind Engineering and Industrial Aerodynamics 228 (2022). doi:10.1016/j.jweia.2022.105105, publisher: Elsevier B.V

work page doi:10.1016/j.jweia.2022.105105 2022
[6]

B. I. Krasnopolsky, An approach for accelerating incompressible turbulent flow simulations based on simultaneous modelling of multiple ensembles, Computer Physics Communications 229 (2018) 8–19. doi:10.1016/j.cpc.2018.03.023, publisher: Elsevier B.V

work page doi:10.1016/j.cpc.2018.03.023 2018
[7]

Alsalti-Baldellou, X

À. Alsalti-Baldellou, X. Álvarez-Farré, G. Colomer, A. Gorobets, C. D. Pérez-Segarra, A. Oliva, F. X. Trias, Lighter and faster simulations on domains with symmetries, Computers and Fluids 275 (2024). doi:10.1016/j.compfluid.2024.106247, publisher: Elsevier Ltd

work page doi:10.1016/j.compfluid.2024.106247 2024
[8]

Alsalti-Baldellou, C

À. Alsalti-Baldellou, C. Janna, X. Álvarez-Farré, F. X. Trias, A Multigrid Reduction Framework for Domains with Symme- tries, SIAM Journal on Scientific Computing 46 (2024) B860–B883. URL:https://epubs.siam.org/doi/10.1137/24M1638513. doi:10.1137/24M1638513

work page doi:10.1137/24m1638513 2024
[9]

Álvarez, A

X. Álvarez, A. Gorobets, F. X. Trias, R. Borrell, G. Oyarzun, HPC2—A fully-portable, algebra-based framework for heterogeneous comput- ing. Application to CFD, Computers and Fluids 173 (2018) 285–292. doi:10.1016/j.compfluid.2018.01.034, publisher: Elsevier Ltd

work page doi:10.1016/j.compfluid.2018.01.034 2018
[10]

B. I. Krasnopolsky, Optimal Strategy for Modelling Turbulent Flows with Ensemble Averaging on High Performance Computing Systems, Lobachevskii Journal of Mathematics 39 (2018) 533–542. doi:10.1134/S199508021804008X, publisher: Pleiades Publishing

work page doi:10.1134/s199508021804008x 2018
[11]

Alsalti-Baldellou, X

A. Alsalti-Baldellou, X. Álvarez-Farré, F. X. Trias, A. Oliva, Exploiting spatial symmetries for solving Poisson’s equation, Journal of Computational Physics 486 (2023). doi:10.1016/j.jcp.2023.112133, publisher: Academic Press Inc

work page doi:10.1016/j.jcp.2023.112133 2023
[12]

F. X. Trias, O. Lehmkuhl, A. Oliva, C. D. Pérez-Segarra, R. W. Verstappen, Symmetry-preserving discretization of Navier-Stokes equations on collocated unstructured grids, Journal of Computational Physics 258 (2014) 246–267. doi:10.1016/j.jcp.2013.10.031, publisher: Academic Press Inc

work page doi:10.1016/j.jcp.2013.10.031 2014
[13]

N. V . Nikitin, Disturbance growth rate in turbulent wall flows, Fluid Dynamics 44 (2009) 652–657. doi:10.1134/S0015462809050032

work page doi:10.1134/s0015462809050032 2009
[14]

Keefe, P

L. Keefe, P. Moin, J. Kim, The dimension of attractors underlying periodic turbulent Poiseuille flow, Journal of Fluid Mechanics 242 (1992) 1–29. doi:10.1017/S0022112092002258

work page doi:10.1017/s0022112092002258 1992
[15]

Badii, K

R. Badii, K. Heinzelmann, P. F. Meier, A. Politi, Correlation functions and generalized Lyapunov exponents, Physical Review A 37 (1988) 1323–1328. URL:https://link.aps.org/doi/10.1103/PhysRevA.37.1323. doi:10.1103/PhysRevA.37.1323

work page doi:10.1103/physreva.37.1323 1988
[16]

C. F. O. Mendes, R. M. Da Silva, M. W. Beims, Decay of the distance autocorrelation and Lyapunov exponents, Physical Review E 99 (2019) 062206. URL:https://link.aps.org/doi/10.1103/PhysRevE.99.062206. doi:10.1103/PhysRevE.99.062206

work page doi:10.1103/physreve.99.062206 2019
[17]

Cheskidov, C

A. Cheskidov, C. Foias, On global attractors of the 3D Navier–Stokes equations, Journal of Differential Equations 231 (2006) 714–754. URL:https://linkinghub.elsevier.com/retrieve/pii/S002203960600338X. doi:10.1016/j.jde.2006.08.021

work page doi:10.1016/j.jde.2006.08.021 2006
[18]

M. C. Bortolan, A. N. de Carvalho, P. Marín-Rubio, J. Valero, Weak global attractor for the 3D-Navier-Stokes equations via the globally mod- ified Navier-Stokes equations (2024). URL:http://arxiv.org/abs/2402.06435. doi:10.48550/arXiv.2402.06435, arXiv: 2402.06435

work page doi:10.48550/arxiv.2402.06435 2024
[19]

R. W. C. P. Verstappen, A. E. P. Veldman, Symmetry-preserving discretization of turbulent flow, Journal of Computational Physics 187 (2003) 343–368. doi:10.1016/S0021-9991(03)00126-8, publisher: Academic Press Inc

work page doi:10.1016/s0021-9991(03)00126-8 2003
[20]

Sanderse, B

B. Sanderse, B. Koren, Accuracy analysis of explicit Runge-Kutta methods applied to the incompressible Navier-Stokes equations, Journal of Computational Physics 231 (2012) 3041–3063. doi:10.1016/j.jcp.2011.11.028, publisher: Academic Press Inc

work page doi:10.1016/j.jcp.2011.11.028 2012
[21]

Capuano, G

F. Capuano, G. Coppola, L. Rández, L. de Luca, Explicit Runge–Kutta schemes for incompressible flow with improved energy-conservation properties, Journal of Computational Physics 328 (2017) 86–94. doi:10.1016/j.jcp.2016.10.040, publisher: Academic Press Inc

work page doi:10.1016/j.jcp.2016.10.040 2017
[22]

F. X. Trias, X. Álvarez-Farré, A. Alsalti-Baldellou, A. Gorobets, A. Oliva, An efficient eigenvalue bounding method: CFL condition revisited, Computer Physics Communications 305 (2024). doi:10.1016/j.cpc.2024.109351, publisher: Elsevier B.V

work page doi:10.1016/j.cpc.2024.109351 2024
[23]

R. D. Falgout, J. E. Jones, U. M. Yang, The Design and Implementation of hypre, a Library of Parallel High Performance Preconditioners, in: A. Bruaset, A. Tveito (Eds.), Numerical Solution of Partial Differential Equations on Parallel Computers. Lecture Notes in Computational Science and Engineering, volume 51, Springer, Berlin, Heidelberg, 2006, pp. 267–294

work page 2006
[24]

Isotton, M

G. Isotton, M. Frigo, N. Spiezia, C. Janna, Chronos: A General Purpose Classical AMG Solver for High Performance Computing, SIAM Journal on Scientific Computing 43 (2021) C335–C357. doi:10.1137/21M1398586. 24

work page doi:10.1137/21m1398586 2021
[25]

V . A. Paludetto Magri, A. Franceschini, C. Janna, A Novel Algebraic Multigrid Approach Based on Adaptive Smooth- ing and Prolongation for Ill-Conditioned Systems, SIAM Journal on Scientific Computing 41 (2019) A190–A219. URL: https://epubs.siam.org/doi/10.1137/17M1161178. doi:10.1137/17M1161178

work page doi:10.1137/17m1161178 2019
[26]

De Sterck, U

H. De Sterck, U. M. Yang, J. J. Heys, Reducing Complexity in Parallel Algebraic Multigrid Preconditioners, SIAM Journal on Matrix Analysis and Applications 27 (2006) 1019–1039. URL:http://epubs.siam.org/doi/10.1137/040615729. doi:10.1137/040615729

work page doi:10.1137/040615729 2006
[27]

De Sterck, R

H. De Sterck, R. D. Falgout, J. W. Nolting, U. M. Yang, Distance-two interpolation for parallel algebraic multigrid, Numerical Linear Algebra with Applications 15 (2008) 115–139. URL:https://onlinelibrary.wiley.com/doi/10.1002/nla.559. doi:10.1002/nla.559

work page doi:10.1002/nla.559 2008
[28]

F. X. Trias, A. Gorobets, M. Soria, A. Oliva, Direct numerical simulation of a differentially heated cavity of aspect ratio 4 with Rayleigh numbers up to 1011 - Part I: Numerical methods and time-averaged flow, International Journal of Heat and Mass Transfer 53 (2010) 665–673. doi:10.1016/j.ijheatmasstransfer.2009.10.026

work page doi:10.1016/j.ijheatmasstransfer.2009.10.026 2010
[29]

Mosqueda-Otero, À

M. Mosqueda-Otero, À. Alsalti-Baldellou, X. Álvarez-Farré, J. Plana-Riu, G. Colomer, F. X. Trias, A. Oliva, A Portable Algebraic Im- plementation for Reliable Overnight Industrial LES, Proceedings of the 35th Parallel CFD International Conference 2024 35th Parallel CFD International Conference 2024 ParCFD 2024 (2025) pages 119 – 126. URL:https://juser.fz-...

work page doi:10.34734/fzj-2025-02469 2024
[30]

A. D. Demou, D. G. E. Grigoriadis, Direct numerical simulations of Rayleigh–Bénard convection in water with non-Oberbeck–Boussinesq effects, Journal of Fluid Mechanics 881 (2019) 1073–1096. doi:10.1017/jfm.2019.787

work page doi:10.1017/jfm.2019.787 2019
[31]

P. Moin, K. Mahesh, DIRECT NUMERICAL SIMULATION: A Tool in Turbulence Research, Annual Review of Fluid Mechanics 30 (1998) 539–578. URL:https://www.annualreviews.org/doi/10.1146/annurev.fluid.30.1.539. doi:10.1146/annurev.fluid.30.1.539

work page doi:10.1146/annurev.fluid.30.1.539 1998
[32]

Kolmogorov, The local structure of turbulence in incompressible viscous fluid for very large Reynolds numbers, Doklady Akademii Nauk SSSR 30 (1941) 9–13

A. Kolmogorov, The local structure of turbulence in incompressible viscous fluid for very large Reynolds numbers, Doklady Akademii Nauk SSSR 30 (1941) 9–13. doi:10.1098/rspa.1991.0075

work page doi:10.1098/rspa.1991.0075 1941
[33]

Kolmogorov, On degeneration (decay) of isotropic turbulence in an incompressible viscous liquid, Doklady Akademii Nauk SSSR 31 (1941) 538–540

A. Kolmogorov, On degeneration (decay) of isotropic turbulence in an incompressible viscous liquid, Doklady Akademii Nauk SSSR 31 (1941) 538–540

work page 1941
[34]

Courant, K

R. Courant, K. Friedrichs, H. Lewy, Über die partiellen Differenzengleichungen der mathematischen Physik, Mathematische Annalen 100 (1928) 32–74. doi:10.1007/BF01448839

work page doi:10.1007/bf01448839 1928
[35]

Dongarra, M

J. Dongarra, M. A. Heroux, P. Luszczek, High-performance conjugate-gradient benchmark: A new metric for ranking high- performance computing systems, The International Journal of High Performance Computing Applications 30 (2016) 3–10. URL: https://journals.sagepub.com/doi/10.1177/1094342015593158. doi:10.1177/1094342015593158. 25

