pith. sign in

arxiv: 2606.25148 · v1 · pith:3LJ3ST4Mnew · submitted 2026-06-23 · 🧮 math.NA · cs.CE· cs.NA· physics.comp-ph· physics.flu-dyn

No 3D Matrices: A Unified Tensor-Product View of Matrix-Free Cartesian PDE Solvers

Pith reviewed 2026-06-25 22:33 UTC · model grok-4.3

classification 🧮 math.NA cs.CEcs.NAphysics.comp-phphysics.flu-dyn
keywords Cartesian gridsmatrix-free methodsKronecker productstensor-product discretizationsADI time steppingPoisson solverssum factorizationPDE solvers
0
0 comments X

The pith

Cartesian three-dimensional PDE operators factor exactly into one-dimensional line solves along each axis and are never assembled or stored.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that on Cartesian product grids every standard discretization and solver for three-dimensional PDEs reduces to repeated applications of one-dimensional operators along the coordinate lines. This factorization, expressed through Kronecker products, applies to finite differences, compact schemes, Galerkin methods, B-splines, collocation, ADI time stepping, and direct Poisson and Helmholtz solvers. Production codes have exploited this for decades to achieve linear scaling in the number of unknowns and minimal storage. A sympathetic reader would care because it explains how high-dimensional simulations remain computationally tractable without ever forming massive three-dimensional matrices.

Core claim

Every Cartesian three-dimensional PDE solver hides a structural secret: the derivative matrices, compact Padé line solves, Galerkin mass inversions, alternating-direction-implicit substeps, and fast Poisson and Helmholtz diagonalization transforms all factor along the coordinate axes and collapse into repeated one-dimensional banded kernels executed along the grid lines. The three-dimensional operator exists only on paper; it is never assembled, factored, or stored.

What carries the argument

The Kronecker-product algebra that makes the exact factorization of multi-dimensional operators into independent one-dimensional factors along each coordinate axis.

If this is right

  • For fixed stencil width or polynomial degree, compute cost stays O(N) in the total number of unknowns N = Nx Ny Nz.
  • Operator storage drops to O(Nx + Ny + Nz) up to bandwidth constants.
  • Direct separable Poisson and Helmholtz solvers add only the expected transform cost.
  • The line kernels are embarrassingly parallel.
  • The multi-right-hand-side reshape exposes sweeps as batched line kernels, and sum factorization keeps high-order Galerkin costs linear.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same algebraic reduction supplies a single organizing principle that explains why several otherwise unrelated numerical techniques all achieve the same scaling.
  • Implementations can therefore allocate all optimization effort to one-dimensional kernels and their batched execution rather than to three-dimensional data layouts.
  • The approach suggests a template for constructing new separable discretizations by first designing the one-dimensional factors and then lifting them via Kronecker products.

Load-bearing premise

The PDE discretization must be posed on a Cartesian product grid so that every operator factors exactly into independent one-dimensional factors along each coordinate axis.

What would settle it

Demonstrating a standard discretization of a Cartesian PDE whose three-dimensional operator cannot be expressed as a product of one-dimensional operators along the axes, or measuring a computational cost that scales worse than linear in the total number of grid points.

Figures

Figures reproduced from arXiv: 2606.25148 by Kathleen A. Yearick, Yong Yi Bay.

Figure 1
Figure 1. Figure 1: The whole paper in one picture. A Cartesian three-dimensional operator is executed [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Tensor-product line-sweep template. Every 3D operation is expressed as a loop of [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Centered finite-difference maps in 1D. The first- and second-derivative operators [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Compact first-derivative maps in factorized form. The left matrix [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: One-dimensional Galerkin spline maps. The mass matrix [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: One-dimensional B-spline collocation maps. The nodal maps [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: The ADI cascade. A single implicit step on the 3D heat equation is replaced by [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Typical one-dimensional line factors for implicit time stepping. Whether the [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Fast diagonalization pipeline. The transform to the tensor-product eigenbasis is [PITH_FULL_IMAGE:figures/full_fig_p019_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: The multi-RHS reshape. A three-dimensional [PITH_FULL_IMAGE:figures/full_fig_p021_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Sum factorization. The O(n 2d ) nested quadrature sum of a spectral-element right￾hand side, with n = p + 1 points per direction, becomes three sequential contractions, each O(n d+1) in 3D. For p = 16 (n = 17), the leading count is about 24 million operations naively versus about 250 thousand after three contractions. The same dimension-by-dimension contraction that runs every sweep in this paper runs the… view at source ↗
Figure 12
Figure 12. Figure 12: Pencil decomposition. A Pα × Pβ process grid owns the two perpendicular directions; each rank holds a complete “pencil” of the third direction. Tiles in distinct colors represent distinct MPI ranks. Between sweeps, a collective all-to-all transpose rotates the pencil orientation so that the next direction about to be swept is the local one. The 1D kernel inside a rank never changes; only which direction c… view at source ↗
Figure 13
Figure 13. Figure 13: The same data as Table 2, on log–log axes. Never forming the three-dimensional [PITH_FULL_IMAGE:figures/full_fig_p024_13.png] view at source ↗
read the original abstract

Every Cartesian three-dimensional PDE solver hides a structural secret that production CFD codes have used for half a century and that graduate-level textbooks rarely state plainly. The derivative matrices, the compact Pad\'e line solves, the Galerkin mass inversions, the alternating-direction-implicit substeps, and even the fast Poisson and Helmholtz diagonalization transforms all factor along the coordinate axes and collapse into repeated one-dimensional banded kernels executed along the grid lines. The three-dimensional operator exists only on paper; it is never assembled, factored, or stored. This paper is the manual for that collapse. We derive the Kronecker-product algebra that makes it exact, carry it cleanly through central differences, compact schemes, tensor-product Galerkin, B-spline and isogeometric methods, collocation, ADI time stepping, and direct Poisson and Helmholtz solves, and bring into the open the three production tricks that turn the reduction into hardware-conscious floating-point throughput on real machines: the multi-right-hand-side reshape that exposes a sweep as one batched line kernel (a dense BLAS-3 GEMM when the line factor is dense or element-local, a banded or stencil kernel when it is not), the sum factorization that rescues high-order Galerkin from the $O(p^{2d})$ quadrature trap, and the pencil decomposition that keeps every direction contiguous across an MPI cluster. For fixed stencil width or fixed polynomial degree, the compute cost stays $O(N)$ in the total number of unknowns $N = N_x N_y N_z$; the operator storage drops to $O(N_x + N_y + N_z)$ up to bandwidth constants; direct separable Poisson and Helmholtz solvers add the expected transform cost; the line kernels are embarrassingly parallel. These facts are familiar to practitioners but rarely assembled in one place; this paper collects them and shows how to use them.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript claims that every three-dimensional Cartesian PDE solver on a product grid factors exactly via Kronecker products into repeated one-dimensional banded or stencil kernels executed along coordinate lines, so that the full 3D operator is never assembled, factored, or stored. It derives the relevant algebra for central differences, compact Padé schemes, tensor-product Galerkin and isogeometric methods, collocation, ADI time stepping, and separable Poisson/Helmholtz solvers, then details three implementation devices (multi-RHS reshape to batched line kernels, sum factorization, and pencil decomposition) that realize O(N) work and O(Nx+Ny+Nz) storage with embarrassingly parallel line sweeps.

Significance. If the algebraic reductions hold, the paper supplies a single, explicit reference that assembles a body of established but scattered practitioner knowledge. Its value lies in the clean Kronecker derivations, the explicit treatment of the three performance tricks, and the consistent emphasis on matrix-free execution; these elements can shorten the path from textbook discretization to hardware-efficient code. The work is not presented as containing new theorems but as a unifying manual, which is a legitimate and useful contribution when executed with the care shown in the abstract.

minor comments (2)
  1. [Abstract] Abstract, final paragraph: the sentence listing the three production tricks would be clearer if the tricks were enumerated (1. multi-RHS reshape, 2. sum factorization, 3. pencil decomposition) rather than described in a single clause.
  2. [Introduction] The manuscript should include a short table or diagram in the introduction that maps each listed method (central differences, compact schemes, ADI, etc.) to the corresponding Kronecker identity or section number where its factorization is derived.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive evaluation and the recommendation to accept the manuscript. The referee's summary correctly identifies the paper's purpose as a unifying reference that assembles the Kronecker-product reductions and implementation devices for matrix-free Cartesian solvers.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper presents a unified view of matrix-free Cartesian PDE solvers by invoking the standard Kronecker-product factorization of separable operators on product grids. This is an external algebraic fact (not derived or fitted within the paper) that applies by definition to the listed methods (central differences, compact schemes, tensor-product Galerkin, ADI, separable Poisson solvers). The work explicitly frames itself as collecting established techniques rather than proving new results or introducing self-referential definitions, fitted parameters called predictions, or load-bearing self-citations. No equations or steps reduce the central claim to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the standard algebraic property that Kronecker products allow exact factorization of tensor-product operators on Cartesian grids; no free parameters, ad-hoc axioms, or new entities are introduced in the abstract.

axioms (1)
  • standard math Kronecker product algebra permits exact factorization of separable multidimensional operators into repeated one-dimensional kernels
    Invoked throughout the abstract as the mechanism that collapses every listed 3D operator.

pith-pipeline@v0.9.1-grok · 5889 in / 1241 out tokens · 22092 ms · 2026-06-25T22:33:34.490283+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

59 extracted references · 2 canonical work pages

  1. [1]

    Journal of Computational Physics , volume=

    Compact finite difference schemes with spectral-like resolution , author=. Journal of Computational Physics , volume=. 1992 , publisher=

  2. [2]

    Journal of the Society for Industrial and Applied Mathematics , volume=

    The numerical solution of parabolic and elliptic differential equations , author=. Journal of the Society for Industrial and Applied Mathematics , volume=. 1955 , publisher=

  3. [3]

    Numerische Mathematik , volume=

    Alternating direction methods for three space variables , author=. Numerische Mathematik , volume=. 1962 , publisher=

  4. [4]

    Numerische Mathematik , volume=

    A general formulation of alternating direction methods , author=. Numerische Mathematik , volume=. 1964 , publisher=

  5. [5]

    Numerische Mathematik , volume=

    Direct solution of partial difference equations by tensor product methods , author=. Numerische Mathematik , volume=. 1964 , publisher=

  6. [6]

    The accurate solution of

    Haidvogel, Dale B and Zang, Thomas , journal=. The accurate solution of. 1979 , publisher=

  7. [7]

    2013 , publisher=

    Matrix Computations , author=. 2013 , publisher=

  8. [8]

    The ubiquitous

    Van Loan, Charles F , journal=. The ubiquitous. 2000 , publisher=

  9. [9]

    2007 , publisher=

    Computational Science and Engineering , author=. 2007 , publisher=

  10. [10]

    2009 , publisher=

    Tensor decompositions and applications , author=. 2009 , publisher=

  11. [11]

    Application of a fractional-step method to incompressible

    Kim, John and Moin, Parviz , journal=. Application of a fractional-step method to incompressible. 1985 , publisher=

  12. [12]

    Journal of Computational Physics , volume=

    An implicit finite-difference algorithm for hyperbolic systems in conservation-law form , author=. Journal of Computational Physics , volume=. 1976 , publisher=

  13. [13]

    Mathematics of Computation , volume=

    Generation of finite difference formulas on arbitrarily spaced grids , author=. Mathematics of Computation , volume=. 1988 , publisher=

  14. [14]

    2007 , publisher=

    Finite Difference Methods for Ordinary and Partial Differential Equations , author=. 2007 , publisher=

  15. [15]

    2007 , publisher=

    Numerical Computation of Internal and External Flows , author=. 2007 , publisher=

  16. [16]

    Watson Scientific Computing Laboratory Report , year=

    Elliptic problems in linear difference equations over a network , author=. Watson Scientific Computing Laboratory Report , year=

  17. [17]

    2009 , publisher=

    A First Course in the Numerical Analysis of Differential Equations , author=. 2009 , publisher=

  18. [18]

    Costa, Pedro , journal=. A. 2018 , publisher=

  19. [19]

    2001 , publisher=

    Bewley, Thomas R and Moin, Parviz and Temam, Roger , journal=. 2001 , publisher=

  20. [20]

    1978 , publisher=

    A Practical Guide to Splines , author=. 1978 , publisher=

  21. [21]

    An Energy-Conservative Cut-Cell Method and Advanced

    Bay, Yong Yi , school=. An Energy-Conservative Cut-Cell Method and Advanced. 2019 , note=

  22. [22]

    Isogeometric analysis:

    Hughes, Thomas JR and Cottrell, John A and Bazilevs, Yuri , journal=. Isogeometric analysis:. 2005 , publisher=

  23. [23]

    Isogeometric Analysis: Toward Integration of

    Cottrell, J Austin and Hughes, Thomas JR and Bazilevs, Yuri , year=. Isogeometric Analysis: Toward Integration of

  24. [24]

    2002 , publisher=

    High-Order Methods for Incompressible Fluid Flow , author=. 2002 , publisher=

  25. [25]

    Higher order

    Johnson, Richard W , journal=. Higher order. 2005 , publisher=

  26. [26]

    Li, Ning and Laizet, Sylvain , booktitle=

  27. [27]

    Rolfo, Stefano and Flageul, C. The. Journal of Open Source Software , volume=. 2023 , doi=

  28. [28]

    , journal=

    Frigo, Matteo and Johnson, Steven G. , journal=. The Design and Implementation of. 2005 , doi=

  29. [30]

    , journal=

    Bay, Yong Yi and Yearick, Kathleen A. , journal=. Solve for the Hyperparameter, Skip the Search:. 2026 , eprint=

  30. [31]

    2DECOMP&FFT : A highly scalable 2d decomposition library and FFT interface

    Ning Li and Sylvain Laizet. 2DECOMP&FFT : A highly scalable 2d decomposition library and FFT interface. In Cray User Group 2010 Conference, Edinburgh, United Kingdom, 2010

  31. [32]

    A Practical Guide to Splines

    Carl de Boor. A Practical Guide to Splines. Springer, 1978

  32. [33]

    Application of a fractional-step method to incompressible N avier-- S tokes equations

    John Kim and Parviz Moin. Application of a fractional-step method to incompressible N avier-- S tokes equations. Journal of Computational Physics, 59 0 (2): 0 308--323, 1985

  33. [34]

    An Energy-Conservative Cut-Cell Method and Advanced B -Spline-Based Filtering Method for Flow Simulation

    Yong Yi Bay. An Energy-Conservative Cut-Cell Method and Advanced B -Spline-Based Filtering Method for Flow Simulation . Ph.D. dissertation, University of Illinois at Urbana-Champaign, 2019. https://hdl.handle.net/2142/106215

  34. [35]

    Yong Yi Bay and Kathleen A. Yearick. Solve for the hyperparameter, skip the search: K olmogorov-optimal scaling laws for spline regression. arXiv preprint arXiv:2606.23575, 2026

  35. [36]

    Yong Yi Bay and Kathleen A. Yearick. Machine learning vs deep learning: the generalization problem. arXiv preprint arXiv:2403.01621, 2024

  36. [37]

    The ubiquitous K ronecker product

    Charles F Van Loan. The ubiquitous K ronecker product. Journal of Computational and Applied Mathematics, 123 0 (1--2): 0 85--100, 2000

  37. [38]

    Compact finite difference schemes with spectral-like resolution

    Sanjiva K Lele. Compact finite difference schemes with spectral-like resolution. Journal of Computational Physics, 103 0 (1): 0 16--42, 1992

  38. [39]

    An implicit finite-difference algorithm for hyperbolic systems in conservation-law form

    Richard M Beam and Robert F Warming. An implicit finite-difference algorithm for hyperbolic systems in conservation-law form. Journal of Computational Physics, 22 0 (1): 0 87--110, 1976

  39. [40]

    Matrix Computations

    Gene H Golub and Charles F Van Loan. Matrix Computations. Johns Hopkins University Press, 4 edition, 2013

  40. [41]

    A FFT -based finite-difference solver for massively-parallel direct numerical simulations of turbulent flows

    Pedro Costa. A FFT -based finite-difference solver for massively-parallel direct numerical simulations of turbulent flows. Computers & Mathematics with Applications, 76 0 (8): 0 1853--1862, 2018

  41. [42]

    Matteo Frigo and Steven G. Johnson. The design and implementation of FFTW3 . Proceedings of the IEEE, 93 0 (2): 0 216--231, 2005. doi:10.1109/JPROC.2004.840301

  42. [43]

    Tensor decompositions and applications

    Tamara G Kolda and Brett W Bader. Tensor decompositions and applications. SIAM Review , 51 0 (3): 0 455--500, 2009

  43. [44]

    Direct solution of partial difference equations by tensor product methods

    Robert E Lynch, John R Rice, and David H Thomas. Direct solution of partial difference equations by tensor product methods. Numerische Mathematik, 6 0 (1): 0 185--199, 1964

  44. [45]

    The 2DECOMP&FFT library: an update with new CPU/GPU capabilities

    Stefano Rolfo, C \'e dric Flageul, Paul Bartholomew, Filippo Spiga, and Sylvain Laizet. The 2DECOMP&FFT library: an update with new CPU/GPU capabilities. Journal of Open Source Software, 8 0 (91): 0 5813, 2023. doi:10.21105/joss.05813

  45. [46]

    DNS -based predictive control of turbulence: an optimal benchmark for feedback algorithms

    Thomas R Bewley, Parviz Moin, and Roger Temam. DNS -based predictive control of turbulence: an optimal benchmark for feedback algorithms. Journal of Fluid Mechanics, 447: 0 179--225, 2001

  46. [47]

    Isogeometric analysis: CAD , finite elements, NURBS , exact geometry and mesh refinement

    Thomas JR Hughes, John A Cottrell, and Yuri Bazilevs. Isogeometric analysis: CAD , finite elements, NURBS , exact geometry and mesh refinement. Computer Methods in Applied Mechanics and Engineering, 194 0 (39--41): 0 4135--4195, 2005

  47. [48]

    Numerical Computation of Internal and External Flows

    Charles Hirsch. Numerical Computation of Internal and External Flows. Butterworth-Heinemann, 2 edition, 2007

  48. [49]

    Computational Science and Engineering

    Gilbert Strang. Computational Science and Engineering. Wellesley-Cambridge Press, 2007

  49. [50]

    Elliptic problems in linear difference equations over a network

    Llewellyn H Thomas. Elliptic problems in linear difference equations over a network. Watson Scientific Computing Laboratory Report, 1949

  50. [51]

    Alternating direction methods for three space variables

    Jim Douglas. Alternating direction methods for three space variables. Numerische Mathematik, 4 0 (1): 0 41--63, 1962

  51. [52]

    A general formulation of alternating direction methods

    Jim Douglas, Jr and James E Gunn. A general formulation of alternating direction methods. Numerische Mathematik, 6 0 (1): 0 428--453, 1964

  52. [53]

    A First Course in the Numerical Analysis of Differential Equations

    Arieh Iserles. A First Course in the Numerical Analysis of Differential Equations. Cambridge University Press, 2 edition, 2009

  53. [54]

    High-Order Methods for Incompressible Fluid Flow

    Michel O Deville, Paul F Fischer, and Ernest H Mund. High-Order Methods for Incompressible Fluid Flow. Cambridge University Press, 2002

  54. [55]

    Higher order B -spline collocation at the G reville abscissae

    Richard W Johnson. Higher order B -spline collocation at the G reville abscissae. Applied Numerical Mathematics, 52 0 (1): 0 63--75, 2005

  55. [56]

    Finite Difference Methods for Ordinary and Partial Differential Equations

    Randall J LeVeque. Finite Difference Methods for Ordinary and Partial Differential Equations. SIAM, 2007

  56. [57]

    Isogeometric Analysis: Toward Integration of CAD and FEA

    J Austin Cottrell, Thomas JR Hughes, and Yuri Bazilevs. Isogeometric Analysis: Toward Integration of CAD and FEA . Wiley, 2009

  57. [58]

    Generation of finite difference formulas on arbitrarily spaced grids

    Bengt Fornberg. Generation of finite difference formulas on arbitrarily spaced grids. Mathematics of Computation, 51 0 (184): 0 699--706, 1988

  58. [59]

    The numerical solution of parabolic and elliptic differential equations

    Donald W Peaceman and Henry H Rachford, Jr. The numerical solution of parabolic and elliptic differential equations. Journal of the Society for Industrial and Applied Mathematics, 3 0 (1): 0 28--41, 1955

  59. [60]

    The accurate solution of P oisson's equation by expansion in C hebyshev polynomials

    Dale B Haidvogel and Thomas Zang. The accurate solution of P oisson's equation by expansion in C hebyshev polynomials. Journal of Computational Physics, 30 0 (2): 0 167--180, 1979