GraphGP: Scalable Gaussian Processes with Vecchia's Approximation

Benjamin Dodge; Philipp Frank; Susan E. Clark

arxiv: 2606.11402 · v1 · pith:4LOHOTIUnew · submitted 2026-06-09 · 📊 stat.CO · astro-ph.IM· stat.ML

GraphGP: Scalable Gaussian Processes with Vecchia's Approximation

Benjamin Dodge , Philipp Frank , Susan E. Clark This is my paper

Pith reviewed 2026-06-27 10:24 UTC · model grok-4.3

classification 📊 stat.CO astro-ph.IMstat.ML

keywords Gaussian processesVecchia approximationGPU computingscalable inferencesparse precision matrixk-d tree orderingnearest-neighbor conditioning

0 comments

The pith

GraphGP implements Vecchia's approximation on GPU to let Gaussian processes scale to nearly a billion points with linear time and memory.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Gaussian processes model continuous fields but their standard implementations require cubic time and quadratic memory, which quickly becomes unusable for large numbers of points. Vecchia's approximation addresses this by forming a sparse precision matrix in which each observation is conditioned only on its k nearest neighbors, provided the kernel is stationary and correlations decay with distance. GraphGP realizes this approximation through a bit-reversed k-d tree ordering that supports fast neighbor lookup together with high batch parallelism, plus a custom differentiable CUDA kernel that replaces slower JAX baselines. The resulting algorithm supplies forward simulation, matrix-vector products, log-determinants, and gradient computations, all in linear time and memory even for arbitrary point sets spanning wide ranges. This removes the main computational barrier that has historically restricted Gaussian-process methods to modest dataset sizes.

Core claim

GraphGP is a GPU algorithm for Vecchia's approximation that scales to nearly a billion parameters with linear time and memory requirements, handling arbitrary point distributions over a large dynamic range. Its key contributions are a bit-reversed k-d tree ordering that allows efficient neighbor searches while also maximizing batch parallelism, and a differentiable CUDA implementation that is substantially faster and more memory efficient than a pure JAX baseline. The method supplies the building blocks for inference, including forward generation, inverse application, log-determinant, and kernel parameter derivatives.

What carries the argument

The bit-reversed k-d tree ordering paired with a differentiable CUDA implementation of the sparse precision matrix formed by conditioning each point on its k nearest neighbors.

If this is right

All standard Gaussian-process operations become feasible at scales up to nearly a billion points.
The same ordering and CUDA kernels support both simulation and parameter inference without quadratic memory use.
Arbitrary point distributions, including those with large dynamic range, can be handled without special preprocessing.
Kernel hyperparameters can be optimized by back-propagating through the log-determinant and other terms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same neighbor-ordering technique could be reused inside other sparse Gaussian-process approximations that rely on local conditioning.
Because the implementation is differentiable, it could serve as a drop-in module inside larger models that combine Gaussian processes with neural networks.
Linear scaling opens the possibility of real-time or online updating of Gaussian-process fields on streaming spatial data.

Load-bearing premise

The kernels must be stationary with correlations that decay rapidly enough with distance for conditioning on only k nearest neighbors to produce an accurate sparse precision matrix.

What would settle it

Measure wall-clock time and approximation error while increasing N from 10^6 to 10^9 on a stationary decaying kernel; linear scaling together with stable error would support the claim, while either super-linear growth or rapidly worsening error would falsify it.

Figures

Figures reproduced from arXiv: 2606.11402 by Benjamin Dodge, Philipp Frank, Susan E. Clark.

**Figure 1.** Figure 1: Evaluation of GraphGP order. Left: Illustration of our bit-reversed order and the standard binary k-d tree order. Consecutive points in our order are far away from each other in space and likely to be conditionally independent. Center: Batch sizes for N = 104 points drawn from a unit Gaussian in 3d with k = 16 and n0 = 50. We choose this test distribution as it is irregular and offers three orders of magni… view at source ↗

**Figure 2.** Figure 2: Performance of GraphGP as a function of N and k. Points are drawn from a unit Gaussian in 3d with n0 = 1000. All experiments were performed on an H100 GPU (80 GB) with 32-bit floats, increasing N until memory was exhausted. The large improvements that CUDA offers for the forward pass are due to on-the-fly matrix operations described in Section 2.3. 3 Discussion GraphGP enables practical scaling of Vecchia’… view at source ↗

read the original abstract

Gaussian processes are a powerful tool for modeling continuous fields, but their naive $\mathcal{O}(N^3)$ computational cost and $\mathcal{O}(N^2)$ memory requirement often limit their practical use. Vecchia's approximation is a sparse precision matrix approximation for stationary, decaying kernels that conditions each point only on its $k$ nearest neighbors. We present GraphGP, a GPU algorithm for Vecchia's approximation that scales to nearly a billion parameters with linear time and memory requirements, handling arbitrary point distributions over a large dynamic range. Our key contributions are (1) a bit-reversed k-d tree ordering that allows efficient neighbor searches while also maximizing batch parallelism, and (2) a differentiable CUDA implementation, which is substantially faster and more memory efficient than our pure JAX baseline. GraphGP provides the building blocks for inference, including forward generation, inverse application, log-determinant, and kernel parameter derivatives.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GraphGP is a practical GPU implementation of Vecchia's approximation with a bit-reversed kd-tree ordering that aids parallelism, but its scaling claims rest on engineering details that need full-paper verification.

read the letter

The main point is that this paper delivers a GPU version of Vecchia's approximation for Gaussian processes, using a bit-reversed k-d tree to speed up neighbor searches and improve batching, plus custom differentiable CUDA kernels. It targets linear scaling in N for fixed k on stationary kernels with decaying correlations, up to nearly a billion points.

What is actually new is the combination of that ordering scheme with the CUDA implementation; prior Vecchia work exists, but the specific engineering choices for GPU efficiency and differentiability are the contribution. The paper supplies the expected pieces: forward sampling, inverse application, log-determinant, and kernel derivatives, which makes it usable for inference rather than just a one-off trick.

It handles the practical side reasonably by addressing arbitrary point distributions and large dynamic ranges, which matters for real spatial data. The abstract is clear that this is an approximation relying on k-nearest conditioning, so the usual trade-off between k and accuracy is left to the user.

The soft spot is that the strong performance claims are stated without numbers or comparisons visible in the abstract, so the full manuscript has to show that the linear time and memory actually materialize on hardware and that accuracy holds for the target applications. No obvious internal contradictions appear in the stated approach.

This is for people who need large-scale spatial GPs in astronomy, climate, or similar fields and want something they can run on GPU. A reader looking for a working implementation would find value if the code and tests check out.

I would send it to peer review because the problem is relevant and the algorithmic work looks grounded, even if revisions will likely focus on empirical validation.

Referee Report

0 major / 3 minor

Summary. The paper introduces GraphGP, a GPU algorithm implementing Vecchia's approximation for Gaussian processes on stationary kernels with decaying correlations. It conditions each point on its k nearest neighbors to obtain a sparse precision matrix, using a bit-reversed k-d tree ordering for efficient neighbor search and batch parallelism, together with a differentiable CUDA kernel implementation. The claimed contributions include forward generation, inverse application, log-determinant evaluation, and kernel-parameter derivatives, all asserted to achieve linear time and memory scaling up to nearly 10^9 parameters for arbitrary point distributions.

Significance. If the linear scaling and numerical accuracy claims are substantiated, GraphGP would supply practical building blocks for large-scale GP inference on GPUs, extending Vecchia-type approximations to problem sizes that are currently inaccessible. The emphasis on a fully differentiable CUDA backend is a concrete strength for downstream use in gradient-based optimization frameworks.

minor comments (3)

[Abstract] Abstract and §1: the performance claims (linear scaling to ~10^9 parameters) are stated without reference to the specific benchmark tables or figures that presumably support them; adding explicit cross-references would improve readability.
[§3.2] §3.2: the bit-reversed k-d tree construction is described at a high level; a short pseudocode block or complexity breakdown for the ordering step itself would clarify that it remains O(N log N) and does not introduce a hidden quadratic term.
The manuscript should include at least one table or figure that reports wall-clock time, memory footprint, and approximation error versus N for fixed k on both synthetic and real point sets, with direct comparison to a pure-JAX or CPU Vecchia baseline.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of GraphGP, the significance of the linear-scaling claims, and the recommendation for minor revision. The report contains no enumerated major comments.

Circularity Check

0 steps flagged

No significant circularity in algorithmic implementation

full rationale

The paper describes an algorithmic GPU implementation of the established Vecchia approximation (conditioning on k-nearest neighbors for sparse precision matrices of stationary kernels), with contributions limited to bit-reversed k-d tree ordering for parallelism and a differentiable CUDA kernel. No derivation chain, fitted parameters, or self-citations reduce the claimed linear scaling or efficiency claims to inputs by construction; the work is self-contained as an engineering optimization preserving the standard O(N k^2) complexity of the prior approximation without introducing circular reductions.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The method rests on the standard Vecchia assumption that kernels are stationary with decaying correlations; the neighbor count k is a free hyperparameter whose value controls approximation quality but is not fitted inside the central claim.

free parameters (1)

k (neighbor count)
Controls the sparsity of the precision matrix and therefore both accuracy and speed; chosen by the user rather than derived.

axioms (1)

domain assumption Kernels are stationary and correlations decay with distance
Required for the k-nearest-neighbor conditioning in Vecchia's approximation to produce a valid sparse precision matrix.

pith-pipeline@v0.9.1-grok · 5688 in / 1373 out tokens · 27304 ms · 2026-06-27T10:24:49.148810+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Via Project: Overview of the Science, Instrument, and Survey
astro-ph.IM 2026-06 unverdicted novelty 5.0

The Via Project is a planned five-year dual-hemisphere spectroscopic survey targeting over 2 million stars with 100 m/s RV stability and transient spectroscopy to r~24 using instruments on MMT and Magellan/Clay telesc...

Reference graph

Works this paper leans on

21 extracted references · 12 canonical work pages · cited by 1 Pith paper

[1]

2024, A&A, 685, A82, doi: 10.1051/0004-6361/202347628

Edenhofer, Gordian and Zucker, Catherine and Frank, Philipp and Saydjari, Andrew K. and Speagle, Joshua S. and Finkbeiner, Douglas and En. A parsec-scale Galactic 3D dust map out to 1.25 kpc from the Sun , volume=. 2024 , pages=. doi:10.1051/0004-6361/202347628 , journal=

work page doi:10.1051/0004-6361/202347628 2024
[2]

Sparse Inverse Cholesky Factorization of Dense Kernel Matrices by Greedy Conditional Selection , volume=

Huan, Stephen and Guinness, Joseph and Katzfuss, Matthias and Owhadi, Houman and Sch\". Sparse Inverse Cholesky Factorization of Dense Kernel Matrices by Greedy Conditional Selection , volume=. SIAM/ASA Journal on Uncertainty Quantification , publisher=. 2025 , pages=. doi:10.1137/23m1606253 , number=

work page doi:10.1137/23m1606253 2025
[3]

and Guerdi, Massin and Scheel-Platz, Lukas I

Edenhofer, Gordian and Frank, Philipp and Roth, Jakob and Leike, Reimar H. and Guerdi, Massin and Scheel-Platz, Lukas I. and Guardiani, Matteo and Eberle, Vincent and Westerkamp, Margret and En. Re-Envisioning Numerical Information Field Theory (. Journal of Open Source Software , publisher=. 2024 , pages=. doi:10.21105/joss.06593 , number=

work page doi:10.21105/joss.06593 2024
[4]

, keywords =

McCallum, Lewis and Frank, Philipp and Hutschenreuter, Sebastian and Benjamin, Robert and Booth, Rebecca A and Clark, Susan E and Haverkorn, Marijke and Hill, Alex S and Mertsch, Philipp and Ordog, Anna and Saydjari, Andrew K and West, Jennifer , year=. The radial component of the local Galactic magnetic field in 3D , volume=. Monthly Notices of the Royal...

work page doi:10.1093/mnras/stag640
[5]

Geometric Variational Inference , volume=

Frank, Philipp and Leike, Reimar and En. Geometric Variational Inference , volume=. Entropy , publisher=. 2021 , pages=. doi:10.3390/e23070853 , number=

work page doi:10.3390/e23070853 2021
[6]

Knollm. Metric. arXiv e-prints , DOI=
[7]

arXiv e-prints , DOI=

A Three-Dimensional Tomographic Reconstruction of the Galactic Cosmic-Ray Proton Density , author=. arXiv e-prints , DOI=
[8]

and Keyes, David E

Pan, Qilong and Abdulah, Sameh and Genton, Marc G. and Keyes, David E. and Ltaief, Hatem and Sun, Ying , journal=. doi:10.48550/arXiv.2403.07412 , year=

work page doi:10.48550/arxiv.2403.07412
[9]

Leike, R. H. and Edenhofer, G. and Knollm. The Galactic 3D large-scale dust distribution via. arXiv e-prints , DOI=
[10]

Implementation and Analysis of

James, Zachary and Guinness, Joseph , journal=. Implementation and Analysis of. doi:10.48550/arXiv.2407.02740 , year=

work page doi:10.48550/arxiv.2407.02740
[11]

doi:10.48550/arXiv.2211.00120 , year=

Wald, Ingo , journal=. doi:10.48550/arXiv.2211.00120 , year=

work page doi:10.48550/arxiv.2211.00120
[12]

arXiv e-prints , DOI=

A Stack-Free Traversal Algorithm for Left-Balanced k-d Trees , author=. arXiv e-prints , DOI=
[13]

and Frank, Philipp and En

Edenhofer, Gordian and Leike, Reimar H. and Frank, Philipp and En. Sparse Kernel. arXiv e-prints , DOI=
[14]

Vecchia, A. V. , year=. Estimation and Model Identification for Continuous Spatial Processes , volume=. Journal of the Royal Statistical Society Series B: Statistical Methodology , publisher=. doi:10.1111/j.2517-6161.1988.tb01729.x , number=

work page doi:10.1111/j.2517-6161.1988.tb01729.x 1988
[15]

Kernel Interpolation for Scalable Structured

Wilson, Andrew and Nickisch, Hannes , booktitle=. Kernel Interpolation for Scalable Structured. 2015 , publisher=

2015
[16]

Rasmussen, Carl Edward and Williams, Christopher K. I. , year=
[17]

Journal of Agricultural, Biological and Environmental Statistics , publisher=

Katzfuss, Matthias and Guinness, Joseph and Gong, Wenlong and Zilber, Daniel , year=. Journal of Agricultural, Biological and Environmental Statistics , publisher=. doi:10.1007/s13253-020-00401-7 , number=

work page doi:10.1007/s13253-020-00401-7
[18]

Permutation and Grouping Methods for Sharpening

Guinness, Joseph , year=. Permutation and Grouping Methods for Sharpening. Technometrics , publisher=. doi:10.1080/00401706.2018.1437476 , number=

work page doi:10.1080/00401706.2018.1437476 2018
[19]

Variational Learning of Inducing Variables in Sparse

Titsias, Michalis , booktitle=. Variational Learning of Inducing Variables in Sparse. 2009 , publisher=

2009
[20]

Bradbury, James and Frostig, Roy and Hawkins, Peter and Johnson, Matthew James and Leary, Chris and Maclaurin, Dougal and Necula, George and Paszke, Adam and Vander
[21]

Smith, S. P. , year=. Differentiation of the. Journal of Computational and Graphical Statistics , publisher=. doi:10.1080/10618600.1995.10474671 , number=

work page doi:10.1080/10618600.1995.10474671 1995

[1] [1]

2024, A&A, 685, A82, doi: 10.1051/0004-6361/202347628

Edenhofer, Gordian and Zucker, Catherine and Frank, Philipp and Saydjari, Andrew K. and Speagle, Joshua S. and Finkbeiner, Douglas and En. A parsec-scale Galactic 3D dust map out to 1.25 kpc from the Sun , volume=. 2024 , pages=. doi:10.1051/0004-6361/202347628 , journal=

work page doi:10.1051/0004-6361/202347628 2024

[2] [2]

Sparse Inverse Cholesky Factorization of Dense Kernel Matrices by Greedy Conditional Selection , volume=

Huan, Stephen and Guinness, Joseph and Katzfuss, Matthias and Owhadi, Houman and Sch\". Sparse Inverse Cholesky Factorization of Dense Kernel Matrices by Greedy Conditional Selection , volume=. SIAM/ASA Journal on Uncertainty Quantification , publisher=. 2025 , pages=. doi:10.1137/23m1606253 , number=

work page doi:10.1137/23m1606253 2025

[3] [3]

and Guerdi, Massin and Scheel-Platz, Lukas I

Edenhofer, Gordian and Frank, Philipp and Roth, Jakob and Leike, Reimar H. and Guerdi, Massin and Scheel-Platz, Lukas I. and Guardiani, Matteo and Eberle, Vincent and Westerkamp, Margret and En. Re-Envisioning Numerical Information Field Theory (. Journal of Open Source Software , publisher=. 2024 , pages=. doi:10.21105/joss.06593 , number=

work page doi:10.21105/joss.06593 2024

[4] [4]

, keywords =

McCallum, Lewis and Frank, Philipp and Hutschenreuter, Sebastian and Benjamin, Robert and Booth, Rebecca A and Clark, Susan E and Haverkorn, Marijke and Hill, Alex S and Mertsch, Philipp and Ordog, Anna and Saydjari, Andrew K and West, Jennifer , year=. The radial component of the local Galactic magnetic field in 3D , volume=. Monthly Notices of the Royal...

work page doi:10.1093/mnras/stag640

[5] [5]

Geometric Variational Inference , volume=

Frank, Philipp and Leike, Reimar and En. Geometric Variational Inference , volume=. Entropy , publisher=. 2021 , pages=. doi:10.3390/e23070853 , number=

work page doi:10.3390/e23070853 2021

[6] [6]

Knollm. Metric. arXiv e-prints , DOI=

[7] [7]

arXiv e-prints , DOI=

A Three-Dimensional Tomographic Reconstruction of the Galactic Cosmic-Ray Proton Density , author=. arXiv e-prints , DOI=

[8] [8]

and Keyes, David E

Pan, Qilong and Abdulah, Sameh and Genton, Marc G. and Keyes, David E. and Ltaief, Hatem and Sun, Ying , journal=. doi:10.48550/arXiv.2403.07412 , year=

work page doi:10.48550/arxiv.2403.07412

[9] [9]

Leike, R. H. and Edenhofer, G. and Knollm. The Galactic 3D large-scale dust distribution via. arXiv e-prints , DOI=

[10] [10]

Implementation and Analysis of

James, Zachary and Guinness, Joseph , journal=. Implementation and Analysis of. doi:10.48550/arXiv.2407.02740 , year=

work page doi:10.48550/arxiv.2407.02740

[11] [11]

doi:10.48550/arXiv.2211.00120 , year=

Wald, Ingo , journal=. doi:10.48550/arXiv.2211.00120 , year=

work page doi:10.48550/arxiv.2211.00120

[12] [12]

arXiv e-prints , DOI=

A Stack-Free Traversal Algorithm for Left-Balanced k-d Trees , author=. arXiv e-prints , DOI=

[13] [13]

and Frank, Philipp and En

Edenhofer, Gordian and Leike, Reimar H. and Frank, Philipp and En. Sparse Kernel. arXiv e-prints , DOI=

[14] [14]

Vecchia, A. V. , year=. Estimation and Model Identification for Continuous Spatial Processes , volume=. Journal of the Royal Statistical Society Series B: Statistical Methodology , publisher=. doi:10.1111/j.2517-6161.1988.tb01729.x , number=

work page doi:10.1111/j.2517-6161.1988.tb01729.x 1988

[15] [15]

Kernel Interpolation for Scalable Structured

Wilson, Andrew and Nickisch, Hannes , booktitle=. Kernel Interpolation for Scalable Structured. 2015 , publisher=

2015

[16] [16]

Rasmussen, Carl Edward and Williams, Christopher K. I. , year=

[17] [17]

Journal of Agricultural, Biological and Environmental Statistics , publisher=

Katzfuss, Matthias and Guinness, Joseph and Gong, Wenlong and Zilber, Daniel , year=. Journal of Agricultural, Biological and Environmental Statistics , publisher=. doi:10.1007/s13253-020-00401-7 , number=

work page doi:10.1007/s13253-020-00401-7

[18] [18]

Permutation and Grouping Methods for Sharpening

Guinness, Joseph , year=. Permutation and Grouping Methods for Sharpening. Technometrics , publisher=. doi:10.1080/00401706.2018.1437476 , number=

work page doi:10.1080/00401706.2018.1437476 2018

[19] [19]

Variational Learning of Inducing Variables in Sparse

Titsias, Michalis , booktitle=. Variational Learning of Inducing Variables in Sparse. 2009 , publisher=

2009

[20] [20]

Bradbury, James and Frostig, Roy and Hawkins, Peter and Johnson, Matthew James and Leary, Chris and Maclaurin, Dougal and Necula, George and Paszke, Adam and Vander

[21] [21]

Smith, S. P. , year=. Differentiation of the. Journal of Computational and Graphical Statistics , publisher=. doi:10.1080/10618600.1995.10474671 , number=

work page doi:10.1080/10618600.1995.10474671 1995