REX-SUB: A Scalable Subsampling Strategy for Modeling Large Spatial Datasets

Ben Seiyon Lee; Nicholas Rios

arxiv: 2605.16075 · v1 · pith:H4YUJQ4Bnew · submitted 2026-05-15 · 📊 stat.ME · stat.CO

REX-SUB: A Scalable Subsampling Strategy for Modeling Large Spatial Datasets

Nicholas Rios , Ben Seiyon Lee This is my paper

Pith reviewed 2026-05-20 15:24 UTC · model grok-4.3

classification 📊 stat.ME stat.CO

keywords spatial subsamplingGaussian processesVecchia approximationlarge spatial datasetsprediction errorexchange algorithmgeostatistics

0 comments

The pith

REX-SUB selects small subsamples from large spatial datasets to reduce prediction errors in Gaussian process models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors develop REX-SUB, a method that uses a randomized exchange algorithm to choose a small number of locations from millions of spatial observations. This selection aims to minimize prediction errors when fitting a Gaussian process model to the subset. They combine it with the Vecchia approximation to keep computations fast even on the chosen points. Results from simulations and a precipitable water dataset indicate better mean squared prediction errors and interval scores than other subsampling techniques. This approach addresses the challenge of fitting geostatistical models to massive data collections that would otherwise be too large for standard methods.

Core claim

Through a simulation study and an application to a remotely sensed precipitable water dataset, REX-SUB yields lower mean squared prediction errors and interval scores compared to competing subsampling strategies by efficiently selecting subsamples that optimize a prediction-error criterion within a Vecchia-approximated Gaussian process framework.

What carries the argument

The randomized exchange algorithm that starts with a random subsample and iteratively swaps locations to minimize a prediction-error criterion, embedded with the Vecchia approximation to the Gaussian process likelihood for fast sparse-matrix computations on the selected points.

If this is right

Lower mean squared prediction errors when predicting at unsampled locations from the full spatial field.
Improved interval scores that better calibrate uncertainty in spatial predictions.
Feasible fitting of Gaussian process models on datasets containing millions of locations where direct computation is prohibitive.
Better performance than random sampling or other fixed subsampling strategies in both controlled simulations and real remotely sensed data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same exchange criterion could be tested on non-Gaussian spatial processes if the approximation is adjusted accordingly.
REX-SUB might support adaptive monitoring designs where new measurement locations are chosen sequentially to reduce future prediction error.
Direct comparisons against more recent subsampling techniques on additional environmental datasets would clarify when the method is most advantageous.

Load-bearing premise

The prediction-error criterion optimized inside the exchange algorithm serves as a faithful proxy for how the fitted model will perform on the full dataset rather than only the chosen subsample.

What would settle it

A new simulation or application in which the locations chosen by REX-SUB produce higher mean squared prediction errors or worse interval scores than a competing subsampling method or even simple random selection on the same data.

Figures

Figures reproduced from arXiv: 2605.16075 by Ben Seiyon Lee, Nicholas Rios.

**Figure 1.** Figure 1: Two LHDs with n = 6 points in d = 2 dimensions. Figure from Zhao et al. (2018). that attempt to maximize the minimum distance between points; this is done by the maximinLHS function in the R package lhs (Carnell, 2024). Naturally, LHDs can be used to subsample n points from N points in d−dimensional space by partitioning the space into n × n regions, and then using the LHD to select n regions. Once the n r… view at source ↗

**Figure 2.** Figure 2: Total precipitable water data obtained via the MODIS Terra Satellite obtained [PITH_FULL_IMAGE:figures/full_fig_p021_2.png] view at source ↗

**Figure 3.** Figure 3: Example of Subsample Locations for REX-SUB and LHS when [PITH_FULL_IMAGE:figures/full_fig_p022_3.png] view at source ↗

**Figure 4.** Figure 4: Comparison of MSPE, Interval Scores, and coverage of 100 subsamples of size [PITH_FULL_IMAGE:figures/full_fig_p022_4.png] view at source ↗

**Figure 5.** Figure 5: Comparison of the MSPE, Interval Scores, and coverage of 100 subsamples of [PITH_FULL_IMAGE:figures/full_fig_p023_5.png] view at source ↗

read the original abstract

Recent advances in data collection technologies have led to the emergence of massive spatial datasets, with measurements obtained at millions of spatial locations. Geostatistical models typically employ Gaussian processes (GPs) to capture spatial dependence, but standard GP fitting becomes prohibitive at such scales. A promising solution is optimal subsampling, where a subset of locations is selected that optimizes a criterion. In this study, we propose a randomized exchange algorithm for subsampling (REX-SUB) which efficiently selects small subsamples that minimize prediction errors in the fitted spatial GP models. To further improve computational efficiency, we embed a scalable Vecchia approximation to the GP's joint likelihood, which takes advantage of sparsity in the precision matrix to enable fast inference on the selected subsamples. Through a simulation study and an application to a remotely sensed precipitable water dataset, we show that REX-SUB yields lower mean squared prediction errors and interval scores compared to competing subsampling strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes REX-SUB, a randomized exchange algorithm for subsampling large spatial datasets to fit Gaussian process models. It embeds the Vecchia approximation to the GP likelihood for computational scalability and selects subsamples by minimizing a prediction-error criterion. Through simulations and an application to a remotely sensed precipitable water dataset, the authors report that REX-SUB produces lower mean squared prediction errors and interval scores than competing subsampling strategies.

Significance. If the performance gains survive direct validation against exact GP evaluations, the method could provide a practical, scalable tool for geostatistical modeling of massive spatial data where full GP inference is infeasible. The algorithmic combination of exchange subsampling with Vecchia sparsity is a reasonable engineering approach, and the empirical comparisons on simulated and real data supply initial supporting evidence.

major comments (2)

The exchange algorithm selects subsamples by minimizing a prediction-error criterion computed from the Vecchia-approximated likelihood. No section reports a direct comparison of Vecchia-based versus exact-GP prediction errors on the adaptively chosen subsamples. This is load-bearing for the central claim because the headline result (lower MSPE and interval scores) rests on the untested assumption that the approximate criterion induces the same ranking as the exact GP.
Simulation Study: the reported gains are shown for a fixed Vecchia conditioning set size, but the manuscript contains no sensitivity analysis to this parameter or to the exchange neighborhood definition. Because Vecchia accuracy depends on both the conditioning sets and the spatial configuration of the chosen points, it remains unclear whether the superiority over competitors persists under reasonable changes to these free parameters.

minor comments (2)

Abstract: the statement that REX-SUB 'yields lower mean squared prediction errors and interval scores' would be strengthened by a brief quantitative indication of the magnitude of improvement and the identity of the competing strategies.
Notation: the free parameters (subsample size m and Vecchia conditioning set size) should be explicitly listed in the experimental sections so that readers can reproduce the precise settings used for the reported results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and describe the revisions that will be incorporated to strengthen the manuscript.

read point-by-point responses

Referee: The exchange algorithm selects subsamples by minimizing a prediction-error criterion computed from the Vecchia-approximated likelihood. No section reports a direct comparison of Vecchia-based versus exact-GP prediction errors on the adaptively chosen subsamples. This is load-bearing for the central claim because the headline result (lower MSPE and interval scores) rests on the untested assumption that the approximate criterion induces the same ranking as the exact GP.

Authors: We agree this validation would strengthen the central claim. While exact GP evaluation is computationally prohibitive for the large-scale settings that motivate the work, we will add a new subsection to the simulation study that performs the comparison on smaller datasets (n=500 and n=1000) where exact GP computations remain feasible. In this analysis we will report the overlap in selected subsamples, the correlation between Vecchia and exact criteria, and the resulting MSPE and interval scores to confirm that the approximate criterion preserves the ranking of candidate subsamples. revision: yes
Referee: Simulation Study: the reported gains are shown for a fixed Vecchia conditioning set size, but the manuscript contains no sensitivity analysis to this parameter or to the exchange neighborhood definition. Because Vecchia accuracy depends on both the conditioning sets and the spatial configuration of the chosen points, it remains unclear whether the superiority over competitors persists under reasonable changes to these free parameters.

Authors: We acknowledge the value of demonstrating robustness. The revised manuscript will include additional simulation results that vary the Vecchia conditioning set size (m = 5, 10, 20, 30) and the exchange neighborhood size (k = 10, 20, 30 nearest neighbors). These experiments will be reported alongside the original results to show that the performance advantages of REX-SUB over the competing subsampling strategies are maintained across these reasonable parameter choices. revision: yes

Circularity Check

0 steps flagged

No significant circularity: algorithmic method evaluated on external benchmarks

full rationale

The paper introduces REX-SUB as a randomized exchange algorithm that selects subsamples by minimizing a prediction-error criterion computed under the Vecchia approximation. All performance claims (lower MSPE and interval scores versus competitors) are established through separate simulation studies with known truth and an application to a real remotely sensed dataset. No section presents a closed-form derivation, optimality condition, or prediction that reduces by the paper's own equations to a fitted quantity defined on the same data. The Vecchia approximation is a standard external technique, and the central results rest on empirical comparisons rather than self-referential definitions or self-citation chains.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The approach rests on the standard assumption that spatial dependence is well captured by a Gaussian process with a stationary covariance function, plus the modeling choice that a fixed-size subsample plus Vecchia sparsity pattern is sufficient to approximate the full joint likelihood.

free parameters (2)

subsample size m
Chosen by the user; performance depends on this value but no automatic selection rule is described in the abstract.
Vecchia conditioning set size
Controls the sparsity pattern and approximation accuracy; must be set by the practitioner.

axioms (1)

domain assumption The spatial field is a realization of a Gaussian process with known parametric covariance family.
Invoked when the prediction criterion is defined and when the Vecchia approximation is applied.

pith-pipeline@v0.9.0 · 5689 in / 1381 out tokens · 29057 ms · 2026-05-20T15:24:54.308037+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

REX-SUB ... randomized exchange algorithm ... embed a scalable Vecchia approximation ... minimize prediction errors
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

simulation study and ... remotely sensed precipitable water dataset

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 1 internal anchor

[1]

Journal of Statistical Planning and Inference , volume=

Information-based optimal subdata selection for big data logistic regression , author=. Journal of Statistical Planning and Inference , volume=. 2020 , publisher=

work page 2020
[2]

Technometrics , volume=

A comparison of algorithms for constructing exact D-optimal designs , author=. Technometrics , volume=. 1980 , publisher=

work page 1980
[3]

Journal of statistical planning and inference , volume=

Minimax and maximin distance designs , author=. Journal of statistical planning and inference , volume=. 1990 , publisher=

work page 1990
[4]

Journal of Quality Technology , volume=

A-optimal versus D-optimal design of screening experiments , author=. Journal of Quality Technology , volume=. 2021 , publisher=

work page 2021
[5]

arXiv preprint arXiv:2309.00720 , year=

Information-based Optimal Subdata Selection for Clusterwise Linear Regression , author=. arXiv preprint arXiv:2309.00720 , year=

work page arXiv
[6]

Journal of Statistical Theory and Practice , volume=

Model-robust subdata selection for big data , author=. Journal of Statistical Theory and Practice , volume=. 2021 , publisher=

work page 2021
[7]

The New England Journal of Statistics in Data Science , volume=

Subdata selection with a large number of variables , author=. The New England Journal of Statistics in Data Science , volume=. 2023 , publisher=

work page 2023
[8]

Journal of the American Statistical Association , volume=

Information-based optimal subdata selection for big data linear regression , author=. Journal of the American Statistical Association , volume=. 2019 , publisher=

work page 2019
[9]

Atmospheric Environment , volume=

MODIS Collection 6.1 aerosol optical depth products over land and ocean: validation and comparison , author=. Atmospheric Environment , volume=. 2019 , publisher=

work page 2019
[10]

Spatial Statistics , volume=

Hierarchical statistical modeling of big spatial datasets using the exponential family of distributions , author=. Spatial Statistics , volume=. 2013 , publisher=

work page 2013
[11]

Journal of Computational and Graphical Statistics , volume=

A computationally efficient projection-based approach for spatial generalized linear mixed models , author=. Journal of Computational and Graphical Statistics , volume=. 2018 , publisher=

work page 2018
[12]

The cryosphere , volume=

Bedmap2: improved ice bed, surface and thickness datasets for Antarctica , author=. The cryosphere , volume=. 2013 , publisher=

work page 2013
[13]

Computational Statistics & Data Analysis , volume=

Vecchia--Laplace approximations of generalized Gaussian processes for big non-Gaussian spatial data , author=. Computational Statistics & Data Analysis , volume=. 2021 , publisher=

work page 2021
[14]

Journal of the American Statistical Association , volume=

I-optimal design of mixture experiments , author=. Journal of the American Statistical Association , volume=. 2016 , publisher=

work page 2016
[15]

2013 , publisher=

Theory of optimal experiments , author=. 2013 , publisher=

work page 2013
[16]

Journal of Multivariate analysis , volume=

Asymptotic properties of a maximum likelihood estimator with data from a Gaussian process , author=. Journal of Multivariate analysis , volume=. 1991 , publisher=

work page 1991
[17]

Journal of the American Statistical Association , volume =

Hao Zhang , title =. Journal of the American Statistical Association , volume =. 2004 , publisher =. doi:10.1198/016214504000000241 , URL =

work page doi:10.1198/016214504000000241 2004
[18]

The Annals of Statistics , pages=

Uniform asymptotic optimality of linear predictions of a random field using an incorrect second-order structure , author=. The Annals of Statistics , pages=. 1990 , publisher=

work page 1990
[19]

Journal of the American statistical Association , volume=

Strictly proper scoring rules, prediction, and estimation , author=. Journal of the American statistical Association , volume=. 2007 , publisher=

work page 2007
[20]

Rasmussen, Carl Edward and Williams, Christopher K. I. , title =. 2005 , month =. doi:10.7551/mitpress/3206.001.0001 , url =

work page doi:10.7551/mitpress/3206.001.0001 2005
[21]

International Conference on Artificial Intelligence and Statistics , pages=

On random subsampling of Gaussian process regression: A graphon-based analysis , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2020 , organization=

work page 2020
[22]

Statistica Sinica , volume=

Gaussian process prediction using design-based subsampling , author=. Statistica Sinica , volume=. 2022 , publisher=

work page 2022
[23]

Statistica Sinica , volume=

Efficient Gaussian process modeling using experimental design-based subagging , author=. Statistica Sinica , volume=. 2018 , publisher=

work page 2018
[24]

Technometrics , volume=

Comparison of three methods for selecting values of input variables in the analysis of output from a computer code , author=. Technometrics , volume=

work page
[25]

2024 , note =

lhs: Latin Hypercube Samples , author =. 2024 , note =

work page 2024
[26]

Journal of the Royal Statistical Society: Series B (Methodological) , volume=

Estimation and model identification for continuous spatial processes , author=. Journal of the Royal Statistical Society: Series B (Methodological) , volume=. 1988 , publisher=

work page 1988
[27]

Journal of the Royal Statistical Society: Series B (Statistical Methodology) , volume=

Approximating likelihoods for large spatial data sets , author=. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , volume=. 2004 , publisher=

work page 2004
[28]

Hierarchical nearest-neighbor

Datta, Abhirup and Banerjee, Sudipto and Finley, Andrew O and Gelfand, Alan E , date-added =. Hierarchical nearest-neighbor. Journal of the American Statistical Association , number =

work page
[29]

Vecchia approximations of

Katzfuss, Matthias and Guinness, Joseph and Gong, Wenlong and Zilber, Daniel , journal=. Vecchia approximations of. 2020 , publisher=

work page 2020
[30]

Technometrics , volume=

Permutation and grouping methods for sharpening Gaussian process approximations , author=. Technometrics , volume=. 2018 , publisher=

work page 2018
[31]

Journal of the Royal Statistical Society Series B: Statistical Methodology , pages=

Graphical methods for Order-of-Addition experiments , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , pages=. 2025 , publisher=

work page 2025
[32]

Computational Statistics & Data Analysis , volume=

Exact designs for order-of-addition experiments under a transition-effect model , author=. Computational Statistics & Data Analysis , volume=. 2025 , publisher=

work page 2025
[33]

Computational Statistics & Data Analysis , volume=

TA algorithms for D-optimal OofA Mixture designs , author=. Computational Statistics & Data Analysis , volume=. 2022 , publisher=

work page 2022
[34]

1990 , publisher=

Spline models for observational data , author=. 1990 , publisher=

work page 1990
[35]

2011 , publisher=

Statistics for spatio-temporal data , author=. 2011 , publisher=

work page 2011
[36]

2003 , publisher=

Hierarchical modeling and analysis for spatial data , author=. 2003 , publisher=

work page 2003
[37]

Annual Review of Statistics and Its Application , volume=

Basis-function models in spatial statistics , author=. Annual Review of Statistics and Its Application , volume=. 2022 , publisher=

work page 2022
[38]

Journal of Agricultural, Biological and Environmental Statistics , volume=

A case study competition among methods for analyzing large spatial data , author=. Journal of Agricultural, Biological and Environmental Statistics , volume=. 2019 , publisher=

work page 2019
[39]

Technometrics , volume=

A scalable partitioned approach to model massive nonstationary non-Gaussian spatial datasets , author=. Technometrics , volume=. 2023 , publisher=

work page 2023
[40]

Journal of Computational and Graphical Statistics , volume=

Adaptive Bayesian nonstationary modeling for large spatial datasets using covariance approximations , author=. Journal of Computational and Graphical Statistics , volume=. 2014 , publisher=

work page 2014
[41]

Bayesian Analysis , volume=

Incorporating subsampling into Bayesian models for high-dimensional spatial data , author=. Bayesian Analysis , volume=. 2024 , publisher=

work page 2024
[42]

Test , volume=

Comparing and selecting spatial predictors using local criteria , author=. Test , volume=. 2015 , publisher=

work page 2015
[43]

Journal of Computational and Graphical Statistics , volume=

An approach to incorporate subsampling into a generic Bayesian hierarchical model , author=. Journal of Computational and Graphical Statistics , volume=. 2021 , publisher=

work page 2021
[44]

Statistical Science , volume=

A general framework for Vecchia approximations of Gaussian processes , author=. Statistical Science , volume=. 2021 , publisher=

work page 2021
[45]

An explicit link between

Lindgren, Finn and Rue, H. An explicit link between. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , number =

work page
[46]

Environmental and Ecological Statistics , volume=

A process-convolution approach to modelling temperatures in the North Atlantic Ocean , author=. Environmental and Ecological Statistics , volume=. 1998 , publisher=

work page 1998
[47]

and Johannesson, G

Cressie, N. and Johannesson, G. , date-added =. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , number =

work page
[48]

and Gelfand, A.E

Banerjee, S. and Gelfand, A.E. and Finley, A.O. and Sang, H. , journal =

work page
[49]

Covariance tapering for interpolation of large spatial datasets , volume =

Furrer, Reinhard and Genton, Marc G and Nychka, Douglas , journal=. Covariance tapering for interpolation of large spatial datasets , volume =

work page
[50]

Limitations on low rank approximations for covariance matrices of spatial data , volume =

Stein, Michael L , journal =. Limitations on low rank approximations for covariance matrices of spatial data , volume =

work page
[51]

Vecchia, A. V. , date-added =. Estimation and model identification for continuous spatial processes , volume =. Journal of the Royal Statistical Society, Series B, Methodological , keywords =

work page
[52]

The Annals of Applied Statistics , pages=

Covariance approximation for large multivariate spatial data sets with an application to multiple climate model errors , author=. The Annals of Applied Statistics , pages=. 2011 , publisher=

work page 2011
[53]

Environmetrics: The official journal of the International Environmetrics Society , volume=

A high frequency kriging approach for non-stationary environmental processes , author=. Environmetrics: The official journal of the International Environmetrics Society , volume=. 2001 , publisher=

work page 2001
[54]

Journal of the American Statistical Association , volume=

Analyzing nonstationary spatial data using piecewise Gaussian processes , author=. Journal of the American Statistical Association , volume=. 2005 , publisher=

work page 2005
[55]

Local likelihood estimation for covariance functions with spatially-varying parameters: the convoSPAT package for R

Local likelihood estimation for covariance functions with spatially-varying parameters: the convoSPAT package for R , author=. arXiv preprint arXiv:1507.08613 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[56]

Fast sampling of

Rue, Havard , journal=. Fast sampling of. 2001 , publisher=

work page 2001
[57]

Journal of Computational and Graphical Statistics , volume=

A multiresolution Gaussian process model for the analysis of large spatial datasets , author=. Journal of Computational and Graphical Statistics , volume=. 2015 , publisher=

work page 2015
[58]

Journal of the american statistical association , volume=

Nonseparable, stationary covariance functions for space--time data , author=. Journal of the american statistical association , volume=. 2002 , publisher=

work page 2002
[59]

Journal of the American Statistical association , volume=

Classes of nonseparable, spatio-temporal stationary covariance functions , author=. Journal of the American Statistical association , volume=. 1999 , publisher=

work page 1999
[60]

Journal of the American Statistical Association , volume=

Stable and efficient multiple smoothing parameter estimation for generalized additive models , author=. Journal of the American Statistical Association , volume=. 2004 , publisher=

work page 2004
[61]

Journal of Statistical Software , volume=

FRK: An R package for spatial and spatio-temporal prediction with large datasets , author=. Journal of Statistical Software , volume=

work page
[62]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2009 , publisher=

work page 2009
[63]

A multi-resolution approximation for massive spatial datasets , volume =

Katzfuss, Matthias , journal =. A multi-resolution approximation for massive spatial datasets , volume =

work page
[64]

Environmetrics: The official journal of the International Environmetrics Society , volume=

Spatial modelling using a new class of nonstationary covariance functions , author=. Environmetrics: The official journal of the International Environmetrics Society , volume=. 2006 , publisher=

work page 2006
[65]

Ecology , volume=

The basis function approach for modeling autocorrelation in ecological data , author=. Ecology , volume=. 2017 , publisher=

work page 2017
[66]

hetgp: Heteroskedastic Gaussian process modeling and sequential design in

Binois, Micka. hetgp: Heteroskedastic Gaussian process modeling and sequential design in. Journal of Statistical Software , volume=

work page
[67]

Springer Series in Statistics , year=

The Design and Analysis of Computer Experiments , author=. Springer Series in Statistics , year=

work page
[68]

arXiv preprint arXiv:2209.11280 , year=

Scalable Gaussian Process Hyperparameter Optimization via Coverage Regularization , author=. arXiv preprint arXiv:2209.11280 , year=

work page arXiv
[69]

Space Weather , volume=

GeoDGP: One-hour ahead global probabilistic geomagnetic perturbation forecasting using deep Gaussian process , author=. Space Weather , volume=. 2025 , publisher=

work page 2025
[70]

SIAM/ASA Journal on Uncertainty Quantification , volume=

Fully bayesian inference for latent variable gaussian process models , author=. SIAM/ASA Journal on Uncertainty Quantification , volume=. 2023 , publisher=

work page 2023
[71]

Statistics and Computing , volume=

Gaussian process learning via Fisher scoring of Vecchia’s approximation , author=. Statistics and Computing , volume=. 2021 , publisher=

work page 2021

[1] [1]

Journal of Statistical Planning and Inference , volume=

Information-based optimal subdata selection for big data logistic regression , author=. Journal of Statistical Planning and Inference , volume=. 2020 , publisher=

work page 2020

[2] [2]

Technometrics , volume=

A comparison of algorithms for constructing exact D-optimal designs , author=. Technometrics , volume=. 1980 , publisher=

work page 1980

[3] [3]

Journal of statistical planning and inference , volume=

Minimax and maximin distance designs , author=. Journal of statistical planning and inference , volume=. 1990 , publisher=

work page 1990

[4] [4]

Journal of Quality Technology , volume=

A-optimal versus D-optimal design of screening experiments , author=. Journal of Quality Technology , volume=. 2021 , publisher=

work page 2021

[5] [5]

arXiv preprint arXiv:2309.00720 , year=

Information-based Optimal Subdata Selection for Clusterwise Linear Regression , author=. arXiv preprint arXiv:2309.00720 , year=

work page arXiv

[6] [6]

Journal of Statistical Theory and Practice , volume=

Model-robust subdata selection for big data , author=. Journal of Statistical Theory and Practice , volume=. 2021 , publisher=

work page 2021

[7] [7]

The New England Journal of Statistics in Data Science , volume=

Subdata selection with a large number of variables , author=. The New England Journal of Statistics in Data Science , volume=. 2023 , publisher=

work page 2023

[8] [8]

Journal of the American Statistical Association , volume=

Information-based optimal subdata selection for big data linear regression , author=. Journal of the American Statistical Association , volume=. 2019 , publisher=

work page 2019

[9] [9]

Atmospheric Environment , volume=

MODIS Collection 6.1 aerosol optical depth products over land and ocean: validation and comparison , author=. Atmospheric Environment , volume=. 2019 , publisher=

work page 2019

[10] [10]

Spatial Statistics , volume=

Hierarchical statistical modeling of big spatial datasets using the exponential family of distributions , author=. Spatial Statistics , volume=. 2013 , publisher=

work page 2013

[11] [11]

Journal of Computational and Graphical Statistics , volume=

A computationally efficient projection-based approach for spatial generalized linear mixed models , author=. Journal of Computational and Graphical Statistics , volume=. 2018 , publisher=

work page 2018

[12] [12]

The cryosphere , volume=

Bedmap2: improved ice bed, surface and thickness datasets for Antarctica , author=. The cryosphere , volume=. 2013 , publisher=

work page 2013

[13] [13]

Computational Statistics & Data Analysis , volume=

Vecchia--Laplace approximations of generalized Gaussian processes for big non-Gaussian spatial data , author=. Computational Statistics & Data Analysis , volume=. 2021 , publisher=

work page 2021

[14] [14]

Journal of the American Statistical Association , volume=

I-optimal design of mixture experiments , author=. Journal of the American Statistical Association , volume=. 2016 , publisher=

work page 2016

[15] [15]

2013 , publisher=

Theory of optimal experiments , author=. 2013 , publisher=

work page 2013

[16] [16]

Journal of Multivariate analysis , volume=

Asymptotic properties of a maximum likelihood estimator with data from a Gaussian process , author=. Journal of Multivariate analysis , volume=. 1991 , publisher=

work page 1991

[17] [17]

Journal of the American Statistical Association , volume =

Hao Zhang , title =. Journal of the American Statistical Association , volume =. 2004 , publisher =. doi:10.1198/016214504000000241 , URL =

work page doi:10.1198/016214504000000241 2004

[18] [18]

The Annals of Statistics , pages=

Uniform asymptotic optimality of linear predictions of a random field using an incorrect second-order structure , author=. The Annals of Statistics , pages=. 1990 , publisher=

work page 1990

[19] [19]

Journal of the American statistical Association , volume=

Strictly proper scoring rules, prediction, and estimation , author=. Journal of the American statistical Association , volume=. 2007 , publisher=

work page 2007

[20] [20]

Rasmussen, Carl Edward and Williams, Christopher K. I. , title =. 2005 , month =. doi:10.7551/mitpress/3206.001.0001 , url =

work page doi:10.7551/mitpress/3206.001.0001 2005

[21] [21]

International Conference on Artificial Intelligence and Statistics , pages=

On random subsampling of Gaussian process regression: A graphon-based analysis , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2020 , organization=

work page 2020

[22] [22]

Statistica Sinica , volume=

Gaussian process prediction using design-based subsampling , author=. Statistica Sinica , volume=. 2022 , publisher=

work page 2022

[23] [23]

Statistica Sinica , volume=

Efficient Gaussian process modeling using experimental design-based subagging , author=. Statistica Sinica , volume=. 2018 , publisher=

work page 2018

[24] [24]

Technometrics , volume=

Comparison of three methods for selecting values of input variables in the analysis of output from a computer code , author=. Technometrics , volume=

work page

[25] [25]

2024 , note =

lhs: Latin Hypercube Samples , author =. 2024 , note =

work page 2024

[26] [26]

Journal of the Royal Statistical Society: Series B (Methodological) , volume=

Estimation and model identification for continuous spatial processes , author=. Journal of the Royal Statistical Society: Series B (Methodological) , volume=. 1988 , publisher=

work page 1988

[27] [27]

Journal of the Royal Statistical Society: Series B (Statistical Methodology) , volume=

Approximating likelihoods for large spatial data sets , author=. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , volume=. 2004 , publisher=

work page 2004

[28] [28]

Hierarchical nearest-neighbor

Datta, Abhirup and Banerjee, Sudipto and Finley, Andrew O and Gelfand, Alan E , date-added =. Hierarchical nearest-neighbor. Journal of the American Statistical Association , number =

work page

[29] [29]

Vecchia approximations of

Katzfuss, Matthias and Guinness, Joseph and Gong, Wenlong and Zilber, Daniel , journal=. Vecchia approximations of. 2020 , publisher=

work page 2020

[30] [30]

Technometrics , volume=

Permutation and grouping methods for sharpening Gaussian process approximations , author=. Technometrics , volume=. 2018 , publisher=

work page 2018

[31] [31]

Journal of the Royal Statistical Society Series B: Statistical Methodology , pages=

Graphical methods for Order-of-Addition experiments , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , pages=. 2025 , publisher=

work page 2025

[32] [32]

Computational Statistics & Data Analysis , volume=

Exact designs for order-of-addition experiments under a transition-effect model , author=. Computational Statistics & Data Analysis , volume=. 2025 , publisher=

work page 2025

[33] [33]

Computational Statistics & Data Analysis , volume=

TA algorithms for D-optimal OofA Mixture designs , author=. Computational Statistics & Data Analysis , volume=. 2022 , publisher=

work page 2022

[34] [34]

1990 , publisher=

Spline models for observational data , author=. 1990 , publisher=

work page 1990

[35] [35]

2011 , publisher=

Statistics for spatio-temporal data , author=. 2011 , publisher=

work page 2011

[36] [36]

2003 , publisher=

Hierarchical modeling and analysis for spatial data , author=. 2003 , publisher=

work page 2003

[37] [37]

Annual Review of Statistics and Its Application , volume=

Basis-function models in spatial statistics , author=. Annual Review of Statistics and Its Application , volume=. 2022 , publisher=

work page 2022

[38] [38]

Journal of Agricultural, Biological and Environmental Statistics , volume=

A case study competition among methods for analyzing large spatial data , author=. Journal of Agricultural, Biological and Environmental Statistics , volume=. 2019 , publisher=

work page 2019

[39] [39]

Technometrics , volume=

A scalable partitioned approach to model massive nonstationary non-Gaussian spatial datasets , author=. Technometrics , volume=. 2023 , publisher=

work page 2023

[40] [40]

Journal of Computational and Graphical Statistics , volume=

Adaptive Bayesian nonstationary modeling for large spatial datasets using covariance approximations , author=. Journal of Computational and Graphical Statistics , volume=. 2014 , publisher=

work page 2014

[41] [41]

Bayesian Analysis , volume=

Incorporating subsampling into Bayesian models for high-dimensional spatial data , author=. Bayesian Analysis , volume=. 2024 , publisher=

work page 2024

[42] [42]

Test , volume=

Comparing and selecting spatial predictors using local criteria , author=. Test , volume=. 2015 , publisher=

work page 2015

[43] [43]

Journal of Computational and Graphical Statistics , volume=

An approach to incorporate subsampling into a generic Bayesian hierarchical model , author=. Journal of Computational and Graphical Statistics , volume=. 2021 , publisher=

work page 2021

[44] [44]

Statistical Science , volume=

A general framework for Vecchia approximations of Gaussian processes , author=. Statistical Science , volume=. 2021 , publisher=

work page 2021

[45] [45]

An explicit link between

Lindgren, Finn and Rue, H. An explicit link between. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , number =

work page

[46] [46]

Environmental and Ecological Statistics , volume=

A process-convolution approach to modelling temperatures in the North Atlantic Ocean , author=. Environmental and Ecological Statistics , volume=. 1998 , publisher=

work page 1998

[47] [47]

and Johannesson, G

Cressie, N. and Johannesson, G. , date-added =. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , number =

work page

[48] [48]

and Gelfand, A.E

Banerjee, S. and Gelfand, A.E. and Finley, A.O. and Sang, H. , journal =

work page

[49] [49]

Covariance tapering for interpolation of large spatial datasets , volume =

Furrer, Reinhard and Genton, Marc G and Nychka, Douglas , journal=. Covariance tapering for interpolation of large spatial datasets , volume =

work page

[50] [50]

Limitations on low rank approximations for covariance matrices of spatial data , volume =

Stein, Michael L , journal =. Limitations on low rank approximations for covariance matrices of spatial data , volume =

work page

[51] [51]

Vecchia, A. V. , date-added =. Estimation and model identification for continuous spatial processes , volume =. Journal of the Royal Statistical Society, Series B, Methodological , keywords =

work page

[52] [52]

The Annals of Applied Statistics , pages=

Covariance approximation for large multivariate spatial data sets with an application to multiple climate model errors , author=. The Annals of Applied Statistics , pages=. 2011 , publisher=

work page 2011

[53] [53]

Environmetrics: The official journal of the International Environmetrics Society , volume=

A high frequency kriging approach for non-stationary environmental processes , author=. Environmetrics: The official journal of the International Environmetrics Society , volume=. 2001 , publisher=

work page 2001

[54] [54]

Journal of the American Statistical Association , volume=

Analyzing nonstationary spatial data using piecewise Gaussian processes , author=. Journal of the American Statistical Association , volume=. 2005 , publisher=

work page 2005

[55] [55]

Local likelihood estimation for covariance functions with spatially-varying parameters: the convoSPAT package for R

Local likelihood estimation for covariance functions with spatially-varying parameters: the convoSPAT package for R , author=. arXiv preprint arXiv:1507.08613 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[56] [56]

Fast sampling of

Rue, Havard , journal=. Fast sampling of. 2001 , publisher=

work page 2001

[57] [57]

Journal of Computational and Graphical Statistics , volume=

A multiresolution Gaussian process model for the analysis of large spatial datasets , author=. Journal of Computational and Graphical Statistics , volume=. 2015 , publisher=

work page 2015

[58] [58]

Journal of the american statistical association , volume=

Nonseparable, stationary covariance functions for space--time data , author=. Journal of the american statistical association , volume=. 2002 , publisher=

work page 2002

[59] [59]

Journal of the American Statistical association , volume=

Classes of nonseparable, spatio-temporal stationary covariance functions , author=. Journal of the American Statistical association , volume=. 1999 , publisher=

work page 1999

[60] [60]

Journal of the American Statistical Association , volume=

Stable and efficient multiple smoothing parameter estimation for generalized additive models , author=. Journal of the American Statistical Association , volume=. 2004 , publisher=

work page 2004

[61] [61]

Journal of Statistical Software , volume=

FRK: An R package for spatial and spatio-temporal prediction with large datasets , author=. Journal of Statistical Software , volume=

work page

[62] [62]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2009 , publisher=

work page 2009

[63] [63]

A multi-resolution approximation for massive spatial datasets , volume =

Katzfuss, Matthias , journal =. A multi-resolution approximation for massive spatial datasets , volume =

work page

[64] [64]

Environmetrics: The official journal of the International Environmetrics Society , volume=

Spatial modelling using a new class of nonstationary covariance functions , author=. Environmetrics: The official journal of the International Environmetrics Society , volume=. 2006 , publisher=

work page 2006

[65] [65]

Ecology , volume=

The basis function approach for modeling autocorrelation in ecological data , author=. Ecology , volume=. 2017 , publisher=

work page 2017

[66] [66]

hetgp: Heteroskedastic Gaussian process modeling and sequential design in

Binois, Micka. hetgp: Heteroskedastic Gaussian process modeling and sequential design in. Journal of Statistical Software , volume=

work page

[67] [67]

Springer Series in Statistics , year=

The Design and Analysis of Computer Experiments , author=. Springer Series in Statistics , year=

work page

[68] [68]

arXiv preprint arXiv:2209.11280 , year=

Scalable Gaussian Process Hyperparameter Optimization via Coverage Regularization , author=. arXiv preprint arXiv:2209.11280 , year=

work page arXiv

[69] [69]

Space Weather , volume=

GeoDGP: One-hour ahead global probabilistic geomagnetic perturbation forecasting using deep Gaussian process , author=. Space Weather , volume=. 2025 , publisher=

work page 2025

[70] [70]

SIAM/ASA Journal on Uncertainty Quantification , volume=

Fully bayesian inference for latent variable gaussian process models , author=. SIAM/ASA Journal on Uncertainty Quantification , volume=. 2023 , publisher=

work page 2023

[71] [71]

Statistics and Computing , volume=

Gaussian process learning via Fisher scoring of Vecchia’s approximation , author=. Statistics and Computing , volume=. 2021 , publisher=

work page 2021