A review of regularised estimation methods and cross-validation in spatiotemporal statistics
Pith reviewed 2026-05-24 03:34 UTC · model grok-4.3
The pith
Regularised estimation procedures structured around mixed-effects models support dimensionality reduction and model selection for large geospatial spatiotemporal data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Regularised estimation procedures are applicable to geostatistical and spatial econometric models for dimensionality reduction or model selection in big geospatial data, structured via mixed-effects models.
What carries the argument
A mixed-effects model setup that organises a variety of spatiotemporal models and guides the application of different regularisation procedures.
If this is right
- Enables selection of relevant regressors in spatial econometric models.
- Supports dimensionality reduction of covariance matrices in geostatistical settings.
- Allows detection of conditionally independent locations.
- Facilitates estimation of full spatial interaction matrices.
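The first consequence, selection of relevant regressors, can be illustrated with a minimal sketch. The code is not taken from the reviewed paper: it is a plain-NumPy coordinate-descent LASSO on simulated data that ignores spatial dependence entirely, and the names `soft_threshold` and `lasso_cd`, the penalty value `lam=0.3`, and the simulated design are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Illustrative helper (an assumption, not the paper's estimator).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n  # x_j' x_j / n for each column
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that excludes the contribution of regressor j.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, lam) / col_scale[j]
    return beta


# Simulated data: only the first two of six regressors are relevant.
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

beta_hat = lasso_cd(X, y, lam=0.3)
selected = np.flatnonzero(beta_hat != 0.0)
print("selected regressors:", selected)
```

The L1 penalty sets small coefficients exactly to zero, which is what turns estimation into selection. With spatially dependent errors, the squared-error loss would typically be replaced by a likelihood that accounts for the covariance structure.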
Where Pith is reading between the lines
- The mixed-effects framing may simplify comparison of regularisation choices across different data types.
- Cross-validation procedures mentioned in the title could be used to tune the shrinkage targets within the same setup.
- The approach suggests a route toward automated handling of high-dimensional geo-referenced data without separate model-specific derivations.
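The idea of tuning a penalty by cross-validation can be sketched as follows. This is an assumption-laden illustration rather than a procedure from the paper: it uses ridge regression (chosen only because it has a closed form), simulated locations in the unit square, and folds built from a regular 3 x 3 grid so that whole spatial blocks are held out together. The helper names `ridge_fit`, `spatial_block_folds`, and `blocked_cv_error` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam * I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def spatial_block_folds(coords, n_side):
    """Assign each location in the unit square to one of n_side**2 grid blocks."""
    cells = np.minimum((coords * n_side).astype(int), n_side - 1)
    return cells[:, 0] * n_side + cells[:, 1]


def blocked_cv_error(X, y, folds, lam):
    """Mean squared prediction error when whole spatial blocks are held out."""
    sq_errors = []
    for k in np.unique(folds):
        test = folds == k
        beta = ridge_fit(X[~test], y[~test], lam)
        sq_errors.append((y[test] - X[test] @ beta) ** 2)
    return float(np.mean(np.concatenate(sq_errors)))


# Simulated geo-referenced data (all values are assumptions for illustration).
rng = np.random.default_rng(1)
n, p = 300, 10
coords = rng.uniform(size=(n, 2))
X = rng.standard_normal((n, p))
beta_true = np.concatenate([[2.0, -1.0], np.zeros(p - 2)])
spatial_trend = np.sin(2 * np.pi * coords[:, 0])  # smooth, unmodelled spatial signal
y = X @ beta_true + spatial_trend + 0.5 * rng.standard_normal(n)

folds = spatial_block_folds(coords, n_side=3)
lambdas = np.logspace(-2, 3, 12)
cv_errors = np.array([blocked_cv_error(X, y, folds, lam) for lam in lambdas])
best_lambda = lambdas[np.argmin(cv_errors)]
print("penalty chosen by blocked CV:", best_lambda)
```

Holding out contiguous blocks instead of random points is a common way to keep spatially correlated neighbours from leaking between training and test sets. The block size is itself a tuning choice that this sketch fixes arbitrarily.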
Load-bearing premise
The reviewed regularisation procedures can be organised around a mixed-effects model setup that covers a variety of spatiotemporal models.
What would settle it
A demonstration that one or more standard spatiotemporal models cannot be expressed within the mixed-effects framework or that the listed regularisation procedures fail to achieve dimensionality reduction on representative big geospatial datasets.
Original abstract
This review article focuses on regularised estimation procedures applicable to geostatistical and spatial econometric models. These methods are particularly relevant in the case of big geospatial data for dimensionality reduction or model selection. To structure the review, we initially consider the most general case of multivariate spatiotemporal processes (i.e., $g > 1$ dimensions of the spatial domain, a one-dimensional temporal domain, and $q \geq 1$ random variables). Then, the idea of regularised/penalised estimation procedures and different choices of shrinkage targets are discussed. Finally, guided by the elements of a mixed-effects model setup, which allows for a variety of spatiotemporal models, we show different regularisation procedures and how they can be used for the analysis of geo-referenced data, e.g. for selection of relevant regressors, dimensionality reduction of the covariance matrices, detection of conditionally independent locations, or the estimation of a full spatial interaction matrix.
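For orientation, the generic shape of a regularised estimator with a shrinkage target can be written as below. The notation is an assumption made for this summary and is not taken from the paper: $\ell$ denotes a log-likelihood, $\theta_0$ the shrinkage target, $\lambda \geq 0$ the penalty weight, and $P$ a penalty function.

```latex
\hat{\theta}_{\lambda}
  = \arg\min_{\theta}
    \left\{ -\ell(\theta \mid y) + \lambda \, P(\theta - \theta_0) \right\},
\qquad \lambda \geq 0 .
```

Taking $P(u) = \lVert u \rVert_1$ gives a LASSO-type estimator and $P(u) = \lVert u \rVert_2^2$ a ridge-type estimator; setting $\theta_0 = 0$ shrinks towards zero, while $\lambda = 0$ recovers the unpenalised maximum-likelihood estimate.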
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This review article examines regularised estimation procedures for geostatistical and spatial econometric models, especially for big geospatial data to achieve dimensionality reduction or model selection. It begins with the general case of multivariate spatiotemporal processes (g > 1 spatial dimensions, one temporal dimension, q ≥ 1 variables), discusses penalised estimation and shrinkage targets, and then employs a mixed-effects model framework to demonstrate applications including regressor selection, covariance dimensionality reduction, detection of conditionally independent locations, and estimation of spatial interaction matrices.
Significance. The paper offers an organizational structure for regularisation methods in spatiotemporal statistics via the mixed-effects lens. If the framework successfully unifies methods from different subfields, it could serve as a valuable reference for practitioners dealing with high-dimensional geospatial data, facilitating better model selection and estimation in geostatistics and spatial econometrics.
major comments (2)
- [Abstract] The title prominently features 'cross-validation', but the abstract makes no mention of cross-validation or how it integrates with the regularised estimation procedures reviewed. This is a potential mismatch since the central claim is about structuring the review around regularisation, and CV is a key tool for penalty selection in such methods.
- [mixed-effects model setup paragraph] The abstract claims that the mixed-effects model setup 'allows for a variety of spatiotemporal models', but to substantiate the organizational contribution, the manuscript should explicitly demonstrate how standard models (e.g., Gaussian processes, spatial autoregressive models) map onto this setup in the relevant section.
minor comments (1)
- [Abstract] The notation defining the general case (g > 1, temporal domain, q ≥ 1) is introduced without an accompanying simple example, which might help readers unfamiliar with the multivariate spatiotemporal setting.
Simulated Author's Rebuttal
We thank the referee for their thoughtful comments and recommendation for minor revision. We address each major comment below and outline the planned changes to strengthen the manuscript's alignment with its title and organizational claims.
- Referee: [Abstract] The title prominently features 'cross-validation', but the abstract makes no mention of cross-validation or how it integrates with the regularised estimation procedures reviewed. This is a potential mismatch since the central claim is about structuring the review around regularisation, and CV is a key tool for penalty selection in such methods.
  Authors: We agree that the abstract should reference cross-validation to align with the title and emphasize its role in penalty selection. We will revise the abstract to briefly describe how cross-validation integrates with the reviewed regularised estimation procedures for spatiotemporal models. Revision: yes.
- Referee: [mixed-effects model setup paragraph] The abstract claims that the mixed-effects model setup 'allows for a variety of spatiotemporal models', but to substantiate the organizational contribution, the manuscript should explicitly demonstrate how standard models (e.g., Gaussian processes, spatial autoregressive models) map onto this setup in the relevant section.
  Authors: We accept this point as it strengthens the organizational contribution. In the section on the mixed-effects model setup, we will add explicit mappings or examples showing how standard models such as Gaussian processes and spatial autoregressive models correspond to the framework, clarifying its unifying role across geostatistics and spatial econometrics. Revision: yes.
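As a rough indication of what such a mapping could look like, the block below writes a geostatistical model and a spatial autoregressive error model in linear mixed-effects form. This is a standard textbook correspondence sketched under assumed notation ($W$ a known spatial weights matrix, $C_\theta$ a covariance function, $\rho$ a spatial dependence parameter); it is not the authors' promised revision, and the paper's own notation may differ.

```latex
% Generic linear mixed-effects form (notation assumed here)
Y = X\beta + Z u + \varepsilon, \qquad
u \sim N(0, \Sigma_u), \quad \varepsilon \sim N(0, \sigma^2 I).

% Geostatistical model: Gaussian process w observed at locations s_1, ..., s_n
Z = I, \quad u = (w(s_1), \dots, w(s_n))^\top, \quad
(\Sigma_u)_{ij} = C_\theta(s_i, s_j).

% Spatial autoregressive error model: e = \rho W e + \nu, \nu \sim N(0, \sigma_\nu^2 I)
Y = X\beta + e, \qquad
e \sim N\!\left(0, \; \sigma_\nu^2
  \left[(I - \rho W)^\top (I - \rho W)\right]^{-1}\right).
```

In the second case the random effect $u = e$ carries all of the error variation, so $Z = I$ and the independent noise term $\varepsilon$ is absent. Regularisation can then act on $\beta$ (regressor selection), on $\Sigma_u$ or its inverse (covariance reduction, conditional independence), or on $W$ itself (estimating the interaction matrix).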
Circularity Check
No significant circularity; review paper with no derivations or fitted predictions
This manuscript is a review article whose contribution is organizational: it groups existing regularised estimation methods (regressor selection, covariance shrinkage, conditional independence detection) under a mixed-effects model template that the abstract describes as allowing a variety of spatiotemporal models. No new theorems, estimators, equations, or empirical predictions are derived. The text illustrates external techniques via that lens rather than advancing deductive steps that could reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains. Because the central claim is descriptive rather than predictive or deductive, and the abstract gives no indication of internal mismatch between claimed generality and reviewed methods, the paper is self-contained as a survey with no load-bearing circular elements.