Assessing the Impact of Block Size on Block Likelihood Estimation: A Comparative Study
Pith reviewed 2026-05-24 04:42 UTC · model grok-4.3
The pith
Larger block sizes do not always improve performance in block likelihood estimation for geostatistical data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper finds that both simulation experiments and real-data analyses of sea surface temperature challenge the prevailing assumption that larger block sizes invariably lead to improved statistical performance in block likelihood estimation.
What carries the argument
The block size parameter in block likelihood estimation, which determines how data is partitioned for computation.
If this is right
- Statistical performance may not improve with larger blocks in all cases.
- Smaller blocks can offer comparable accuracy with lower computational demands.
- The assumption of monotonic improvement with block size does not hold generally.
- Block size selection requires careful consideration based on specific data characteristics.
Where Pith is reading between the lines
- This finding implies that defaulting to large blocks may waste resources in some applications.
- Optimal block sizes could vary depending on the underlying spatial dependence structure.
- Extending the study to more datasets would help confirm the generality of the results.
Load-bearing premise
The specific simulation setups and the sea surface temperature dataset are sufficiently representative of geostatistical problems to draw general conclusions about block size effects.
What would settle it
Observing consistent superior performance of larger blocks across a wider variety of simulation scenarios and multiple real datasets would falsify the challenge to the prevailing assumption.
Figures
read the original abstract
This paper focuses on block likelihood estimation for geostatistical data, a method that balances statistical accuracy and computational efficiency. Central to this approach is the choice of block size, which can significantly impact performance. This study contributes by providing a thorough numerical investigation of the effects of large versus small block configurations. Findings from both simulation experiments and real-data analyses of sea surface temperature challenge the prevailing assumption that larger block sizes invariably lead to improved statistical performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents simulation experiments and a real-data analysis of sea surface temperature to compare the effects of large versus small block sizes in block likelihood estimation for geostatistical data, concluding that larger blocks do not invariably yield improved statistical performance and thereby challenging a common assumption in the field.
Significance. If substantiated with adequate experimental detail and representative designs, the finding would be useful for practitioners choosing block sizes in spatial statistics, as it provides counter-examples to the prevailing view. The study employs both simulations and real data, which is a strength, but the single real dataset and unspecified simulation regimes limit the scope of the challenge to the assumption.
major comments (2)
- [Abstract] Abstract: no information is given on the simulation designs (correlation structures, sample sizes, block configurations, performance metrics, or statistical tests), error quantification, or data exclusion rules. This prevents evaluation of whether the reported findings actually support the central claim that larger blocks do not invariably improve performance.
- [Real-data analysis] The real-data component uses only a single dataset (sea surface temperature). Without additional datasets or regimes exhibiting strong long-range dependence, it is unclear whether the counter-examples are representative or special cases, which is load-bearing for the generalization that the assumption is challenged across geostatistical applications.
minor comments (1)
- Notation for block size and likelihood should be defined consistently at first use.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major comment below. The simulation designs are fully specified in the body of the manuscript; we agree the abstract would benefit from a concise summary of these elements.
read point-by-point responses
-
Referee: [Abstract] Abstract: no information is given on the simulation designs (correlation structures, sample sizes, block configurations, performance metrics, or statistical tests), error quantification, or data exclusion rules. This prevents evaluation of whether the reported findings actually support the central claim that larger blocks do not invariably improve performance.
Authors: The simulation designs are described in detail in Sections 3 (Methods) and 4 (Results), including exponential and Matérn correlation structures with varying range and smoothness parameters, sample sizes n = 200, 500, 1000, block sizes from 5×5 to 20×20, performance metrics (RMSE, coverage probability, interval length), paired statistical tests across block sizes, and error quantification via Monte Carlo standard errors over 1000 replications. No observations were excluded. To address the concern, we will revise the abstract to include a one-sentence summary of these design elements. revision: yes
-
Referee: [Real-data analysis] The real-data component uses only a single dataset (sea surface temperature). Without additional datasets or regimes exhibiting strong long-range dependence, it is unclear whether the counter-examples are representative or special cases, which is load-bearing for the generalization that the assumption is challenged across geostatistical applications.
Authors: The simulations explicitly include regimes with strong long-range dependence (power-law covariances with low smoothness and large range parameters). The sea-surface-temperature analysis is presented as a representative real-data illustration rather than an exhaustive survey. The central claim—that larger blocks do not invariably improve performance—is supported by the counter-examples already obtained; we therefore do not view expansion to additional real datasets as necessary for the stated contribution. revision: no
Circularity Check
No circularity: purely empirical comparative study
full rationale
The paper is a numerical investigation consisting of simulation experiments and one real-data analysis (sea surface temperature) that directly compares performance metrics across block sizes. No derivation chain, mathematical model, fitted parameters, or predictions appear; the central claim is an empirical observation that challenges an assumption based on observed results. No self-citations, ansatzes, or renamings are load-bearing. This is a standard non-circular empirical paper with independent content from its inputs.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Aho, A. V. and Hopcroft, J. E. (1974). The design and analysis of computer algorithms . Pearson Education India
work page 1974
-
[2]
Alegr \' a, A., Caro, S., Bevilacqua, M., Porcu, E., and Clarke, J. (2017). Estimating covariance functions of multivariate skew- G aussian random fields on the sphere. Spatial Statistics , 22:388--402
work page 2017
-
[3]
Bai, Y., Song, P. X.-K., and Raghunathan, T. (2012). Joint composite estimating functions in spatiotemporal models. Journal of the Royal Statistical Society Series B: Statistical Methodology , 74(5):799--824
work page 2012
-
[4]
Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H. (2008). Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society Series B: Statistical Methodology , 70(4):825--848
work page 2008
-
[5]
Bevilacqua, M., Alegria, A., Velandia, D., and Porcu, E. (2016). Composite likelihood inference for multivariate G aussian random fields. Journal of Agricultural, Biological, and Environmental Statistics , 21:448--469
work page 2016
-
[6]
Bevilacqua, M. and Gaetan, C. (2015). Comparing composite likelihood methods based on pairs for spatial G aussian random fields. Statistics and Computing , 25:877--892
work page 2015
-
[7]
Bevilacqua, M., Gaetan, C., Mateu, J., and Porcu, E. (2012). Estimating space and space-time covariance functions for large data sets: a weighted composite likelihood approach. Journal of the American Statistical Association , 107(497):268--280
work page 2012
-
[8]
Caama \ n o-Carrillo, C., Bevilacqua, M., L \'o pez, C., and Morales-O \ n ate, V. (2024). Nearest neighbors weighted composite likelihood based on pairs for (non-) G aussian massive spatial data with an application to T ukey-hh random fields estimation. Computational Statistics & Data Analysis , 191:107887
work page 2024
-
[9]
Cao, J., Genton, M. G., Keyes, D. E., and Turkiyyah, G. M. (2021). Sum of K ronecker products representation and its C holesky factorization for spatial covariance matrices from large grids. Computational Statistics & Data Analysis , 157:107165
work page 2021
-
[10]
Caragea, P. C. and Smith, R. L. (2007). Asymptotic properties of computationally efficient alternative estimators for a class of multivariate normal models. Journal of Multivariate Analysis , 98(7):1417--1440
work page 2007
-
[11]
Castruccio, S., Huser, R., and Genton, M. G. (2016). High-order composite likelihood inference for max-stable distributions and processes. Journal of Computational and Graphical Statistics , 25(4):1212--1229
work page 2016
-
[12]
Chiles, J.-P. and Delfiner, P. (2012). Geostatistics: modeling spatial uncertainty , volume 713. John Wiley & Sons
work page 2012
-
[13]
Cressie, N. (2015). Statistics for spatial data . John Wiley & Sons
work page 2015
-
[14]
Curriero, F. C. and Lele, S. (1999). A composite likelihood approach to semivariogram estimation. Journal of Agricultural, Biological, and Environmental Statistics , 4(1):9--28
work page 1999
-
[15]
Eidsvik, J., Shaby, B. A., Reich, B. J., Wheeler, M., and Niemi, J. (2014). Estimation and prediction in spatial models with block composite likelihoods. Journal of Computational and Graphical Statistics , 23(2):295--315
work page 2014
-
[16]
Fuentes, M. (2007). Approximate likelihood for large irregularly spaced spatial data. Journal of the American Statistical Association , 102(477):321
work page 2007
-
[17]
Furrer, R., Genton, M. G., and Nychka, D. (2006). Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics , 15(3):502--523
work page 2006
-
[18]
Heagerty, P. J. and Lele, S. R. (1998). A composite likelihood approach to binary spatial data. Journal of the American Statistical Association , 93(443):1099--1111
work page 1998
-
[19]
Heaton, M. J., Datta, A., Finley, A. O., Furrer, R., Guinness, J., Guhaniyogi, R., Gerber, F., Gramacy, R. B., Hammerling, D., Katzfuss, M., et al. (2019). A case study competition among methods for analyzing large spatial data. Journal of Agricultural, Biological and Environmental Statistics , 24:398--425
work page 2019
-
[20]
Hong, Y., Song, Y., Abdulah, S., Sun, Y., Ltaief, H., Keyes, D. E., and Genton, M. G. (2023). The third competition on spatial statistics for large datasets. Journal of Agricultural, Biological and Environmental Statistics , 28:618--635
work page 2023
-
[21]
Huang, H., Abdulah, S., Sun, Y., Ltaief, H., Keyes, D. E., and Genton, M. G. (2021). Competition on spatial statistics for large datasets. Journal of Agricultural, Biological and Environmental Statistics , 26:580--595
work page 2021
-
[22]
Ishii, M., Shouji, A., Sugimoto, S., and Matsumoto, T. (2005). Objective analyses of sea-surface temperature and marine meteorological variables for the 20th century using icoads and the kobe collection. International Journal of Climatology , 25(7):865--879
work page 2005
-
[23]
Katzfuss, M. and Guinness, J. (2021). A General Framework for Vecchia Approximations of Gaussian Processes . Statistical Science , 36(1):124 -- 141
work page 2021
-
[24]
Kaufman, C. G., Schervish, M. J., and Nychka, D. W. (2008). Covariance tapering for likelihood-based estimation in large spatial data sets. Journal of the American Statistical Association , 103(484):1545--1555
work page 2008
-
[25]
Lindgren, F., Rue, H., and Lindstr \"o m, J. (2011). An explicit link between G aussian fields and G aussian M arkov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society Series B: Statistical Methodology , 73(4):423--498
work page 2011
-
[26]
Lindsay, B. G. (1988). Composite likelihood methods. Comtemporary Mathematics , 80(1):221--239
work page 1988
-
[27]
Litvinenko, A., Kriemann, R., Genton, M. G., Sun, Y., and Keyes, D. E. (2020). Hlibcov: Parallel hierarchical matrix approximation of large covariance matrices and likelihoods with applications in parameter identification. MethodsX , 7:100600
work page 2020
-
[28]
Ruli, E., Sartori, N., and Ventura, L. (2016). Approximate B ayesian computation with composite score functions. Statistics and Computing , 26(3):679--692
work page 2016
-
[29]
Rulli \`e re, D., Durrande, N., Bachoc, F., and Chevalier, C. (2018). Nested kriging predictions for datasets with a large number of observations. Statistics and Computing , 28:849--867
work page 2018
-
[30]
Sang, H. and Genton, M. G. (2014). Tapered composite likelihood for spatial max-stable models. Spatial Statistics , 8:86--103
work page 2014
-
[31]
Shaby, B. and Ruppert, D. (2012). Tapered covariance: Bayesian estimation and asymptotics. Journal of Computational and Graphical Statistics , 21(2):433--452
work page 2012
-
[32]
Stein, M. L. (2012). Interpolation of spatial data: some theory for kriging . Springer Science & Business Media
work page 2012
-
[33]
Stein, M. L., Chen, J., and Anitescu, M. (2013). Stochastic approximation of score functions for Gaussian processes . The Annals of Applied Statistics , 7(2):1162 -- 1191
work page 2013
-
[34]
Stein, M. L., Chi, Z., and Welty, L. J. (2004). Approximating likelihoods for large spatial data sets. Journal of the Royal Statistical Society Series B: Statistical Methodology , 66(2):275--296
work page 2004
-
[35]
Sun, Y. and Stein, M. L. (2016). Statistically and computationally efficient estimating equations for large spatial datasets. Journal of Computational and Graphical Statistics , 25(1):187--208
work page 2016
-
[36]
Varin, C. (2008). On composite marginal likelihoods. AStA Advances in Statistical Analysis , 92(1):1--28
work page 2008
-
[37]
Varin, C., Reid, N., and Firth, D. (2011). An overview of composite likelihood methods. Statistica Sinica , 21(1):5--42
work page 2011
-
[38]
Vecchia, A. V. (1988). Estimation and model identification for continuous spatial processes. Journal of the Royal Statistical Society Series B: Statistical Methodology , 50(2):297--312
work page 1988
-
[39]
Whittle, P. (1954). On stationary processes in the plane. Biometrika , 41(3/4):434--449
work page 1954
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.