Coarse-to-fine spatial GLMM for scalable prediction and multiscale analysis

Alexis Comber; Chris Brunsdon; Daisuke Murakami; Narumasa Tsutsumida; Takahiro Yoshida; Tomoki Nakaya

arxiv: 2605.01157 · v1 · submitted 2026-05-01 · 📊 stat.ME

Daisuke Murakami, Alexis Comber, Takahiro Yoshida, Narumasa Tsutsumida, Chris Brunsdon, Tomoki Nakaya

Pith reviewed 2026-05-09 18:18 UTC · model grok-4.3

classification 📊 stat.ME

keywords spatial GLMM · coarse-to-fine modeling · degeneracy problem · scalable spatial prediction · multiscale analysis · count data · spatial statistics

The pith

Extending the coarse-to-fine framework to GLMMs produces a CF-GLMM that resolves the degeneracy problem in spatial predictions for count data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends the coarse-to-fine spatial modeling framework, previously limited to Gaussian responses, to generalized linear mixed models with a focus on count data. This produces the CF-GLMM, which the authors present as a way to deliver scalable spatial predictions while sidestepping the degeneracy that commonly affects standard spatial GLMMs. Monte Carlo experiments test the approach on prediction accuracy and multiscale feature extraction, and the method is demonstrated on COVID-19 case data. An R package implements the full procedure for practical use.

Core claim

The central claim is that the coarse-to-fine approximation, when adapted to GLMM responses, yields a model that maintains computational scalability and numerical stability for large spatial datasets, thereby overcoming the degeneracy issues that arise in conventional spatial GLMMs and enabling both accurate spatial prediction and extraction of features at multiple scales.

What carries the argument

The CF-GLMM, which applies the coarse-to-fine hierarchical approximation to the spatial random effects within a generalized linear mixed model so that the process is modeled from coarse to fine resolutions while preserving the non-Gaussian response structure.
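
As a rough illustration of the coarse-to-fine idea for counts, the sketch below fits a Poisson surface by adding one kernel-smoothed component per bandwidth, from coarse to fine, and keeps a component only when the validation deviance improves. It is not the authors' implementation and does not use the spCF API. The Gaussian kernel smoother, the function names, the starting bandwidth `h0`, the omission of covariates, and the simulated data are all assumptions made for illustration; only the shrink factor 0.9 and the stopping rule of 5 consecutive rejected scales come from the algorithm excerpt quoted in the reference graph.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Mean Poisson deviance; the y*log(y/mu) term is zero where y == 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    log_term = np.zeros_like(y)
    pos = y > 0
    log_term[pos] = y[pos] * np.log(y[pos] / mu[pos])
    return 2.0 * np.mean(log_term - (y - mu))

def weighted_kernel_smooth(targets, sources, values, weights, h):
    """Weighted kernel mean of `values` at `targets` (Gaussian kernel is an assumption)."""
    d2 = ((targets[:, None, :] - sources[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-0.5 * d2 / h**2) * weights[None, :]
    return (k @ values) / np.maximum(k.sum(axis=1), 1e-12)

def coarse_to_fine_poisson(s_tr, y_tr, s_va, y_va, h0=0.5, delta=0.9,
                           patience=5, max_scales=60):
    # Start from an intercept-only model on the log scale.
    eta_tr = np.full(len(y_tr), np.log(y_tr.mean()))
    eta_va = np.full(len(y_va), np.log(y_tr.mean()))
    base = best = poisson_deviance(y_va, np.exp(eta_va))
    h, stalled, kept = h0, 0, []
    for _ in range(max_scales):
        mu = np.exp(eta_tr)
        working_resid = (y_tr - mu) / mu   # working residual for the log link
        # Weighted least-squares smooth with Poisson working weights mu.
        z_tr = weighted_kernel_smooth(s_tr, s_tr, working_resid, mu, h)
        z_va = weighted_kernel_smooth(s_va, s_tr, working_resid, mu, h)
        loss = poisson_deviance(y_va, np.exp(eta_va + z_va))
        if loss < best:                    # accept: fold this scale into the offset
            eta_tr, eta_va, best = eta_tr + z_tr, eta_va + z_va, loss
            kept.append(h)
            stalled = 0
        else:                              # reject: keep the previous offset
            stalled += 1
            if stalled >= patience:
                break
        h *= delta                         # move to a finer scale
    return {"base_deviance": base, "val_deviance": best, "bandwidths": kept}

# Hypothetical data: a smooth log-intensity surface on the unit square.
rng = np.random.default_rng(0)
sites = rng.uniform(size=(900, 2))
log_mu = 0.5 + 2.0 * sites[:, 0] + 0.5 * np.sin(4 * np.pi * sites[:, 1])
counts = rng.poisson(np.exp(log_mu))
result = coarse_to_fine_poisson(sites[:600], counts[:600], sites[600:], counts[600:])
```

Because a scale is kept only when it lowers the validation deviance, the accepted loss cannot increase across iterations, which is the property the excerpted algorithm attributes to its Step 3.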

If this is right

Spatial prediction becomes feasible for large count datasets without encountering the degeneracy that halts conventional GLMM fitting.
Multiscale feature extraction can be performed directly within the same fitted model rather than requiring separate post-processing steps.
Real-world count processes such as disease incidence can be analyzed at varying spatial resolutions with a single computational run.
An open-source R implementation allows immediate application to new datasets while reproducing the reported scalability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same coarse-to-fine structure could be tested on binary or zero-inflated count responses to check whether the degeneracy fix generalizes beyond Poisson GLMMs.
The hierarchical approximation might be combined with other spatial covariance functions to see if further gains in speed or accuracy appear.
Applying the method to non-epidemiological count data, such as species abundance or traffic incidents, would test whether the multiscale benefits hold outside the COVID-19 demonstration.

Load-bearing premise

That extending the coarse-to-fine approximation from Gaussian responses to GLMM responses preserves scalability and stability without introducing new approximation errors or instabilities that undermine predictions or multiscale extraction.

What would settle it

A Monte Carlo experiment on large spatial count data in which the CF-GLMM produces prediction errors, convergence failures, or scale-dependent biases that are comparable to or larger than those from a standard spatial GLMM.

Figures

Figures reproduced from arXiv: 2605.01157 by Alexis Comber, Chris Brunsdon, Daisuke Murakami, Narumasa Tsutsumida, Takahiro Yoshida, Tomoki Nakaya.

**Figure 2.** Simulated counts $y(s_i)$ with $N = 20{,}000$. Spatial processes were also assumed in the covariables as follows: $x_k(s_i) = 0.5\,z_k(s_i) + 0.5\,e_k(s_i)$, $e_k(s_i) \sim N(0,1)$, $z_k(s_i) = \sum_{j=1}^{N} w^{*}(d_{ij})\,u_k(s_j)$, $u_k(s_i) \sim N(0,1)$ (Eq. 13). The simulated samples are generated 200 times for each case with intercept $\beta_0 \in \{-1.5, 0.5\}$ and training sample sizes $N \in \{500, 1000, 2000, 3000, 6000, 12000, 20000\}$.

**Figure 4.** Boxplots of the estimated $\beta_1$ values. The GLM estimate is biased upward because it ignores the latent process. The bias of CF-GLMM is the smallest owing to its accurate modeling of the spatial process, which confirms its usefulness for regression analysis in the presence of spatial dependence (see LeSage and Pace, 2009).

**Figure 9.** Standard deviations of each scale-wise process.
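
The covariate construction quoted with Figure 2, $x_k = 0.5 z_k + 0.5 e_k$ with a spatially smoothed $z_k$, can be sketched as follows. The kernel form is not recoverable from the caption text, so the Gaussian kernel, the `bandwidth` value, the rescaling of $z_k$ to unit variance, and the function name `simulate_covariate` are all assumptions for illustration rather than the paper's specification.

```python
import numpy as np

def simulate_covariate(sites, bandwidth, rng):
    """Return (x, z) with x = 0.5*z + 0.5*e and z a kernel-weighted sum of white noise."""
    n = len(sites)
    d = np.sqrt(((sites[:, None, :] - sites[None, :, :]) ** 2).sum(axis=2))
    w = np.exp(-0.5 * (d / bandwidth) ** 2)   # assumed Gaussian kernel weights w(d_ij)
    u = rng.standard_normal(n)                # u_k(s_j) ~ N(0, 1)
    z = w @ u                                 # z_k(s_i) = sum_j w(d_ij) u_k(s_j)
    z = (z - z.mean()) / z.std()              # assumed rescaling to unit variance
    e = rng.standard_normal(n)                # e_k(s_i) ~ N(0, 1)
    return 0.5 * z + 0.5 * e, z

rng = np.random.default_rng(2)
sites = rng.uniform(size=(1000, 2))           # hypothetical sites on the unit square
x, z = simulate_covariate(sites, bandwidth=0.1, rng=rng)
```

Mixing a smooth component with independent noise gives a covariate that is itself spatially dependent, which is the setting in which an unmodeled latent process can bias a plain GLM coefficient.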

read the original abstract

Although a recent study suggested that coarse-to-fine learning provides a fast and flexible framework for large-scale spatial process modeling, the method was originally developed for Gaussian responses, limiting its applicability. To address this limitation, we extended the coarse-to-fine spatial modeling (CFSM) framework to accommodate spatial generalized linear mixed models (GLMMs), with a particular focus on count data. The resulting model, referred to as CF-GLMM, efficiently addresses the degeneracy problem often encountered in conventional spatial GLMMs. The performance of the proposed CF-GLMM was evaluated in terms of spatial prediction and multiscale feature extraction via Monte Carlo experiments. Finally, we applied the proposed method to the analysis of coronavirus disease 2019 (COVID-19). The proposed method is implemented in the R package spCF (https://cran.r-project.org/web/packages/spCF/).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Extends CFSM to GLMMs for count data with code and experiments, but the degeneracy fix rests on unshown approximation details.

The main point is that this paper takes the existing coarse-to-fine spatial modeling framework, which was limited to Gaussian responses, and extends it to GLMMs, with a focus on count data. The result is called CF-GLMM, and the claim is that it resolves the degeneracy that often appears in standard spatial GLMMs when the range parameter gets large or the variance collapses. They back this with Monte Carlo runs on prediction accuracy and multiscale extraction, a COVID-19 application, and an R package, spCF, on CRAN. The package and the experiments are the practical parts worth noting; they make the method usable for people who actually fit these models on big datasets. The extension itself is new relative to the cited Gaussian-only work, and the math follows the prior structure, so the core logic is clear.

The soft spot is the handling of the non-Gaussian case. The abstract does not spell out the inner approximation (Laplace, PQL, or whatever they use) or show diagnostics like condition numbers or effective degrees of freedom in the regimes where degeneracy is worst. The stress-test note is reasonable here: if the fine-scale integration is inexact, the effective covariance can still become ill-conditioned. Without those checks reported, it is hard to know whether the hierarchy truly prevents the problem or just moves it.

The citation pattern is direct and appropriate. This is for spatial statisticians and applied users in epidemiology or ecology who need scalable count models. A reader who wants code and a working alternative to off-the-shelf GLMMs will get something from it. It deserves a serious referee because the idea targets a known pain point and ships usable software, even if the validation section needs tightening on the approximation error.

Referee Report

2 major / 2 minor

Summary. The paper extends the coarse-to-fine spatial modeling (CFSM) framework, originally for Gaussian responses, to spatial generalized linear mixed models (GLMMs) for count data. The resulting CF-GLMM is claimed to efficiently resolve the degeneracy problem common in conventional spatial GLMMs. Performance is assessed via Monte Carlo experiments on spatial prediction and multiscale feature extraction, followed by an application to COVID-19 data; an R package spCF is provided.

Significance. If the extension preserves scalability and stability for non-Gaussian responses without reintroducing approximation-induced instabilities, the work would offer a practical advance for large-scale spatial analysis of count data, supporting both prediction and interpretable multiscale decomposition. The open-source R package strengthens reproducibility and potential adoption in applied fields such as epidemiology.

major comments (2)

Abstract: The central claim that CF-GLMM 'efficiently addresses the degeneracy problem' is presented without any description of the inner approximation (Laplace, variational, or penalized quasi-likelihood) required to integrate the latent field under the non-Gaussian likelihood; this omission prevents assessment of whether the coarse-to-fine basis remains exact or stable at fine scales.
Monte Carlo experiments (as summarized in the abstract): No details are given on whether the simulations included the high-range or low-variance regimes known to induce degeneracy in spatial GLMMs, nor on any diagnostic quantities (condition number of the effective covariance, effective degrees of freedom, or posterior contraction of the range parameter) used to verify resolution of the degeneracy.
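
To make the condition-number diagnostic concrete, the sketch below shows how the condition number of a dense spatial covariance matrix grows with the range parameter. It assumes an exponential covariance function and randomly placed sites purely for illustration; neither the covariance form nor the helper name `exp_cov_condition` comes from the paper, and this is not a diagnostic the authors report.

```python
import numpy as np

# Hypothetical sites on the unit square.
rng = np.random.default_rng(1)
sites = rng.uniform(size=(200, 2))
dist = np.sqrt(((sites[:, None, :] - sites[None, :, :]) ** 2).sum(axis=2))

def exp_cov_condition(dist, sigma2, range_):
    """Condition number of an exponential covariance sigma2 * exp(-d / range)."""
    cov = sigma2 * np.exp(-dist / range_)
    return np.linalg.cond(cov)

ranges = [0.05, 0.2, 1.0, 5.0]
conds = [exp_cov_condition(dist, 1.0, r) for r in ranges]
for r, c in zip(ranges, conds):
    print(f"range={r:<5} condition number={c:.3e}")
```

As the range grows relative to the study region, the matrix approaches a constant matrix and becomes close to singular, which is one way the high-range regime mentioned in the comment can destabilize fitting.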

minor comments (2)

Abstract: The phrase 'a recent study suggested that coarse-to-fine learning provides a fast and flexible framework' requires a specific citation to the prior Gaussian work.
Implementation: While the CRAN link for spCF is welcome, the manuscript should briefly describe the main exported functions and their usage to support immediate reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our extension of the coarse-to-fine spatial modeling framework to GLMMs. The comments identify opportunities to strengthen clarity in the abstract and experimental reporting. We address each point below and will revise the manuscript accordingly.

Referee: Abstract: The central claim that CF-GLMM 'efficiently addresses the degeneracy problem' is presented without any description of the inner approximation (Laplace, variational, or penalized quasi-likelihood) required to integrate the latent field under the non-Gaussian likelihood; this omission prevents assessment of whether the coarse-to-fine basis remains exact or stable at fine scales.

Authors: We agree that the abstract would be improved by briefly specifying the approximation method. We will revise the abstract to note that a Laplace approximation is employed to integrate the latent field under the Poisson likelihood, preserving the stability properties of the coarse-to-fine basis at fine scales. revision: yes

Referee: Monte Carlo experiments (as summarized in the abstract): No details are given on whether the simulations included the high-range or low-variance regimes known to induce degeneracy in spatial GLMMs, nor on any diagnostic quantities (condition number of the effective covariance, effective degrees of freedom, or posterior contraction of the range parameter) used to verify resolution of the degeneracy.

Authors: We agree that the abstract summary of the Monte Carlo experiments lacks these specifics. We will revise the manuscript to explicitly state that the simulation design includes high-range and low-variance parameter regimes known to induce degeneracy, and we will report the suggested diagnostic quantities (condition numbers, effective degrees of freedom, and range-parameter behavior) to demonstrate resolution of the degeneracy issue. revision: yes

Circularity Check

0 steps flagged

No significant circularity; extension and evaluation are independent of inputs

The paper extends an existing coarse-to-fine spatial modeling framework (originally for Gaussian responses) to GLMMs for count data, with the central claim that the resulting CF-GLMM addresses degeneracy supported by Monte Carlo experiments on prediction accuracy and multiscale extraction plus a COVID-19 case study. No derivation step reduces by construction to a fitted parameter, self-defined quantity, or load-bearing self-citation chain; the prior Gaussian work is cited only as motivation for the extension, which is then validated externally. The derivation chain remains self-contained against the reported benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are described. The model likely relies on standard GLMM link functions, spatial covariance structures, and the original CFSM hierarchy without new postulated entities.

pith-pipeline@v0.9.0 · 5465 in / 996 out tokens · 40838 ms · 2026-05-09T18:18:50.276300+00:00 · methodology

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages

[1]

Introduction Statistical models have been developed to accommodate diverse types of spatial and spatiotemporal data, including counts and binary responses. Among them, spatial generalized linear mixed models (GLMMs; Diggle et al., 1998), which extend generalized linear models (GLMs) to incorporate latent spatial processes, are widely used in ecology (e.g....

work page 1998
[2]

For example, even if the true spatial process consists of both small- and large-scale components with distinct interpretations, they cannot be identified separately

and other fields. For example, even if the true spatial process consists of both small- and large-scale components with distinct interpretations, they cannot be identified separately. Third, because spatial GLMMs rely on likelihood-based inference, they are not readily integrated into validation-loss-driven optimization pipelines commonly used in mode...

work page 2026
[3]

$z_1(s_i),\ldots,z_R(s_i)$

Coarse-to-fine spatial modeling (CFSM) This section introduces the Gaussian CFSM (Murakami et al., 2026). The CFSM considers a multiscale process that consists of scale-wise components $z_1(s_i),\ldots,z_R(s_i)$ corresponding to the bandwidth values $h_1,\ldots,h_R$, where $h_r = \alpha h_{r-1}$, with $0 < \alpha < 1$. In other words, $z_r(s_i)$ represents the $r$-th largest-scale process where $r \in \{1,\ldots,R\}$...

work page 2026
[4]

$y(s_i) \sim P(\mu(s_i))$, $g(\mu(s_i))$...

CFSM-based spatial GLMM (CF-GLMM) This section develops a CFSM-based spatial GLMM, which we refer to as CF-GLMM. Section 3.1 introduces the model, followed by Section 3.2, which defines the deviance loss function minimized to optimize the model. Section 3.3 describes the optimization algorithm. Section 3.4 describes uncertainty modeling of the model. ...

work page
[5]

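Given $\hat{z}_{1:R-1}(s_i)$, estimate $\hat{z}_R(s_i)$

Given $\hat{z}_{1:R-1}(s_i)$, estimate $\hat{z}_R(s_i)$ and $\boldsymbol{\beta}$ to minimize the weighted squared loss for the training samples (Eq. 9): $\sum_{i'=1}^{N'} \hat{w}(s_{i'})\left[\hat{\eta}(s_{i'}) - \mathbf{x}(s_{i'})^{\top}\boldsymbol{\beta} - \hat{o}_{1:R-1}(s_{i'}) - z_R(s_{i'})\right]^{2}$ (Eq. 10), where $i' \in \{1',\ldots,N'\}$ represents an index for the training samples. $\hat{o}_{1:R-1}(s_{i'}) = o(s_{i'}) + \hat{z}_{1:R-1}(s_{i'})$ is a given offset variable. Since Eq. (10) is identical to the loss function of ...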

work page 2026
[6]

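$\hat{z}_{1:R-1}(\cdot)$, and $\hat{z}_R(\cdot)$

Evaluate the validation loss $\mathrm{Loss}_R$ of the model given $\hat{\boldsymbol{\beta}}_R$, $\hat{z}_{1:R-1}(\cdot)$, and $\hat{z}_R(\cdot)$: (a) If $\mathrm{Loss}_R < \mathrm{Loss}_{R-1}$, set $\hat{\boldsymbol{\beta}} = \hat{\boldsymbol{\beta}}_R$ and $\hat{o}_{1:R}(s_i) = \hat{o}_{1:R-1}(s_i) + \hat{z}_R(s_i)$, reset the counter $Q = 0$, and go to Step 4. (b) Otherwise, set $Q \leftarrow Q + 1$ and $\hat{o}_{1:R}(s_i) = \hat{o}_{1:R-1}(s_i)$. If $Q$ is less than a threshold value, which is 5 in our case, proceed to Step 4. Otherwise, $R$ is the terminal resolution...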

work page
[7]

$\hat{z}_1(\cdot),\ldots,\hat{z}_R(\cdot)$

Update scale $R \leftarrow R + 1$, reduce the bandwidth $h_{R+1} = \delta h_R$, where we assumed $\delta = 0.9$, and go back to Step 1. In short, this algorithm sequentially estimates $\hat{z}_1(\cdot),\ldots,\hat{z}_R(\cdot)$ until the deviance loss no longer improves. Owing to Step 3, this algorithm never increases the deviance loss over the iterations. 3.4. Predictive variance For computational simplicity, we a...

work page 2026
[8]

$s_i \in \{s_1,\ldots,s_N\}$ randomly distributed within the region [0, 1] × [0, 1]. A sample was generated for each site as follows: $y(s_i)$...

Monte Carlo experiments 1: Spatial prediction Sections 4 and 5 present Monte Carlo experiments that investigate the performance of the proposed method in terms of predictive accuracy and multiscale feature extraction, with a focus on modeling count data. See Appendix 1 for an additional experiment examining the predictive accuracy for binary responses. ...

work page 2000
[9]

$z(s_i)$ is replaced with the following multiscale process: $z(s_i)$...

Monte Carlo experiments 2: Multiscale analysis 5.1. Outline In this section, we evaluate the performance of the proposed method in terms of multiscale spatial feature extraction. The same count data generation process as in Section 3 was assumed, except that the spatial process $z(s_i)$ is replaced with the following multiscale process: $z(s_i) = \sum_{m=1}^{M} Z_m(s_i)$, $Z_m(s_i)$...

work page 2006
[10]

The study periods include an early period (January–May 2020) and a late period (July–December 2021)

Application 6.1.Outline This section applies the proposed method to analyze coronavirus disease 2019 (COVID-19) cases in Tokyo Prefecture, Japan. The study periods include an early period (January–May 2020) and a late period (July–December 2021). The early period corresponds to an initial outbreak, characterized by limited testing and strict interventions...

work page 2019
[11]

$\hat{z}_1(s_i),\ldots,\hat{z}_R(s_i)$

Concluding remarks This study extends the CFSM framework originally developed for Gaussian data to CF-GLMM accounting for count, binary, and other exponential family data. Unlike conventional spatial GLMMs, which rely on covariance modeling, our method is based on local modeling, offering a novel perspective. Although the proposed method can be regarded ...

work page arXiv 2022
