arxiv: 2511.17725 · v3 · submitted 2025-11-21 · 📊 stat.ME

A Unified Spatiotemporal Framework for Modeling Censored and Missing Areal Responses

Pith reviewed 2026-05-17 19:55 UTC · model grok-4.3

classification 📊 stat.ME
keywords spatiotemporal modeling · censored data · missing observations · areal data · Gaussian Markov random fields · SAR model · DAGAR model · Bayesian spatial statistics

The pith

A Bayesian spatiotemporal model unifies SAR and DAGAR structures with temporal autoregression to handle censored and missing areal responses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a flexible random effect that merges spatial dependence from Simultaneous Autoregressive and Directed Acyclic Graph Autoregressive models with a temporal autoregressive component. This creates a unified framework for modeling spatiotemporal areal data that includes censored observations and missing values. The approach expresses these models as Gaussian Markov random fields in their innovation form, allowing interpretable capture of spatial, temporal, and joint correlations. Simulations demonstrate superior performance over common imputation techniques like using the limit of detection for censored data or sample means for missing data. Application to Beijing air quality data shows competitive predictive accuracy with improved interpretability compared to traditional conditional autoregressive models.

Core claim

The proposed formulation extends both SAR and DAGAR spatial models into a unified spatiotemporal framework by combining them with a temporal autoregressive component and expressing the result as Gaussian Markov random fields in innovation form. This captures the joint spatiotemporal dependence structure for areal responses that may be censored or missing, outperforming ad hoc imputation in simulations and providing clearer interpretability in real data applications.
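
For the DAGAR half of this claim, a minimal sketch of how a spatial precision matrix arises from an innovation (autoregressive) form may help. It assumes the construction from the original DAGAR paper (Datta et al., 2019), with areas taken in a fixed order and each area regressed on its neighbors that come earlier in that order. The function name `dagar_precision`, the toy path graph, and the value rho = 0.6 are illustrative choices, not taken from the paper under review, whose parameterization may differ.

```python
import numpy as np

def dagar_precision(adj, rho):
    """Sketch of a DAGAR precision matrix Q = (I - B)^T F (I - B).

    adj : symmetric 0/1 adjacency matrix, areas in their assumed ordering.
    rho : spatial dependence parameter, assumed here to satisfy 0 <= rho < 1.
    """
    n = adj.shape[0]
    B = np.zeros((n, n))   # autoregressive weights on earlier neighbors
    tau = np.ones(n)       # innovation precisions
    for i in range(n):
        parents = np.flatnonzero(adj[i, :i])   # neighbors earlier in the order
        k = parents.size
        if k > 0:
            B[i, parents] = rho / (1.0 + (k - 1) * rho**2)
        tau[i] = (1.0 + (k - 1) * rho**2) / (1.0 - rho**2)
    I_minus_B = np.eye(n) - B
    return I_minus_B.T @ np.diag(tau) @ I_minus_B

# Hypothetical toy example: 5 areas arranged on a line.
adj = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
Q = dagar_precision(adj, rho=0.6)
cov = np.linalg.inv(Q)   # implied covariance of the spatial effect
```

On a path graph each area has at most one earlier neighbor, so the construction reduces to a stationary AR(1) process in space and `cov[i, j]` equals `rho ** abs(i - j)`. That direct reading of rho as a neighbor correlation is the kind of interpretability the claim refers to.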

What carries the argument

The combined SAR/DAGAR spatial dependence with temporal AR random effect, formulated as a Gaussian Markov random field in innovation form.
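
To make "innovation form" concrete, the sketch below builds a joint precision matrix from a spatial SAR operator and a temporal AR(1) operator. It assumes a separable (Kronecker) combination with unit innovation variance, which is one common construction and not necessarily the one used in the paper. The helper names, the toy adjacency, and the values rho = 0.5 and alpha = 0.7 are all hypothetical.

```python
import numpy as np

def sar_innovation_matrix(W, rho):
    """B = I - rho * W, so that B @ phi = eps defines a SAR spatial effect."""
    return np.eye(W.shape[0]) - rho * W

def ar1_innovation_matrix(T, alpha):
    """L with L @ u = eta for a stationary AR(1) process, eta ~ N(0, I)."""
    L = np.eye(T)
    L[0, 0] = np.sqrt(1.0 - alpha**2)   # stationary start for the first time point
    idx = np.arange(1, T)
    L[idx, idx - 1] = -alpha            # u_t - alpha * u_{t-1} = eta_t
    return L

def separable_precision(W, rho, alpha, T):
    """Joint precision A^T A, where A maps the stacked effect to white noise.

    Stacking is time-major: all areas at time 1, then all areas at time 2, ...
    """
    A = np.kron(ar1_innovation_matrix(T, alpha), sar_innovation_matrix(W, rho))
    return A.T @ A

# Hypothetical toy example: 4 areas on a line, row-normalized weights, 3 times.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
W = adj / adj.sum(axis=1, keepdims=True)
Q = separable_precision(W, rho=0.5, alpha=0.7, T=3)
```

Because both operators are sparse (each row touches only neighbors or the previous time point), the resulting precision matrix is sparse as well, which is what makes the Gaussian Markov random field representation computationally attractive.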

Load-bearing premise

The combined SAR or DAGAR spatial model plus temporal AR random effect sufficiently represents the true joint spatiotemporal dependence in the presence of censoring and missing observations.

What would settle it

If in controlled simulations with known true spatiotemporal correlation structure the proposed model shows no improvement or worse predictive performance than simple imputation methods like LOD replacement, the advantage would be falsified.
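
A controlled check of this kind could be set up roughly as follows. The sketch simulates a latent field with known SAR spatial and AR(1) temporal dependence, imposes 15% left-censoring and 5% random missingness (the levels quoted in the Figure 4 caption), and scores the two ad hoc baselines against the true values. The grid size, parameter values, and variable names are assumptions made for illustration, and the model-based competitor itself is deliberately left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: n areas on a line, T time points.
n, T, rho, alpha = 6, 200, 0.4, 0.6
adj = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
W = adj / adj.sum(axis=1, keepdims=True)
B_inv = np.linalg.inv(np.eye(n) - rho * W)   # SAR: phi = (I - rho W)^{-1} eps

# Latent field: spatially correlated innovations propagated by AR(1) in time.
y = np.zeros((T, n))
y[0] = B_inv @ rng.standard_normal(n) / np.sqrt(1.0 - alpha**2)
for t in range(1, T):
    y[t] = alpha * y[t - 1] + B_inv @ rng.standard_normal(n)

# Left-censoring at a fixed LOD, then independent random missingness.
lod = np.quantile(y, 0.15)
censored = y < lod
missing = (rng.random(y.shape) < 0.05) & ~censored

# Ad hoc baselines: LOD replacement and sample-mean imputation.
y_adhoc = y.copy()
y_adhoc[censored] = lod
y_adhoc[missing] = y[~censored & ~missing].mean()

def rmse(mask):
    """Root mean squared error of the ad hoc fill-ins on the masked cells."""
    return float(np.sqrt(np.mean((y_adhoc[mask] - y[mask]) ** 2)))

rmse_censored = rmse(censored)
rmse_missing = rmse(missing)
```

A model-based method would be scored on the same `censored` and `missing` masks. If its errors were not lower than `rmse_censored` and `rmse_missing` across replications, the claimed advantage would not hold under these conditions.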

Figures

Figures reproduced from arXiv: 2511.17725 by Jose A. Ordoñez, Luis M. Castro, Tsung-I Lin, Victor H. Lachos.

Figure 1: Time series of log(CO) concentrations at twelve air-quality monitoring stations in Beijing from December to March.
Figure 2: Directed acyclic graph representations of AR(1) and AR(2) processes.
Figure 3: Simulation study I. Credible interval lengths for the covariance structure parameters considering a censoring …
Figure 4: Simulation study II. √MSPE and credible interval length considering a censoring level of 15% and a missing level of 5%, for one, three and seven-step-ahead predictions for N = 500.
Figure 5: Beijing air pollutant data. Observed and predicted …
Figure 6: Simulation study 1 - DAGAR model. Credible interval lengths for the mean structure parameters considering …
Figure 7: Simulation study 1 - DAGAR model. Credible interval lengths for the mean structure parameters with …
Figure 8: Simulation study 1 - DAGAR model. Credible interval lengths for the covariance structure parameters with …
Figure 9: Simulation study 1 - SAR model. Credible interval lengths for the mean structure parameters considering a …
Figure 10: Simulation study 1 - SAR model. Credible interval lengths for the covariance structure parameters consid…
Figure 11: Simulation study 1 - SAR model. Credible interval lengths for the mean structure parameters with censoring …
Figure 12: Simulation study 1 - SAR model. Credible interval lengths for the covariance structure parameters with …
Figure 13: Simulation study 2 - DAGAR model. Comparison of our proposal (NST-CLG) with methods that impute …
Figure 14: Simulation study 2 - DAGAR model. Comparison of our proposal (NST-CLG) with methods that impute …
Figure 15: Simulation study 2 - SAR model. Comparison of our proposal (NST-CLG) with methods that impute the …
Figure 16: Simulation study 2 - SAR model. Comparison of our proposal (NST-CLG) with methods that impute the …
Figure 17: Simulation study 2 - SAR model. Comparison of our proposal (NST-CLG) with methods that impute the …
Figure 18: Beijing air pollutants data. Autocorrelation function of log-transformed CO concentrations, with missing …
Figure 19: Beijing air pollutants data. Partial autocorrelation function of log-transformed CO concentrations, with …
Figure 20: Beijing air pollutants data. Moran's I statistics for log-transformed CO concentrations at different time …
Figure 21: Beijing air pollutants data. Observed and predicted log(CO) concentrations for the DAGAR model across …
Figure 22: Beijing air pollutants data. Spatial network of the twelve air-quality monitoring sites in Beijing. The …
Figure 23: Beijing air pollutants data. Posterior distribution of the parameters for the DAGAR-AR(1) model.
Figure 24: Beijing air pollutants data. Posterior chains of parameters for the DAGAR-AR(1) model.
Figure 25: Beijing air pollutants data. ACF of the parameters for the DAGAR-AR(1) model.

Original abstract

We propose a new Bayesian approach for spatiotemporal areal data with censored and missing observations. The method introduces a flexible random effect that combines the spatial dependence structures of the Simultaneous Autoregressive (SAR) and Directed Acyclic Graph Autoregressive (DAGAR) models with a temporal autoregressive component. We demonstrate that this formulation extends both spatial models into a unified spatiotemporal framework, expressing them as Gaussian Markov random fields in their innovation form. The resulting model captures spatial, temporal, and joint spatiotemporal correlations in an interpretable way. Simulation studies show that the proposed model outperforms common ad hoc imputation strategies, such as replacing censored values with the limit of detection (LOD) or imputing missing data by the sample mean. We further apply the method to carbon monoxide (CO) concentration data from Beijing's air quality network, comparing the proposed DAGAR-AR model with the traditional Conditional Autoregressive (CAR) approach. The results indicate that while the CAR model achieves slightly better predictive performance, the DAGAR-AR specification offers clearer interpretability and a more coherent representation of the spatiotemporal dependence structure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a Bayesian unified spatiotemporal framework for areal data subject to censoring and missing observations. By combining the spatial dependence from SAR and DAGAR models with a temporal autoregressive random effect, the approach is formulated as a Gaussian Markov random field in its innovation form. This allows for interpretable capture of spatial, temporal, and spatiotemporal correlations. Simulation studies are used to show that the model outperforms common ad hoc strategies like LOD replacement for censored values and mean imputation for missing data. The method is then applied to carbon monoxide concentration data from Beijing's air quality monitoring network, with comparisons to the traditional CAR model highlighting trade-offs between predictive accuracy and interpretability.

Significance. If the central claims regarding the unification and improved performance hold, this framework could provide a more coherent and statistically principled alternative to ad hoc imputation methods for handling incomplete spatiotemporal areal data. Such data are common in environmental science and public health, making the contribution potentially significant. The explicit representation as GMRFs in innovation form is a strength that facilitates understanding of the dependence structure. However, the real-data results showing comparable or slightly inferior predictive performance for the proposed model compared to CAR suggest that the practical advantages may be context-dependent.

major comments (2)
  1. [Simulation studies] The description of the simulation studies does not specify the exact data-generating process used to create the censored and missing observations. Given that the outperformance is claimed against ad hoc methods, it is critical to clarify whether the simulated data were generated from the proposed SAR/DAGAR-AR model or from an independent mechanism. If the former, the results may not adequately test the model robustness to departures from the assumed dependence structure, such as non-stationary spatial effects or additional noise components.
  2. [Methods] The handling of censoring in the likelihood is not detailed. Please specify the form of the contribution to the likelihood for censored observations (e.g., the integral or cumulative probability up to the limit of detection) and how it integrates with the GMRF precision matrix construction.
minor comments (2)
  1. [Abstract] The abstract would benefit from including specific quantitative results from the simulations, such as error metrics or improvement percentages, to better convey the performance gains.
  2. [Notation] The notation used for the innovation form of the GMRF should be introduced with an explicit equation to improve clarity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and positive overall assessment of our manuscript. We address each major comment point by point below, indicating where revisions have been made to improve clarity and completeness.

Point-by-point responses
  1. Referee: [Simulation studies] The description of the simulation studies does not specify the exact data-generating process used to create the censored and missing observations. Given that the outperformance is claimed against ad hoc methods, it is critical to clarify whether the simulated data were generated from the proposed SAR/DAGAR-AR model or from an independent mechanism. If the former, the results may not adequately test the model robustness to departures from the assumed dependence structure, such as non-stationary spatial effects or additional noise components.

    Authors: We agree that the original manuscript lacked sufficient detail on the simulation design. The data were generated from the proposed SAR/DAGAR-AR spatiotemporal model (with known parameters for spatial and temporal dependence) and then subjected to independent censoring at a fixed LOD and random missingness. This setup was chosen to evaluate performance when the modeling assumptions hold. We acknowledge the referee's valid point regarding robustness testing. In the revised manuscript we have added an explicit description of the data-generating process (including parameter values and imposition of censoring/missingness) in Section 4, together with a short discussion of potential limitations under misspecification. We have not added entirely new simulation scenarios in this revision but note that the current results still demonstrate clear gains over ad hoc methods under the stated conditions.
    revision: yes

  2. Referee: [Methods] The handling of censoring in the likelihood is not detailed. Please specify the form of the contribution to the likelihood for censored observations (e.g., the integral or cumulative probability up to the limit of detection) and how it integrates with the GMRF precision matrix construction.

    Authors: We thank the referee for highlighting this omission. In the revised manuscript we have expanded Section 3 to specify that, for an observation censored below the limit of detection c, the likelihood contribution is the integral of the conditional normal density from −∞ to c (equivalently the CDF evaluated at c after marginalizing over the latent GMRF). This term is combined with the joint precision matrix of the innovation-form GMRF by treating the censored values as partially observed latent variables; the precision matrix remains unchanged while the mean vector and the observed-data likelihood are adjusted accordingly. The revised text now includes the explicit integral expression, its reduction to the normal CDF under Gaussianity, and a brief description of how the construction is preserved within the GMRF framework.
    revision: yes
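
The likelihood contribution described in the second response can be written out as below. This is a sketch under assumed notation that does not come from the paper: y_i is the response, c the limit of detection, x_i^T β the mean structure, w_i the latent GMRF effect for that observation, σ² the residual variance, and φ and Φ the standard normal density and distribution function. The contribution is stated conditionally on w_i; treating missing responses as contributing a factor of 1 additionally assumes an ignorable missingness mechanism.

```latex
% Per-observation likelihood contribution, conditional on the latent effect w_i
% (all symbols are assumed notation, not taken from the paper).
L_i(\boldsymbol\beta, \sigma \mid w_i) =
\begin{cases}
  \dfrac{1}{\sigma}\,
  \varphi\!\left(\dfrac{y_i - \mathbf{x}_i^\top\boldsymbol\beta - w_i}{\sigma}\right),
    & y_i \text{ observed}, \\[2.5ex]
  \displaystyle\int_{-\infty}^{c} \frac{1}{\sigma}\,
  \varphi\!\left(\frac{u - \mathbf{x}_i^\top\boldsymbol\beta - w_i}{\sigma}\right)\mathrm{d}u
  \;=\; \Phi\!\left(\dfrac{c - \mathbf{x}_i^\top\boldsymbol\beta - w_i}{\sigma}\right),
    & y_i \le c \text{ (censored)}, \\[2.5ex]
  1, & y_i \text{ missing}.
\end{cases}
```

Under this form the GMRF prior on the vector of w_i keeps its precision matrix unchanged, consistent with the authors' statement; only the data-level term switches between a density and a distribution function.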

Circularity Check

0 steps flagged

No significant circularity in model derivation or validation

full rationale

The paper proposes a constructive Bayesian spatiotemporal model that unifies SAR and DAGAR spatial structures with a temporal AR(1) random effect, expressed as GMRFs in innovation form. This is presented as an extension rather than a reduction to prior inputs. Simulation studies compare the model against standard ad hoc imputation (LOD replacement, mean imputation), and the Beijing CO application uses external real-world data for evaluation. No equations reduce claimed performance or unification to fitted parameters by construction, and no load-bearing steps rely on self-citations or self-referential definitions. The derivation chain is self-contained with independent validation steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The model rests on standard domain assumptions for Gaussian random fields and autoregressive dependence; no new free parameters or invented entities are introduced beyond the random-effect structure itself.

axioms (1)
  • domain assumption: The observations arise from a latent Gaussian process with the specified SAR/DAGAR-temporal random effect.
    This is the core modeling assumption invoked to justify the unified GMRF representation.

pith-pipeline@v0.9.0 · 5501 in / 1314 out tokens · 42024 ms · 2026-05-17T19:55:01.444031+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
