
arxiv: 2410.21213 · v2 · submitted 2024-10-28 · 📊 stat.ME

Spatial causal inference in the presence of preferential sampling to study the impacts of marine protected areas

Pith reviewed 2026-05-23 18:48 UTC · model grok-4.3

classification 📊 stat.ME
keywords spatial causal inference · preferential sampling · marine protected areas · unmeasured spatial confounders · Bayesian hierarchical model · fish biomass · identifiability

The pith

A joint spatial model identifies the causal effect of marine protected areas while correcting for preferential sampling bias.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Bayesian method for estimating the effect of marine protected areas on fish biomass from observational spatial data. Standard causal methods break down here for two reasons. Sampling sites are chosen in ways tied to the policy and to the outcome itself, and hidden spatial factors confound both. The authors build a single model that jointly describes where samples are taken, which areas receive protection, and the measured biomass. Shared spatial terms in that model capture the hidden confounders. They prove that the key effect parameter is identifiable and that its posterior distribution is consistent. They then use simulations to show that the effect is recovered accurately. Applied to data from the Australian coast, the method detects preferential sampling and shows that ignoring it shifts the estimated impact of the protected areas.

Core claim

We propose a spatial causal inference method that simultaneously accounts for unmeasured spatial confounders in both the sampling process and the treatment allocation. We prove the identifiability of key parameters in the model and the consistency of the posterior distributions of those parameters. Simulation studies confirm that the causal effect of interest can be reliably estimated, and the Australian MPA application shows evidence of preferential sampling whose proper accounting changes the causal effect estimate.

What carries the argument

The joint hierarchical model for sampling locations, treatment assignment, and response that links them through shared spatial random effects representing unmeasured confounders.
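The structure of such a joint model can be illustrated with a toy generative sketch. This is not the authors' model. Everything below is a hypothetical assumption chosen for illustration:

- the exponential covariance on a 10 × 10 grid;
- the log-linear intensities λ_{a,g} = exp(α_a + γ_a U_g) for sampled sites with treatment a in cell g;
- the linear outcome with effect τ;
- all parameter values.

A single latent field U drives where samples fall, which sampled sites are protected, and the response. The sketch shows why a naive difference in means is then biased. It also shows that a regression which conditions on U recovers τ. That regression is infeasible in practice, because U is unmeasured; the paper instead infers the shared spatial effects jointly.

```python
import math
import random


def exp_corr(p, q, rho=0.2):
    """Exponential correlation (Matern, nu = 1/2) between two grid points."""
    return math.exp(-math.dist(p, q) / rho)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def poisson(lam, rng):
    """Knuth's multiplication sampler; adequate for the moderate intensities used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def ols3(rows, y):
    """Least squares with three columns, solving X'X b = X'y by Cramer's rule."""
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(3)]
    d = det3(xtx)
    return [det3([[xty[i] if j == c else xtx[i][j] for j in range(3)] for i in range(3)]) / d
            for c in range(3)]


def simulate(seed=1, side=10, tau=0.5, delta=1.0, alpha=(1.0, 0.5),
             gamma=(0.5, 1.5), noise_sd=0.5):
    """Hypothetical joint model: one latent field U drives sampling, treatment, outcome."""
    rng = random.Random(seed)
    cells = [(i / (side - 1), j / (side - 1)) for i in range(side) for j in range(side)]
    n = len(cells)
    # Draw U ~ GP(0, exponential correlation) on the grid; tiny jitter for stability.
    corr = [[exp_corr(p, q) + (1e-8 if r == c else 0.0) for c, q in enumerate(cells)]
            for r, p in enumerate(cells)]
    low = cholesky(corr)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]

    design, y, propensity = [], [], []
    for g in range(n):
        # Intensity of sampled sites with treatment a in cell g. Both load on U, so U
        # affects where samples fall (preferential sampling) and which are protected.
        lam = [math.exp(alpha[a] + gamma[a] * u[g]) for a in (0, 1)]
        propensity.append(lam[1] / (lam[0] + lam[1]))
        for a in (0, 1):
            for _ in range(poisson(lam[a], rng)):
                design.append((1.0, float(a), u[g]))
                y.append(tau * a + delta * u[g] + rng.gauss(0.0, noise_sd))

    treated = [v for x, v in zip(design, y) if x[1] == 1.0]
    control = [v for x, v in zip(design, y) if x[1] == 0.0]
    naive = sum(treated) / len(treated) - sum(control) / len(control)
    oracle = ols3(design, y)[1]  # infeasible: conditions on U, unobserved in practice
    return {"tau": tau, "naive": naive, "oracle": oracle,
            "n_samples": len(y), "propensity": propensity}


if __name__ == "__main__":
    res = simulate(seed=1)
    print(f"true tau = {res['tau']:.2f}, naive = {res['naive']:.2f}, "
          f"oracle = {res['oracle']:.2f}, sampled sites = {res['n_samples']}")
```

Because both intensities share U, the per-cell propensity λ_1/(λ_0 + λ_1) in this toy reduces to logistic((α_1 − α_0) + (γ_1 − γ_0)U). The naive difference in means absorbs δ times the gap in average U between protected and unprotected samples.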

If this is right

  • The causal effect of MPAs on fish biomass is identifiable and its posterior is consistent under the joint model.
  • Simulation studies recover the true causal effect when data are generated from the assumed model.
  • In the Australian coast data, evidence of preferential sampling exists and adjusting for it alters the estimated causal effect.
  • Standard separate modeling of sampling, treatment, and outcome would leave the effect unidentified.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same joint-modeling strategy could be used for causal questions in other spatial domains where data collection depends on the outcome, such as pollution monitoring or forest inventories.
  • Policy evaluations that treat sampling locations as fixed may systematically misstate the benefits of protected areas or regulations.
  • Extensions that relax the parametric form of the spatial random effects while preserving identifiability would widen applicability.

Load-bearing premise

The joint model for the sampling locations, treatment assignment, and response correctly captures the dependence induced by unmeasured spatial confounders.

What would settle it

A simulation study or real dataset in which the posterior mean of the causal effect changes substantially when the preferential-sampling component is removed from the joint model.

Figures

Figures reproduced from arXiv: 2410.21213 by Brian J. Reich, David A. Gill, Dongjae Son, Erin M. Schliep, Shu Yang.

Figure 1. Domain of interest (turquoise) and grid cell locations. Circular dots represent MPA…
Figure 2. Posterior distributions of Δ (log g/100 m²) under the full and naive models.
Figure 3. Figure 3 has four panels:
- Upper left: posterior means of the propensity scores $\lambda_{1,g}/(\lambda_{0,g}+\lambda_{1,g})$ at each grid cell, with the sampling locations.
- Upper right: posterior mean of the local causal effects $\Delta_g$ under the full model.
- Lower left: posterior standard deviation of $\Delta_g$ under the full model.
- Lower right: difference between the posterior means of $\Delta_g$ under the full and naive models.

The full model uses shared random effects to address preferential sampling. …
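The propensity score in the Figure 3 caption has a simple reading under one assumption, which is ours and is not stated in the caption. Suppose protected and unprotected sampled sites in cell g arise as independent Poisson counts N_{1,g} and N_{0,g} with means λ_{1,g} and λ_{0,g}. Conditioning on the total number of sampled sites then gives a binomial split:

$$
\Pr\left(N_{1,g}=k \mid N_{0,g}+N_{1,g}=n\right)
= \frac{\dfrac{e^{-\lambda_{1,g}}\lambda_{1,g}^{k}}{k!}\cdot\dfrac{e^{-\lambda_{0,g}}\lambda_{0,g}^{\,n-k}}{(n-k)!}}{\dfrac{e^{-(\lambda_{0,g}+\lambda_{1,g})}\left(\lambda_{0,g}+\lambda_{1,g}\right)^{n}}{n!}}
= \binom{n}{k}\,\pi_g^{k}\,(1-\pi_g)^{n-k},
\qquad
\pi_g = \frac{\lambda_{1,g}}{\lambda_{0,g}+\lambda_{1,g}}.
$$

The exponential factors cancel, and the factorials combine into the binomial coefficient. Because 1 − π_g = λ_{0,g}/(λ_{0,g} + λ_{1,g}), the powers split as shown. Under this assumption, π_g is the probability that a site sampled in cell g is protected, which is the usual meaning of a propensity score.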
Original abstract

Marine Protected Areas (MPAs) have been established globally to conserve marine resources. Given their maintenance costs and impact on commercial fishing, it is critical to evaluate their effectiveness to support future conservation. In this paper, we use data collected from the Australian coast to estimate the effect of MPAs on biodiversity. Environmental studies such as these are often observational, and processes of interest exhibit spatial dependence, which presents challenges in estimating the causal effects. Spatial data can also be subject to preferential sampling, where the sampling locations are related to the policy and the response variable, further complicating inference and prediction. To address these challenges, we propose a spatial causal inference method that simultaneously accounts for unmeasured spatial confounders in both the sampling process and the treatment allocation. We prove the identifiability of key parameters in the model and the consistency of the posterior distributions of those parameters. We show via simulation studies that the causal effect of interest can be reliably estimated under the proposed model. The proposed method is applied to assess the effect of MPAs on fish biomass. We find evidence of preferential sampling and that properly accounting for this source of bias impacts the estimate of the causal effect.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a joint spatial model for sampling locations, treatment assignment (MPA status), and response (biodiversity/fish biomass) to estimate causal effects of marine protected areas while accounting for preferential sampling and unmeasured spatial confounders. It claims to prove identifiability of key parameters and posterior consistency, shows via simulations that the causal effect can be reliably estimated, and applies the method to Australian coast data, finding evidence of preferential sampling that impacts the causal effect estimate.

Significance. If the identifiability and consistency results hold under the stated model, the work provides a principled approach to causal inference in spatially dependent observational data with preferential sampling, a common issue in environmental policy evaluation. The explicit proofs, simulation validation, and real-data application are strengths that could inform future MPA assessments.

major comments (2)
  1. [identifiability section / abstract] The identifiability proof (referenced in the abstract and likely detailed in the model/identifiability section): the result conditions on a specific joint distribution form for the sampling intensity, treatment propensity, and outcome that relies on the shared spatial process (typically a Gaussian random field) inducing exactly the dependence needed to separate sampling bias from spatial confounding. Without explicit conditions on the covariance kernel or link functions, or verification that no additional latent factors are present, the separation may not hold generally, undermining the claim that the causal parameters are identifiable.
  2. [theoretical results] Posterior consistency claim (abstract and theoretical results section): consistency is stated for the key parameters, but the proof sketch appears to inherit the same joint-model restrictions as the identifiability result; if the spatial process specification is misspecified relative to the true data-generating process, consistency may fail even if identifiability holds conditionally.
minor comments (2)
  1. [abstract/introduction] The abstract and introduction could more clearly distinguish the proposed joint model from existing spatial causal methods that handle confounding but not preferential sampling.
  2. [simulation studies] Simulation section: provide more detail on the range of spatial kernels and link functions tested to assess robustness of the identifiability result.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments, which help clarify the scope of our theoretical results. We respond to each major comment below.

Point-by-point responses
  1. Referee: [identifiability section / abstract] The identifiability proof (referenced in the abstract and likely detailed in the model/identifiability section): the result conditions on a specific joint distribution form for the sampling intensity, treatment propensity, and outcome that relies on the shared spatial process (typically a Gaussian random field) inducing exactly the dependence needed to separate sampling bias from spatial confounding. Without explicit conditions on the covariance kernel or link functions, or verification that no additional latent factors are present, the separation may not hold generally, undermining the claim that the causal parameters are identifiable.

    Authors: We agree that making the modeling assumptions fully explicit strengthens the presentation. The identifiability result is derived under a joint model in which a single Gaussian random field with Matérn covariance drives the dependence among the sampling intensity, treatment propensity, and outcome processes, with logistic links and no additional latent factors. The proof uses the positive-definiteness of the kernel and the strict monotonicity of the links to separate the shared spatial effect from the causal parameter. In the revision we will add an explicit statement of these conditions (kernel class, link properties, and absence of further latent structure) immediately preceding the identifiability theorem. revision: yes

  2. Referee: [theoretical results] Posterior consistency claim (abstract and theoretical results section): consistency is stated for the key parameters, but the proof sketch appears to inherit the same joint-model restrictions as the identifiability result; if the spatial process specification is misspecified relative to the true data-generating process, consistency may fail even if identifiability holds conditionally.

    Authors: The consistency theorem is stated under correct specification of the joint model, which is the conventional setting for posterior consistency results. We acknowledge that misspecification of the spatial process can invalidate consistency, and it can also bias estimates in finite samples. Our simulation section already examines performance under several departures from the exact model. We will expand the discussion to note the conditional nature of the consistency guarantee and its practical implications. revision: partial
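The first simulated response names two properties that the identifiability argument relies on: a positive-definite Matérn covariance and strictly monotone logistic links. Below is a minimal numerical check of both. The following choices are ours and are hypothetical: smoothness ν = 3/2 (picked because it has a closed form), range ρ = 0.3, and a 6 × 6 grid. The paper's actual smoothness and link specification are not given here. The sketch only illustrates the two properties and is not the proof.

```python
import math


def matern32(d, sigma2=1.0, rho=0.3):
    """Matern covariance with smoothness nu = 3/2, which has a closed form."""
    r = math.sqrt(3.0) * d / rho
    return sigma2 * (1.0 + r) * math.exp(-r)


def is_positive_definite(a):
    """Attempt a Cholesky factorization; in exact arithmetic it exists iff a is PD."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= 0.0:
                    return False
                low[i][i] = math.sqrt(pivot)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return True


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Inverse of the logistic link; it exists because the link is strictly increasing."""
    return math.log(p / (1.0 - p))


# Covariance matrix of a Matern(3/2) field on a 6 x 6 grid over the unit square.
grid = [(i / 5, j / 5) for i in range(6) for j in range(6)]
cov = [[matern32(math.dist(p, q)) for q in grid] for p in grid]

# Strict monotonicity on a grid of linear-predictor values, plus an exact inverse.
xs = [k / 2 for k in range(-10, 11)]
increasing = all(logistic(a) < logistic(b) for a, b in zip(xs, xs[1:]))
round_trip = max(abs(logit(logistic(x)) - x) for x in xs)

print(is_positive_definite(cov), increasing, round_trip < 1e-9)
```

A Cholesky attempt stands in for an eigenvalue computation so the sketch needs only the standard library. With NumPy, one would typically call `numpy.linalg.cholesky` and catch `numpy.linalg.LinAlgError`.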

Circularity Check

0 steps flagged

No circularity: identifiability and consistency proved under explicit joint model with external simulation validation

full rationale

The paper introduces a joint spatial model for sampling locations, treatment assignment, and response to handle preferential sampling and unmeasured confounders. It states that identifiability of causal parameters and posterior consistency are proved for this model, with simulation studies confirming reliable estimation of the causal effect. No quoted equations or self-citations reduce any prediction or parameter to a fitted input by construction, nor does any load-bearing step rely on prior author work as an unverified uniqueness theorem. The derivation chain is therefore self-contained, with the central claims resting on model-specific proofs and independent simulation checks rather than tautological redefinitions or renamings.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed from abstract only; no explicit free parameters, axioms, or invented entities are enumerated. The approach implicitly relies on standard assumptions of spatial Gaussian processes and Bayesian posterior consistency under correct model specification.

pith-pipeline@v0.9.0 · 5749 in / 1157 out tokens · 33873 ms · 2026-05-23T18:48:37.849789+00:00 · methodology

