Bayesian spatial clustering of extremal behaviour for hydrological variables
Pith reviewed 2026-05-25 19:31 UTC · model grok-4.3
The pith
Bayesian clustering determines spatial pooling groups for extremes by jointly modeling marginal tails and dependence structure.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose the first extreme value spatial clustering methods which account for both the similarity of the marginal tails and the spatial dependence structure of the data to determine the appropriate level of pooling. Spatial dependence is incorporated in two ways: to determine the cluster selection and to account for dependence of the data over sites within a cluster when making the marginal inference. We introduce a statistical model for the pairwise extremal dependence which incorporates distance between sites, and accommodates our belief that sites within the same cluster tend to exhibit a higher degree of dependence than sites in different clusters. We use a Bayesian framework which learns about both the number of clusters and their spatial structure, and that enables the inference of site-specific marginal distributions of extremes to incorporate uncertainty in the clustering allocation.
What carries the argument
A distance-based statistical model for pairwise extremal dependence that enforces higher dependence inside clusters than between clusters and feeds into both cluster selection and adjusted marginal inference.
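The paper's exact parametric form is not reproduced here, so the following is a minimal sketch under an assumed form: the pairwise tail-dependence coefficient χ_ij decays exponentially with the distance d_ij at range ρ, and is scaled by α_in for pairs in the same cluster and by α_out ≤ α_in otherwise. The function name `pairwise_chi` and its default parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def pairwise_chi(coords, labels, alpha_in=0.8, alpha_out=0.4, rho=50.0):
    """Hypothetical model for the pairwise tail-dependence coefficient chi_ij.

    chi_ij = alpha * exp(-d_ij / rho), with alpha = alpha_in when sites i and j
    share a cluster and alpha = alpha_out otherwise. Requiring
    alpha_in >= alpha_out encodes the belief that within-cluster extremal
    dependence is stronger than between-cluster dependence at the same distance.
    """
    if not 0.0 <= alpha_out <= alpha_in <= 1.0:
        raise ValueError("require 0 <= alpha_out <= alpha_in <= 1")
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    # Euclidean distance between every pair of sites (same units as rho)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Cluster-membership indicator for every pair of sites
    same = labels[:, None] == labels[None, :]
    chi = np.where(same, alpha_in, alpha_out) * np.exp(-dist / rho)
    np.fill_diagonal(chi, 1.0)  # each site is perfectly dependent on itself
    return chi


# Three sites on a line, 10 units apart; sites 0 and 1 share a cluster.
chi = pairwise_chi([[0, 0], [10, 0], [20, 0]], labels=[0, 0, 1])
print(chi.round(3))
```

At the same 10-unit separation, the within-cluster pair (0, 1) receives a larger χ than the between-cluster pair (1, 2). In a Bayesian treatment, α_in, α_out and ρ would presumably carry priors rather than being fixed as here.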
If this is right
- The Bayesian procedure returns both a most probable clustering and posterior uncertainty over allocations, so marginal tail estimates automatically average over plausible groupings.
- Within-cluster dependence is modeled explicitly, preventing over-counting of information when pooling sites that are close together.
- The number of clusters is learned from the data rather than fixed in advance, allowing the method to adapt to different hydrological regimes.
- The same framework can be applied directly to daily precipitation or river-flow series without requiring separate steps for dependence modeling and marginal estimation.
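To illustrate how marginal estimates can average over plausible groupings, here is a sketch assuming the sampler returns, for each posterior draw, a cluster label per site and generalized Pareto (GPD) parameters per cluster, with a site-specific threshold u and exceedance probability ζ held fixed. It applies the standard GPD return-level formula and summarises across draws; the helper names and the draw format are hypothetical, not the paper's code.

```python
import numpy as np


def gpd_return_level(u, sigma, xi, zeta, m):
    """Level exceeded on average once every m observations, when exceedances of
    threshold u follow GPD(sigma, xi) and P(Y > u) = zeta."""
    if abs(xi) < 1e-9:  # exponential-tail limit as xi -> 0
        return u + sigma * np.log(m * zeta)
    return u + sigma / xi * ((m * zeta) ** xi - 1.0)


def posterior_return_levels(draws, u, zeta, m):
    """Draws-by-sites matrix of return levels.

    Each draw is (labels, params): labels[s] is the cluster of site s and
    params[k] = (sigma_k, xi_k) for cluster k. Because the allocation can change
    between draws, summaries over axis 0 integrate over clustering uncertainty.
    """
    n_sites = len(draws[0][0])
    levels = np.empty((len(draws), n_sites))
    for t, (labels, params) in enumerate(draws):
        for s, k in enumerate(labels):
            sigma, xi = params[k]
            levels[t, s] = gpd_return_level(u[s], sigma, xi, zeta[s], m)
    return levels


# Two hypothetical posterior draws: sites pooled in the first, split in the second.
draws = [
    ([0, 0], {0: (1.0, 0.0)}),
    ([0, 1], {0: (1.0, 0.0), 1: (2.0, 0.0)}),
]
levels = posterior_return_levels(draws, u=[0.0, 0.0], zeta=[0.1, 0.1], m=100)
print(levels.mean(axis=0))  # posterior-mean return level per site
```

Site 0 receives the same estimate in both draws, whereas site 1's estimate mixes the pooled and unpooled fits, so its posterior spread reflects the allocation uncertainty.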
Where Pith is reading between the lines
- If the distance-dependence link holds across many regions, the method could supply ready-made pooling regions for national flood-risk maps without manual delineation.
- The posterior distribution over clusterings offers a natural way to propagate grouping uncertainty into return-level maps, something standard fixed-cluster pooling omits.
- Extending the pairwise model to include elevation or land-cover covariates might further sharpen clusters in complex terrain.
Load-bearing premise
Sites inside the same cluster display stronger extremal dependence than sites in different clusters, and a simple distance function can capture this pattern.
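The premise can be probed informally before fitting anything: estimate the empirical coefficient χ(q) = P(F_j(Y_j) > q | F_i(Y_i) > q) for each pair of sites from rank-transformed data, then compare the average values for within-cluster and between-cluster pairs under a candidate clustering. This is a crude diagnostic, not the paper's model: it ignores distance, so a fairer version would compare pairs at similar separations. The helper names are hypothetical and the data below are synthetic (two sites sharing a common factor, one independent site).

```python
import numpy as np


def empirical_chi(x, y, q=0.95):
    """Empirical chi(q) = P(F_Y(Y) > q | F_X(X) > q) from rank transforms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Scaled ranks approximate the marginal CDF values on (0, 1)
    ux = (np.argsort(np.argsort(x)) + 1) / (n + 1)
    uy = (np.argsort(np.argsort(y)) + 1) / (n + 1)
    p_x = np.mean(ux > q)
    return np.mean((ux > q) & (uy > q)) / p_x if p_x > 0 else np.nan


def within_between_chi(data, labels, q=0.95):
    """Mean empirical chi over within-cluster and between-cluster site pairs.

    data has one row per time point and one column per site.
    """
    within, between = [], []
    n_sites = data.shape[1]
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            c = empirical_chi(data[:, i], data[:, j], q)
            (within if labels[i] == labels[j] else between).append(c)
    return np.mean(within), np.mean(between)


# Synthetic data: sites 0 and 1 share a common factor, site 2 is independent.
rng = np.random.default_rng(0)
n = 5000
z = rng.standard_exponential(n)
data = np.column_stack([
    z + 0.1 * rng.standard_exponential(n),
    z + 0.1 * rng.standard_exponential(n),
    rng.standard_exponential(n),
])
print(within_between_chi(data, labels=[0, 0, 1]))
```

Under independence χ(q) is close to 1 − q (about 0.05 here), so a between-cluster mean near that value alongside a clearly larger within-cluster mean is consistent with the premise; similar values for both would point towards the misspecification scenario raised in the referee report.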
What would settle it
Re-running the procedure on the Norway precipitation or UK river-flow data and finding that the inferred clusters produce worse out-of-sample tail predictions than either a single global pool or clusters chosen only by marginal similarity would falsify the claim that joint modeling of tails and dependence improves pooling.
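One way to run such a check is to score held-out threshold exceedances under each pooling strategy. The sketch below uses plug-in method-of-moments GPD fits (valid for shape ξ < 1/2) and a held-out log-likelihood as a simple stand-in for the paper's Bayesian posterior predictive; the helper names, the dictionary layout and the synthetic exponential data are assumptions for illustration.

```python
import numpy as np


def gpd_logpdf(y, sigma, xi):
    """Log-density of GPD exceedances y >= 0 with scale sigma and shape xi."""
    y = np.asarray(y, dtype=float)
    if abs(xi) < 1e-9:  # exponential limit
        return -np.log(sigma) - y / sigma
    z = 1.0 + xi * y / sigma
    out = np.full(y.shape, -np.inf)  # zero density outside the support
    ok = z > 0
    out[ok] = -np.log(sigma) - (1.0 / xi + 1.0) * np.log(z[ok])
    return out


def gpd_fit_moments(y):
    """Method-of-moments GPD fit from the sample mean and variance (xi < 1/2)."""
    m, v = np.mean(y), np.var(y, ddof=1)
    xi = 0.5 * (1.0 - m * m / v)
    sigma = 0.5 * m * (1.0 + m * m / v)
    return sigma, xi


def heldout_score(train, held_out, groups):
    """Total held-out log-likelihood when sites in each group share GPD parameters.

    train and held_out map site -> array of threshold exceedances; groups is a
    list of site lists, one per pooling region.
    """
    total = 0.0
    for sites in groups:
        sigma, xi = gpd_fit_moments(np.concatenate([train[s] for s in sites]))
        total += sum(gpd_logpdf(held_out[s], sigma, xi).sum() for s in sites)
    return total


# Synthetic exceedances: sites a, b have scale 1; sites c, d have scale 5.
rng = np.random.default_rng(1)
scales = {"a": 1.0, "b": 1.0, "c": 5.0, "d": 5.0}
train = {s: rng.exponential(sc, 2000) for s, sc in scales.items()}
held_out = {s: rng.exponential(sc, 2000) for s, sc in scales.items()}
print("clustered:", heldout_score(train, held_out, [["a", "b"], ["c", "d"]]))
print("single pool:", heldout_score(train, held_out, [["a", "b", "c", "d"]]))
```

A higher (less negative) score indicates better out-of-sample tail prediction. For the falsification test, the groups compared would be the inferred clusters, a single global pool, and clusters chosen by marginal similarity alone.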
Original abstract
To address the need for efficient inference for a range of hydrological extreme value problems, spatial pooling of information is the standard approach for marginal tail estimation. We propose the first extreme value spatial clustering methods which account for both the similarity of the marginal tails and the spatial dependence structure of the data to determine the appropriate level of pooling. Spatial dependence is incorporated in two ways: to determine the cluster selection and to account for dependence of the data over sites within a cluster when making the marginal inference. We introduce a statistical model for the pairwise extremal dependence which incorporates distance between sites, and accommodates our belief that sites within the same cluster tend to exhibit a higher degree of dependence than sites in different clusters. We use a Bayesian framework which learns about both the number of clusters and their spatial structure, and that enables the inference of site-specific marginal distributions of extremes to incorporate uncertainty in the clustering allocation. The approach is illustrated using simulations, the analysis of daily precipitation levels in Norway and daily river flow levels in the UK.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes what it describes as the first extreme value spatial clustering method that accounts for both marginal tail similarity and spatial dependence when pooling hydrological extremes. It uses similarity in marginal tails together with a new model for pairwise extremal dependence (incorporating distance and higher within-cluster dependence), both to select clusters for pooling and to adjust for dependence during marginal tail inference. A Bayesian framework infers the number and spatial structure of clusters while propagating allocation uncertainty into site-specific marginal estimates. The method is illustrated via simulation studies and applications to daily precipitation in Norway and daily river flows in the UK.
Significance. If the central modeling assumptions hold and the method is shown to be robust, the work would provide a practical advance in spatial extremes by replacing ad-hoc pooling regions with a data-driven clustering that respects both marginal and dependence structure. The explicit propagation of clustering uncertainty into marginal inference is a methodological strength, as is the use of real hydrological datasets. The paper correctly positions the contribution relative to existing spatial extremes literature.
major comments (2)
- [Methods (pairwise extremal dependence model)] The statistical model for pairwise extremal dependence (introduced to accommodate the belief that intra-cluster dependence exceeds inter-cluster dependence): cluster membership directly enters the dependence parameters, so the same structure drives both the clustering criterion and the adjustment to marginal inference. No identifiability argument, prior sensitivity analysis, or simulation in which the 'higher within-cluster dependence' assumption is deliberately violated is provided; this is load-bearing for the claim that the procedure determines the 'appropriate level of pooling'.
- [Simulation study] Simulation study: all reported scenarios appear to be generated under the fitted model (including the intra-cluster dependence assumption). Without a misspecification experiment, it is not possible to assess whether clustering decisions or marginal tail estimates remain reliable when the core modeling belief does not hold exactly.
minor comments (2)
- [Introduction] The abstract states the method is the 'first' to account for both marginal tails and spatial dependence; the introduction should contain an explicit comparison table or paragraph distinguishing the new model from existing spatial clustering approaches in extremes (e.g., those based solely on marginal similarity or on max-stable processes).
- [Methods] Notation for the cluster indicator in the dependence model should be introduced with an equation number on first use to improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive review and positive assessment of the contribution. We respond point-by-point to the major comments and indicate the revisions that will be made.
Point-by-point responses
- Referee: [Methods (pairwise extremal dependence model)] The statistical model for pairwise extremal dependence (introduced to accommodate the belief that intra-cluster dependence exceeds inter-cluster dependence): cluster membership directly enters the dependence parameters, so the same structure drives both the clustering criterion and the adjustment to marginal inference. No identifiability argument, prior sensitivity analysis, or simulation in which the 'higher within-cluster dependence' assumption is deliberately violated is provided; this is load-bearing for the claim that the procedure determines the 'appropriate level of pooling'.
  Authors: We agree that the intra-cluster dependence assumption is central and that supporting analyses are warranted. In the revision we will add a brief identifiability discussion, a prior sensitivity analysis, and a dedicated simulation in which the higher within-cluster dependence assumption is deliberately violated. These additions will directly test the robustness of both the clustering decisions and the resulting marginal tail estimates. Revision: yes.
- Referee: [Simulation study] Simulation study: all reported scenarios appear to be generated under the fitted model (including the intra-cluster dependence assumption). Without a misspecification experiment, it is not possible to assess whether clustering decisions or marginal tail estimates remain reliable when the core modeling belief does not hold exactly.
  Authors: We accept that the existing simulation study is confined to data generated under the model assumptions. The revised manuscript will include an explicit misspecification experiment that relaxes the intra-cluster dependence assumption, thereby allowing evaluation of the reliability of the clustering allocations and the propagated marginal inferences when the core modeling belief is violated. Revision: yes.
Circularity Check
No circularity: modeling assumptions are explicit and derivations remain independent
full rationale
The paper introduces a new Bayesian spatial clustering model for extremes that incorporates both marginal tail similarity and a pairwise extremal dependence structure modulated by distance and cluster membership. The assumption that intra-cluster dependence exceeds inter-cluster dependence is stated explicitly as a modeling belief used to define the likelihood, but no derived quantity (such as posterior allocations or marginal tail estimates) is shown to reduce by the paper's own equations to a fitted parameter or input by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling are indicated. The approach is presented as self-contained with simulation validation and real-data application, so the central claims do not collapse to tautology.
Axiom & Free-Parameter Ledger
free parameters (1)
- parameters of the pairwise extremal dependence model
axioms (1)
- domain assumption: Sites within the same cluster tend to exhibit a higher degree of dependence than sites in different clusters