pith.

arXiv: 1906.08522 · v1 · pith:X4BPMROV · submitted 2019-06-20 · stat.ME · stat.AP

Bayesian spatial clustering of extremal behaviour for hydrological variables

Pith reviewed 2026-05-25 19:31 UTC · model grok-4.3

classification stat.ME stat.AP
keywords spatial clustering, extreme value theory, Bayesian inference, hydrological extremes, spatial dependence, marginal tail estimation, pooling

The pith

Bayesian clustering determines spatial pooling groups for extremes by jointly modeling marginal tails and dependence structure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Bayesian approach to cluster sites for extreme value analysis in hydrology. Clusters are chosen to reflect both similarity in the tails of the marginal distributions and the strength of spatial dependence between sites. This determines how much information can be pooled across locations while still respecting the data's dependence patterns. Dependence enters the model twice: once to guide cluster formation and again to adjust the marginal inference for within-cluster correlation. The resulting site-specific tail estimates incorporate uncertainty about which cluster each site belongs to.

Core claim

We propose the first extreme value spatial clustering methods which account for both the similarity of the marginal tails and the spatial dependence structure of the data to determine the appropriate level of pooling. Spatial dependence is incorporated in two ways: to determine the cluster selection and to account for dependence of the data over sites within a cluster when making the marginal inference. We introduce a statistical model for the pairwise extremal dependence which incorporates distance between sites, and accommodates our belief that sites within the same cluster tend to exhibit a higher degree of dependence than sites in different clusters. We use a Bayesian framework which learns about both the number of clusters and their spatial structure, and that enables the inference of site-specific marginal distributions of extremes to incorporate uncertainty in the clustering allocation.

What carries the argument

A distance-based statistical model for pairwise extremal dependence that enforces higher dependence inside clusters than between clusters and feeds into both cluster selection and adjusted marginal inference.
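To make the idea concrete, here is a minimal Python sketch of one way such a model could look. It assumes a pairwise coefficient chi_ij that decays exponentially with distance and is scaled by alpha_within for pairs in the same cluster and by a smaller alpha_between otherwise. This functional form and the names pairwise_chi, alpha_within, alpha_between and rho are illustrative assumptions, not the parameterisation used by Rohrbeck and Tawn.

```python
import numpy as np


def pairwise_chi(coords, labels, alpha_within=0.9, alpha_between=0.6, rho=50.0):
    """Hypothetical pairwise extremal-dependence model (illustration only).

    chi_ij = alpha * exp(-d_ij / rho), with alpha = alpha_within when sites i and j
    share a cluster label and alpha = alpha_between otherwise. Requiring
    alpha_between <= alpha_within encodes the belief that, at a given distance,
    sites in the same cluster are more strongly dependent.
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    if not 0.0 <= alpha_between <= alpha_within <= 1.0:
        raise ValueError("need 0 <= alpha_between <= alpha_within <= 1")
    # Euclidean distance between every pair of sites
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # same[i, j] is True when sites i and j are allocated to the same cluster
    same = labels[:, None] == labels[None, :]
    chi = np.where(same, alpha_within, alpha_between) * np.exp(-dist / rho)
    np.fill_diagonal(chi, 1.0)  # a site is perfectly dependent with itself
    return chi


# Three sites 10 km apart on a line; the first two share a cluster.
chi = pairwise_chi([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]], labels=[0, 0, 1])
```

At the same 10 km separation, the within-cluster pair (sites 0 and 1) gets a larger chi than the between-cluster pair (sites 1 and 2), which is exactly the ordering the load-bearing assumption imposes.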

If this is right

  • The Bayesian procedure returns both a most probable clustering and posterior uncertainty over allocations, so marginal tail estimates automatically average over plausible groupings.
  • Within-cluster dependence is modeled explicitly, preventing over-counting of information when pooling sites that are close together.
  • The number of clusters is learned from the data rather than fixed in advance, allowing the method to adapt to different hydrological regimes.
  • The same framework can be applied directly to daily precipitation or river-flow series without requiring separate steps for dependence modeling and marginal estimation.
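A minimal sketch of how allocation uncertainty can be carried into a site-specific tail estimate. For each posterior draw, read off the generalized Pareto (GPD) parameters of the cluster the site belongs to, compute the level exceeded on average once every m observations, u + (sigma/xi)[(m zeta)^xi - 1] with zeta = P(X > u), and then summarise across draws. It assumes hypothetical MCMC output in which each cluster carries its own (sigma, xi) in every draw. The paper's actual parameterisation may differ, and the function names are illustrative.

```python
import numpy as np


def gpd_return_level(u, sigma, xi, zeta, m):
    """Level exceeded on average once every m observations under a GPD tail.

    u: threshold, (sigma, xi): GPD scale and shape, zeta: P(X > u).
    """
    sigma = np.asarray(sigma, dtype=float)
    xi = np.asarray(xi, dtype=float)
    near_zero = np.abs(xi) < 1e-8
    safe_xi = np.where(near_zero, 1.0, xi)  # avoid dividing by zero below
    general = u + sigma / safe_xi * ((m * zeta) ** safe_xi - 1.0)
    exponential = u + sigma * np.log(m * zeta)  # the xi -> 0 limit
    return np.where(near_zero, exponential, general)


def site_return_levels(alloc, sigma, xi, u, zeta, m):
    """Posterior mean and 95% interval of each site's m-observation return level.

    Hypothetical inputs:
    alloc: (S, n_sites) integer cluster labels, one row per MCMC draw
    sigma, xi: (S, K) cluster-level GPD parameters, one row per MCMC draw
    u, zeta: (n_sites,) site thresholds and exceedance probabilities
    """
    alloc = np.asarray(alloc)
    draws = np.arange(alloc.shape[0])[:, None]
    # In draw s, site i inherits the parameters of cluster alloc[s, i]
    site_sigma = np.asarray(sigma, dtype=float)[draws, alloc]
    site_xi = np.asarray(xi, dtype=float)[draws, alloc]
    levels = gpd_return_level(np.asarray(u, dtype=float), site_sigma, site_xi,
                              np.asarray(zeta, dtype=float), m)
    # Averaging over draws integrates over the uncertain cluster allocation
    return levels.mean(axis=0), np.quantile(levels, [0.025, 0.975], axis=0)
```

A site that switches clusters between draws ends up with a return level that mixes the tails of those clusters in proportion to how often it visits each one.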

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the distance-dependence link holds across many regions, the method could supply ready-made pooling regions for national flood-risk maps without manual delineation.
  • The posterior distribution over clusterings offers a natural way to propagate grouping uncertainty into return-level maps, something standard fixed-cluster pooling omits.
  • Extending the pairwise model to include elevation or land-cover covariates might further sharpen clusters in complex terrain.

Load-bearing premise

Sites inside the same cluster display stronger extremal dependence than sites in different clusters, and a simple distance function can capture this pattern.
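This premise can be checked empirically before any model is fitted. A rank-based estimate of chi_ij(q) = P(U_j > q | U_i > q), where U_i is site i's series transformed to uniform margins, can be averaged over pairs within and between candidate clusters. The sketch assumes complete data with no missing values and an arbitrary threshold q = 0.95. The helpers empirical_chi and within_between_means are hypothetical names.

```python
import numpy as np


def empirical_chi(data, q=0.95):
    """Rank-based estimate of chi_ij(q) = P(U_j > q | U_i > q) for all site pairs.

    data: (n_times, n_sites) array with no missing values (an assumption).
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # Transform each site's series to approximately uniform margins via ranks
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    exceed = (ranks / (n + 1.0) > q).astype(float)
    joint = exceed.T @ exceed      # joint[i, j] = number of times both exceed q
    counts = np.diag(joint)        # counts[i] = number of times site i exceeds q
    return joint / counts[:, None]


def within_between_means(chi, labels):
    """Average chi over distinct pairs in the same cluster and in different clusters."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diagonal = ~np.eye(len(labels), dtype=bool)
    return chi[same & off_diagonal].mean(), chi[~same].mean()
```

If the within-cluster mean is not clearly larger than the between-cluster mean for a proposed clustering, the assumption behind the dependence model is in doubt for those data.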

What would settle it

Re-running the procedure on the Norway precipitation or UK river-flow data and finding that the inferred clusters produce worse out-of-sample tail predictions than either a single global pool or clusters chosen only by marginal similarity would falsify the claim that joint modeling of tails and dependence improves pooling.
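A rough version of this test can be sketched as a held-out log score: fit a GPD to the pooled training excesses of each group, then sum the log-densities of the test excesses. The sketch simplifies a great deal. It uses method-of-moments estimates (valid only for shape xi < 1/2) instead of the paper's Bayesian inference, treats excesses as independent, and takes the candidate groupings as given. The helpers rgpd, fit_gpd_mom, gpd_logpdf and heldout_score are hypothetical.

```python
import numpy as np


def rgpd(n, sigma, xi, rng):
    """Simulate GPD excesses by inversion (assumes xi != 0)."""
    u = rng.uniform(size=n)
    return sigma / xi * ((1.0 - u) ** (-xi) - 1.0)


def fit_gpd_mom(excesses):
    """Method-of-moments GPD fit; only valid when the true shape xi < 1/2."""
    x = np.asarray(excesses, dtype=float)
    mean, var = x.mean(), x.var(ddof=1)
    xi = 0.5 * (1.0 - mean ** 2 / var)
    return mean * (1.0 - xi), xi  # (sigma, xi)


def gpd_logpdf(excesses, sigma, xi):
    """GPD log-density; -inf for points outside the support."""
    x = np.asarray(excesses, dtype=float)
    if abs(xi) < 1e-8:
        return -np.log(sigma) - x / sigma
    z = 1.0 + xi * x / sigma
    out = np.full(x.shape, -np.inf)
    ok = z > 0
    out[ok] = -np.log(sigma) - (1.0 / xi + 1.0) * np.log(z[ok])
    return out


def heldout_score(train, test, groups):
    """Held-out log score when all sites in a group share one pooled GPD fit.

    train, test: dict mapping site -> array of threshold excesses
    groups: dict mapping site -> group label (the pooling regions)
    """
    score = 0.0
    for g in set(groups.values()):
        sites = [s for s, label in groups.items() if label == g]
        sigma, xi = fit_gpd_mom(np.concatenate([train[s] for s in sites]))
        score += sum(gpd_logpdf(test[s], sigma, xi).sum() for s in sites)
    return score
```

Higher scores indicate better out-of-sample tail prediction. Scoring the inferred clusters, a single global pool and marginal-only clusters on the same split gives the comparison described above.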

Figures

Figures reproduced from arXiv: 1906.08522 by Christian Rohrbeck, Jonathan A Tawn.

Figure 1. Map of South Norway showing the boundaries of the municipalities (left) and …
Figure 2. Locations of the 45 river flow gauges considered in Section …
Figure 3. Map of areal units in Section 4
Figure 4. Empirical estimate for the coefficients of asymptotic dependence
Figure 5. Trace plot (left) and posterior mass function (right) of the number …
Figure 6. Point estimate of the spatial cluster structure for the Norwegian precipitation
Figure 7. Posterior mass functions of the number J of clusters for November-March (left) and May-September (right).
Figure 8. Point estimates for the underlying spatial cluster structure for November …
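Figures 5 to 8 summarise MCMC output by a posterior mass function for the number of clusters and by a point estimate of the cluster structure. One common way to get such summaries from sampled allocations is sketched below. It counts the distinct labels in each draw and picks the sampled allocation closest, in squared error, to the posterior similarity matrix (least-squares clustering). This is an illustrative choice only: the paper's point estimator may use a different loss, and the count of distinct labels equals J only if no cluster is empty.

```python
from collections import Counter

import numpy as np


def posterior_mass_J(alloc, burn_in=0):
    """Posterior mass function of the number of distinct cluster labels per draw."""
    kept = np.asarray(alloc)[burn_in:]
    counts = Counter(len(np.unique(row)) for row in kept)
    total = sum(counts.values())
    return {j: c / total for j, c in sorted(counts.items())}


def least_squares_clustering(alloc, burn_in=0):
    """Sampled allocation closest, in squared error, to the posterior similarity matrix.

    alloc: (S, n_sites) integer cluster labels, one row per MCMC draw (hypothetical input).
    """
    kept = np.asarray(alloc)[burn_in:]
    n_sites = kept.shape[1]
    # psm[i, j] = posterior probability that sites i and j share a cluster
    psm = np.zeros((n_sites, n_sites))
    for row in kept:
        psm += row[:, None] == row[None, :]
    psm /= len(kept)
    # Squared distance between each draw's co-clustering matrix and the PSM
    losses = [(((row[:, None] == row[None, :]) - psm) ** 2).sum() for row in kept]
    best = int(np.argmin(losses))
    return kept[best], psm
```

Working from co-clustering matrices rather than raw labels avoids the label-switching problem, because relabelling the clusters in a draw leaves its co-clustering matrix unchanged.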
Original abstract

To address the need for efficient inference for a range of hydrological extreme value problems, spatial pooling of information is the standard approach for marginal tail estimation. We propose the first extreme value spatial clustering methods which account for both the similarity of the marginal tails and the spatial dependence structure of the data to determine the appropriate level of pooling. Spatial dependence is incorporated in two ways: to determine the cluster selection and to account for dependence of the data over sites within a cluster when making the marginal inference. We introduce a statistical model for the pairwise extremal dependence which incorporates distance between sites, and accommodates our belief that sites within the same cluster tend to exhibit a higher degree of dependence than sites in different clusters. We use a Bayesian framework which learns about both the number of clusters and their spatial structure, and that enables the inference of site-specific marginal distributions of extremes to incorporate uncertainty in the clustering allocation. The approach is illustrated using simulations, the analysis of daily precipitation levels in Norway and daily river flow levels in the UK.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes the first Bayesian spatial clustering method for extreme value analysis of hydrological variables. It jointly uses similarity in marginal tails and a new model for pairwise extremal dependence (incorporating distance and higher within-cluster dependence) both to select clusters for pooling and to adjust for dependence during marginal tail inference. A Bayesian framework infers the number and spatial structure of clusters while propagating allocation uncertainty into site-specific marginal estimates. The method is illustrated via simulation studies and applications to daily precipitation in Norway and daily river flows in the UK.

Significance. If the central modeling assumptions hold and the method is shown to be robust, the work would provide a practical advance in spatial extremes by replacing ad-hoc pooling regions with a data-driven clustering that respects both marginal and dependence structure. The explicit propagation of clustering uncertainty into marginal inference is a methodological strength, as is the use of real hydrological datasets. The paper correctly positions the contribution relative to existing spatial extremes literature.

major comments (2)
  1. [Methods (pairwise extremal dependence model)] The statistical model for pairwise extremal dependence (introduced to accommodate the belief that intra-cluster dependence exceeds inter-cluster dependence): cluster membership directly enters the dependence parameters, so the same structure drives both the clustering criterion and the adjustment to marginal inference. No identifiability argument, prior sensitivity analysis, or simulation in which the 'higher within-cluster dependence' assumption is deliberately violated is provided; this is load-bearing for the claim that the procedure determines the 'appropriate level of pooling'.
  2. [Simulation study] Simulation study: all reported scenarios appear to be generated under the fitted model (including the intra-cluster dependence assumption). Without a misspecification experiment, it is not possible to assess whether clustering decisions or marginal tail estimates remain reliable when the core modeling belief does not hold exactly.
minor comments (2)
  1. [Introduction] The abstract states the method is the 'first' to account for both marginal tails and spatial dependence; the introduction should contain an explicit comparison table or paragraph distinguishing the new model from existing spatial clustering approaches in extremes (e.g., those based solely on marginal similarity or on max-stable processes).
  2. [Methods] Notation for the cluster indicator in the dependence model should be introduced with an equation number on first use to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and positive assessment of the contribution. We respond point-by-point to the major comments and indicate the revisions that will be made.

  1. Referee: [Methods (pairwise extremal dependence model)] The statistical model for pairwise extremal dependence (introduced to accommodate the belief that intra-cluster dependence exceeds inter-cluster dependence): cluster membership directly enters the dependence parameters, so the same structure drives both the clustering criterion and the adjustment to marginal inference. No identifiability argument, prior sensitivity analysis, or simulation in which the 'higher within-cluster dependence' assumption is deliberately violated is provided; this is load-bearing for the claim that the procedure determines the 'appropriate level of pooling'.

    Authors: We agree that the intra-cluster dependence assumption is central and that supporting analyses are warranted. In the revision we will add a brief identifiability discussion, a prior sensitivity analysis, and a dedicated simulation in which the higher within-cluster dependence assumption is deliberately violated. These additions will directly test the robustness of both the clustering decisions and the resulting marginal tail estimates. revision: yes

  2. Referee: [Simulation study] Simulation study: all reported scenarios appear to be generated under the fitted model (including the intra-cluster dependence assumption). Without a misspecification experiment, it is not possible to assess whether clustering decisions or marginal tail estimates remain reliable when the core modeling belief does not hold exactly.

    Authors: We accept that the existing simulation study is confined to data generated under the model assumptions. The revised manuscript will include an explicit misspecification experiment that relaxes the intra-cluster dependence assumption, thereby allowing evaluation of the reliability of the clustering allocations and the propagated marginal inferences when the core modeling belief is violated. revision: yes
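One simple way to build the misspecification experiment requested here is a max-linear model, X_i = max_k a_ik Z_k, with iid unit-Fréchet factors Z_k and each row of the weight matrix A summing to one. Each X_i is then unit Fréchet and the pairwise coefficient is chi_ij = sum_k min(a_ik, a_jk), so the dependence pattern can be set directly, including patterns in which sites in different clusters are more strongly dependent than sites in the same cluster. This is a sketch under that assumed design, not the experiment the authors will run; the weight matrix A_mis and the helper names are hypothetical.

```python
import numpy as np


def max_linear_chi(A):
    """chi_ij = sum_k min(a_ik, a_jk) for a max-linear model whose rows of A sum to 1."""
    A = np.asarray(A, dtype=float)
    return np.minimum(A[:, None, :], A[None, :, :]).sum(axis=-1)


def simulate_max_linear(A, n, rng):
    """Draw n vectors X with X_i = max_k a_ik Z_k and iid unit-Frechet factors Z_k."""
    A = np.asarray(A, dtype=float)
    Z = 1.0 / -np.log(rng.uniform(size=(n, A.shape[1])))  # unit Frechet
    return (Z[:, None, :] * A[None, :, :]).max(axis=-1)    # shape (n, n_sites)


# Hypothetical misspecified design: clusters are {0, 1} and {2, 3}, but the shared
# factors link site 0 with site 2 and site 1 with site 3, so dependence is
# strongest BETWEEN clusters, violating the within-cluster assumption.
labels = np.array([0, 0, 1, 1])
A_mis = np.array([
    [0.7, 0.0, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.7, 0.0, 0.3, 0.0, 0.0],
    [0.7, 0.0, 0.0, 0.0, 0.3, 0.0],
    [0.0, 0.7, 0.0, 0.0, 0.0, 0.3],
])
```

Cluster-specific marginal tails can then be imposed by transforming each simulated column, for example to a GPD with parameters that depend on the site's cluster label, before running the clustering procedure.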

Circularity Check

0 steps flagged

No circularity: modeling assumptions are explicit and derivations remain independent

The paper introduces a new Bayesian spatial clustering model for extremes that incorporates both marginal tail similarity and a pairwise extremal dependence structure modulated by distance and cluster membership. The assumption that intra-cluster dependence exceeds inter-cluster dependence is stated explicitly as a modeling belief used to define the likelihood, but no derived quantity (such as posterior allocations or marginal tail estimates) is shown to reduce by the paper's own equations to a fitted parameter or input by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling are indicated. The approach is presented as self-contained with simulation validation and real-data application, so the central claims do not collapse to tautology.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on a newly introduced statistical model for pairwise extremal dependence that incorporates distance and a cluster-dependent dependence assumption, plus standard Bayesian inference machinery; free parameters in the dependence model must be estimated from data.

free parameters (1)
  • parameters of the pairwise extremal dependence model
    The model for dependence as a function of distance between sites requires parameters that are fitted or chosen to match observed data.
axioms (1)
  • Domain assumption: sites within the same cluster tend to exhibit a higher degree of dependence than sites in different clusters
    This belief is directly incorporated into the statistical model for pairwise extremal dependence as described in the abstract.

