
arxiv: 2606.19655 · v1 · pith:SRB6LLTAnew · submitted 2026-06-17 · 📊 stat.CO · math.ST · stat.TH

A Flat Connection: The Pooling Factor and the Geometry of Centring in Hierarchical MCMC

Pith reviewed 2026-06-26 18:03 UTC · model grok-4.3

classification 📊 stat.CO · math.ST · stat.TH
keywords hierarchical models · MCMC mixing · fiber bundle · Ehresmann connection · pooling factor · centring · Fisher information metric · conditional dependence

The pith

The Fisher-induced Ehresmann connection on hierarchical posteriors is flat, so the mixing obstruction reduces to the pooling factor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats the joint parameter space of a hierarchical model as a fiber bundle, with hyperparameters on the base and group-level parameters on the fibers. The Fisher information metric induces an Ehresmann connection whose curvature was hypothesized to cause the centring obstruction felt by MCMC. The central result is that this connection is flat for any smooth hierarchical posterior, because the horizontal leaves are exactly the level sets of the fiber score. What remains is therefore statistical: the conditional dependence of fiber on base, measured per group by the prior fraction known as the pooling factor. This recovers the standard picture of slow mixing in prior-dominated groups and the closed-form optimal non-centring weight, while separating that effect from the distinct funnel pathology.
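As a rough illustration of what "prior fraction" means, the sketch below assumes the textbook normal-normal form, in which the pooling factor is the prior precision divided by the total conditional precision. This form is an assumption made here for illustration; the summary does not give the paper's exact definition of π_j, and the function and argument names (`prior_fraction`, `sigma_y`, `tau`) are hypothetical and are not the API of the fibr package.

```python
def prior_fraction(n_j, sigma_y, tau):
    """Prior precision over total conditional precision for one group.

    Assumes y_ij ~ Normal(alpha_j, sigma_y) and alpha_j ~ Normal(mu, tau).
    This Gaussian form is an illustrative assumption, not the paper's formula.
    """
    prior_precision = 1.0 / tau**2
    data_precision = n_j / sigma_y**2
    return prior_precision / (prior_precision + data_precision)


# Small groups with a tight hierarchical scale are prior-dominated (pi_j near 1);
# large groups or a wide hierarchical scale are data-dominated (pi_j near 0).
for n_j, tau in [(3, 0.5), (3, 3.0), (100, 0.5), (100, 3.0)]:
    pi_j = prior_fraction(n_j, sigma_y=1.0, tau=tau)
    print(f"n_j={n_j:>3}  tau={tau:<4} pi_j={pi_j:.4f}")
```

Under this assumed form, π_j rises as the group size shrinks or the hierarchical scale tightens, which is the direction in which the summary says mixing slows.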

Core claim

The Ehresmann connection A = -G_FF^{-1}G_BF induced by the Fisher information metric is flat for any smooth hierarchical posterior because its horizontal leaves coincide with the level sets of the fiber score ∂_α log p. There is therefore no geometric obstruction above the metric. The only remaining obstruction is the conditional dependence of the fiber parameters on the base parameters, governed per group by the prior fraction π_j (the pooling factor). From this quantity the paper recovers that prior-dominated groups mix slowly, that the optimal per-group non-centring weight follows in closed form, and that the funnel is a separate base-space pathology distinguished by its opposite dependence on the hierarchical variance.
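The flatness argument can be written out in a few lines. The sketch below assumes that G is the observed information −∇² log p, which is the choice the simulated rebuttal further down this page attributes to the paper. The symbols s, b, and c are notation introduced here for illustration, and G_BF is read as the mixed base-fiber block.

```latex
% s: fiber score; b: base (hyperparameter) coordinates; \alpha: fiber coordinates
s(b,\alpha) = \partial_\alpha \log p(b,\alpha), \qquad
G_{FF} = -\partial_\alpha \partial_\alpha \log p, \qquad
G_{BF} = -\partial_b \partial_\alpha \log p .

% Differential of the fiber score along a tangent vector X = (X_B, X_F)
ds(X) = \partial_b s \, X_B + \partial_\alpha s \, X_F
      = -G_{BF} X_B - G_{FF} X_F .

% Horizontal vectors are exactly the kernel of ds
ds(X) = 0 \iff X_F = -G_{FF}^{-1} G_{BF} X_B = A X_B .

% The horizontal distribution is therefore tangent to the level sets \{ s = c \}.
% Where G_{FF} is invertible these are submanifolds (implicit function theorem),
% so the distribution is integrable and the curvature of A vanishes.
```

The last step is where the choice of G matters: the identity ds(X) = −G_BF X_B − G_FF X_F is pointwise only when the blocks are second derivatives of log p itself.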

What carries the argument

The Ehresmann connection induced by the Fisher information metric on the fiber bundle of hierarchical parameters, proved flat with horizontal leaves given by the level sets of the fiber score.
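A numerical illustration of this statement on a toy model follows. Everything about the model is an assumption made here, not taken from the paper: one group with y successes out of N Bernoulli trials, logit α, prior α ~ Normal(μ, τ), base coordinates (μ, t = log τ), and G taken as the observed information. The sketch transports α once around a closed loop in the base and reports the net displacement and the largest change in the fiber score along the way; both should be zero up to integration error if the horizontal leaves are level sets of the score.

```python
import math

Y, N = 2, 3  # hypothetical data: 2 successes in 3 Bernoulli trials


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fiber_score(mu, t, alpha):
    """Fiber score d/d(alpha) log p; terms free of alpha drop out."""
    return Y - N * sigmoid(alpha) - (alpha - mu) * math.exp(-2.0 * t)


def connection(mu, t, alpha):
    """Components (A_mu, A_t) of A = -G_FF^{-1} G_BF, observed information."""
    w = math.exp(-2.0 * t)             # prior precision 1 / tau^2
    p = sigmoid(alpha)
    g_ff = N * p * (1.0 - p) + w       # -d2 log p / d alpha^2
    g_mu = -w                          # -d2 log p / d mu d alpha
    g_t = -2.0 * (alpha - mu) * w      # -d2 log p / d t d alpha
    return -g_mu / g_ff, -g_t / g_ff


def transport(alpha0, mu0=0.3, t0=0.0, r=0.5, steps=2000):
    """RK4 transport of alpha once around a circular loop in the base."""
    def rhs(u, a):
        mu, t = mu0 + r * math.cos(u), t0 + r * math.sin(u)
        a_mu, a_t = connection(mu, t, a)
        return a_mu * (-r * math.sin(u)) + a_t * (r * math.cos(u))

    h = 2.0 * math.pi / steps
    a = alpha0
    s0 = fiber_score(mu0 + r, t0, alpha0)  # score at the loop's start point
    drift = 0.0
    for i in range(steps):
        u = i * h
        k1 = rhs(u, a)
        k2 = rhs(u + h / 2, a + h / 2 * k1)
        k3 = rhs(u + h / 2, a + h / 2 * k2)
        k4 = rhs(u + h, a + h * k3)
        a += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        u_next = u + h
        s = fiber_score(mu0 + r * math.cos(u_next), t0 + r * math.sin(u_next), a)
        drift = max(drift, abs(s - s0))
    return a, drift


alpha_end, score_drift = transport(0.8)
print("net displacement after one loop:", alpha_end - 0.8)
print("largest change in fiber score:  ", score_drift)
```

In this toy the μ-component of the connection equals prior precision over total observed information, so the local slope of the fiber with respect to μ is itself a prior fraction.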

If this is right

  • The optimal per-group non-centring weight is recoverable in closed form from the pooling factor π_j.
  • Prior-dominated groups show excess conditional autocorrelation whose magnitude is predicted by π_j.
  • The funnel pathology is separable from the pooling effect by their opposite dependence on the hierarchical variance.
  • A direct attribution test confirms NUTS does not transport the fiber, with the chain-level footprint being conditional autocorrelation in prior-dominated groups.
  • Genuine curvature appears only when the connection is built from a sampler's fixed working metric, making holonomy an algorithmic rather than geometric phenomenon.
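The last point can be illustrated with a toy contrast. The sketch below assumes that a frozen working metric can be mimicked by evaluating the information blocks at a fixed reference fiber value rather than at the current point, in the spirit of the "G_FF frozen" construction mentioned in the Figure 4 caption; the paper's own working-metric connections may be built differently. The model (N Bernoulli trials with logit α, prior α ~ Normal(μ, τ), t = log τ) and all function names are hypothetical. The curvature expression in `stokes_prediction` was derived for this toy only.

```python
import math

N = 3  # hypothetical number of Bernoulli trials in the single group


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def frozen_connection(mu, t, alpha_ref):
    """Connection components with the fiber argument frozen at alpha_ref."""
    w = math.exp(-2.0 * t)                              # prior precision 1 / tau^2
    c = N * sigmoid(alpha_ref) * (1.0 - sigmoid(alpha_ref))
    g_ff = c + w                                        # frozen fiber block
    return w / g_ff, 2.0 * (alpha_ref - mu) * w / g_ff


def loop_displacement(alpha_ref, mu0=0.3, t0=0.0, r=0.1, steps=4000):
    """Midpoint-rule line integral of A around a counter-clockwise circle.

    The frozen connection does not depend on the transported value, so the
    displacement after one loop is just this line integral.
    """
    h = 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        mu, t = mu0 + r * math.cos(u), t0 + r * math.sin(u)
        a_mu, a_t = frozen_connection(mu, t, alpha_ref)
        total += (a_mu * (-r * math.sin(u)) + a_t * (r * math.cos(u))) * h
    return total


def stokes_prediction(alpha_ref, t0=0.0, r=0.1):
    """First-order estimate: curvature at the loop centre times loop area."""
    w = math.exp(-2.0 * t0)
    c = N * sigmoid(alpha_ref) * (1.0 - sigmoid(alpha_ref))
    curvature = -2.0 * w**2 / (c + w) ** 2   # d_mu A_t - d_t A_mu for this toy
    return curvature * math.pi * r**2


print("loop displacement:", loop_displacement(0.5))
print("Stokes estimate:  ", stokes_prediction(0.5))
```

Unlike the pointwise observed-information connection, the frozen one leaves a non-zero displacement after a closed loop, and for small loops that displacement scales with the enclosed area.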

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Group-level mixing diagnostics could be constructed by estimating the pooling factor directly from posterior draws.
  • The flatness result suggests that other apparent geometric obstructions in sampling algorithms may reduce to statistical dependence once the correct connection is identified.
  • Models with rotational curvature under fixed-mass-matrix connections offer a testable distinction between algorithmic and intrinsic geometric effects.

Load-bearing premise

The joint parameter space of a hierarchical model forms a fiber bundle with hyperparameters as the base manifold and group-level parameters as the fibers.

What would settle it

A direct computation of the curvature two-form of the Fisher-induced connection A = -G_FF^{-1}G_BF on a smooth non-Gaussian hierarchical posterior that yields a non-zero result would falsify the flatness claim.
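A minimal sketch of such a computation, for a scalar fiber over a two-dimensional base, is given below. It assumes a hypothetical toy posterior (y successes in N Bernoulli trials with logit α, prior α ~ Normal(μ, τ), t = log τ) and takes G to be the observed information, so that the connection can be obtained from derivatives of the fiber score. The curvature is the coefficient of the bracket of the two horizontal lifts, estimated by central differences. A near-zero result on one toy model is only a consistency check, not evidence for the general claim.

```python
import math

Y, N = 2, 3  # hypothetical data: 2 successes in 3 Bernoulli trials


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fiber_score(mu, t, alpha):
    """Fiber score d/d(alpha) log p for the toy logistic model."""
    return Y - N * sigmoid(alpha) - (alpha - mu) * math.exp(-2.0 * t)


def connection(mu, t, alpha, h=1e-4):
    """(A_mu, A_t) with A = -G_FF^{-1} G_BF, via differences of the score.

    With observed information, G_FF = -ds/dalpha and G_BF = -ds/db,
    so each component is A_b = -(ds/db) / (ds/dalpha).
    """
    ds_da = (fiber_score(mu, t, alpha + h) - fiber_score(mu, t, alpha - h)) / (2 * h)
    ds_dmu = (fiber_score(mu + h, t, alpha) - fiber_score(mu - h, t, alpha)) / (2 * h)
    ds_dt = (fiber_score(mu, t + h, alpha) - fiber_score(mu, t - h, alpha)) / (2 * h)
    return -ds_dmu / ds_da, -ds_dt / ds_da


def curvature(mu, t, alpha, h=1e-4):
    """Coefficient of [H_mu, H_t] for the lifts H_b = d_b + A_b d_alpha."""
    a_mu, a_t = connection(mu, t, alpha)
    dmu_a_t = (connection(mu + h, t, alpha)[1] - connection(mu - h, t, alpha)[1]) / (2 * h)
    dt_a_mu = (connection(mu, t + h, alpha)[0] - connection(mu, t - h, alpha)[0]) / (2 * h)
    up, down = connection(mu, t, alpha + h), connection(mu, t, alpha - h)
    da_a_mu = (up[0] - down[0]) / (2 * h)
    da_a_t = (up[1] - down[1]) / (2 * h)
    return dmu_a_t - dt_a_mu + a_mu * da_a_t - a_t * da_a_mu


points = [(0.3, 0.0, 0.8), (-1.0, 0.5, 2.0), (0.5, -0.7, -1.2)]
print("max |curvature|:", max(abs(curvature(*pt)) for pt in points))
```

To probe the claim on another model, only `fiber_score` needs to change; a value that stays clearly away from zero as `h` is refined would be the kind of counterexample described above.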

Figures

Figures reproduced from arXiv: 2606.19655 by Aidan D. Bindoff.

Figure 1: Geometric structure of a hierarchical posterior.
Figure 2: Negative-control experiment: true-loop ρ̂_j vs. matched-gap control pairs, per group (colour) across the gap grid. Coloured lines and points: per-group ρ̂_j from true loops (small base distance, base-closure conditioning active). Grey band: full range of ρ̂_j from matched-gap control pairs with large base distance; dashed line is the control mean. Panels are ordered from high π_j (top left) to low π_j (bottom r…
Figure 3: Per-group dependence coefficients ρ̂_j with bootstrap 90% intervals for the centred (left) and non-centred (right) parameterisations of the sparse logistic GLMM (J = 8, n_j = 3, σ_true = 3), shown across the gap grid g ∈ {3, 10, 25, 50}. The centred chain has additional short-gap signal that decays with lag (ρ̄ ≈ 0.044 at g = 3 falling to ≈ 0.005 at g = 50). The non-centred chain shows a flat, gap-stable base…
Figure 4: Linearised holonomy H_j vs. loop radius r for the centred GLMM, shown per group (j = 1, …, 8, colour). Top: lines are the first-order Stokes prediction H_j = 1 + F_j · πr²/α_{0j}; points are numerical integration of the transport ODE with the fiber held fixed at α_0 (i.e. G_FF frozen). Both objects are the linearised holonomy: with G_FF updated along the fiber the displacement is zero (Proposition 4). Botto…
Figure 5: Simulation study: median mean dependence coefficient
Figure 6: Analytic prior fraction π̄ (x-axis) against empirical mean dependence coefficient ρ̄ (y-axis), faceted by loop gap g. Each panel contains all 160 cell-replicates at that gap. Colour encodes σ_true; shape encodes n_j. The positive LOESS trend (grey band: 95% pointwise CI) is consistent with the prediction that a larger prior fraction implies stronger conditional dependence. The trend persists at all gaps but…
Figure 7: Per-group dependence coefficients ρ_j as a function of the minimum loop gap g ∈ {3, 10, 25, 50} for four corner cells (replicate 1). Shaded bands show 90% bootstrap intervals. In the prior-dominated cell (n_j = 3, σ_true = 0.5, bottom left), ρ_j decays monotonically from gap 3 to gap 50, consistent with an autocorrelation-dominated signal. In data-dominated cells (n_j = 100, top row), the coefficients are near …
Figure 8: Per-group α_j ESS against the prior fraction π_j, by design (columns) and σ (rows), median over 10 replicates. The π_j-adaptive rule (green) tracks the best method across the whole π_j range: it lifts the α_j ESS of prior-dominated groups (high π_j), where centred (dark blue) lags, without harming data-dominated groups (low π_j). This per-group success coexists with non-centred winning on min-ESS (…
Figure 9: Working-metric connections are genuinely curved.
Original abstract

Standard MCMC diagnostics ($\hat{R}$, effective sample size, divergence counts) detect whether a chain has mixed, but not why it has not. We ask whether the centring/non-centring obstruction in hierarchical models has a geometric cause beyond the metric. The joint parameter space is a fiber bundle (hyperparameters the base, group-level parameters the fibers), and the Fisher information metric induces an Ehresmann connection $A = -G_{FF}^{-1}G_{BF}$; the natural hypothesis is that the obstruction is its curvature, felt by the sampler as holonomy. We prove this false. The connection is flat for any smooth hierarchical posterior, not only the Gaussian case, because its horizontal leaves are the level sets of the fiber score $\partial_\alpha \log p$: there is no geometric obstruction above the metric. What remains is statistical, not geometric, and the flat connection identifies it as a single quantity: the conditional dependence of fiber on base, governed per group by the prior fraction $\pi_j$, the classical pooling factor. From it the framework recovers the established picture, that prior-dominated groups mix slowly and that the optimal per-group non-centring weight follows in closed form, and a simulation study separates this base-fiber coupling from the funnel, a distinct base-space pathology, by their opposite dependence on the hierarchical variance. A direct attribution test confirms that NUTS does not transport the fiber: the chain-level footprint is excess conditional autocorrelation in prior-dominated groups, exactly as $\pi_j$ predicts. Genuine, even rotational, curvature does appear, but only for connections built from a sampler's working metric (a fixed mass matrix), where holonomy re-enters as an algorithmic rather than geometric phenomenon. The prior-fraction diagnostic is distributed as the R package fibr, with the geometric methods as accompanying reproduction code.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript models the joint parameter space of a hierarchical model as a fiber bundle (hyperparameters as base, group-level parameters as fibers) and equips it with the Fisher information metric to induce an Ehresmann connection A = −G_FF^{-1}G_BF. It claims to prove that this connection is flat for any smooth hierarchical posterior (not merely Gaussian), because the horizontal leaves coincide exactly with the level sets of the fiber score ∂_α log p. Consequently there is no geometric obstruction above the metric; the centring/non-centring difficulty reduces to the classical per-group pooling factor π_j that governs conditional dependence of fiber on base. The paper recovers known mixing behaviour, separates this effect from the funnel pathology via simulation, and supplies an R package fibr together with reproduction code.

Significance. If the flatness result holds under the metric actually employed, the work supplies a clean geometric re-derivation of the pooling factor as the sole source of the centring obstruction and separates it from the funnel, a distinct base-space pathology. The explicit attribution test with NUTS and the closed-form optimal non-centring weight are useful. The release of the fibr package and accompanying code strengthens reproducibility.

major comments (1)
  1. [Abstract] Abstract (and opening paragraph defining the connection): the central identification that horizontal vectors satisfy ds(X)=0 precisely when X_F = −(∂_F ∂_F log p)^{-1}(∂_B ∂_F log p) X_B holds if and only if the blocks of G are taken from the observed information −∇² log p. The conventional Fisher information metric uses the expectation E[−∇² log p] (or the score variance), which is a different tensor; under that choice the pointwise equality fails and flatness need not follow. The manuscript never states which definition of G is used, yet asserts flatness “for any smooth hierarchical posterior” without qualification. This is load-bearing for the claim that there is “no geometric obstruction above the metric.”
minor comments (1)
  1. The simulation study that separates base-fiber coupling from the funnel by their opposite dependence on hierarchical variance is mentioned only in the abstract; a brief description of the design (number of groups, range of π_j, metrics used) would help readers assess the separation.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful and constructive review. The observation regarding the precise definition of the Fisher information metric is well taken; we address it directly below and will revise the manuscript to make the choice explicit.

Point-by-point responses
  1. Referee: [Abstract] Major comment 1 of the referee report above: the identification of horizontal vectors with the kernel of ds holds only if the blocks of G are taken from the observed information −∇² log p, and the manuscript never states which definition of G is used.

    Authors: We agree that the definition of G must be stated explicitly. The paper employs the observed information matrix G = −∇² log p (negative Hessian of the log-posterior evaluated pointwise), not its expectation. This is the tensor for which the horizontal distribution is exactly the kernel of the fiber-score map ds, so that the horizontal leaves coincide with the level sets of ∂_α log p and the connection is flat for any smooth posterior. The conventional expected Fisher metric would not yield this pointwise identification. We will revise the abstract and the introductory paragraphs that define the connection to specify that the metric is the observed information tensor.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained mathematical identification

full rationale

The paper defines the Ehresmann connection A = -G_FF^{-1}G_BF from the Fisher metric on the fiber bundle and proves flatness by showing, via direct differentiation, that the horizontal condition matches the level sets of the fiber score ∂_α log p. This is a definitional equivalence derived from the given objects rather than a reduction to fitted inputs or presupposed results. The subsequent identification of the pooling factor π_j follows as a statistical interpretation of the resulting flat geometry and recovers known behavior without circular renaming or load-bearing self-citation. No steps match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard differential-geometric modeling choices applied to the statistical posterior; no free parameters, invented entities, or ad-hoc axioms beyond the fiber-bundle construction and Fisher-metric connection are introduced in the abstract.

axioms (2)
  • Domain assumption: The joint parameter space of a hierarchical model can be modeled as a fiber bundle with hyperparameters as base and group-level parameters as fibers.
    Stated at the start of the abstract as the modeling framework.
  • Domain assumption: The Fisher information metric on this bundle induces an Ehresmann connection given by A = -G_FF^{-1}G_BF.
    Defined in the abstract as the natural connection for the problem.


