Variance or Standard Deviation? Shell Geometry and Global-Scale Priors in High-Dimensional Shrinkage
Pith reviewed 2026-06-26 07:21 UTC · model grok-4.3
The pith
Under a radial-power benchmark for high-dimensional shrinkage, priors flat on the standard deviation hold a one-unit asymptotic risk advantage near the origin over priors flat on the variance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under a radial-power benchmark, the SD-flat benchmark has a one-unit asymptotic risk advantage near the origin, crosses over in the critical regime, and is second-order equivalent to the variance-flat benchmark for strong signals. Proper single global-scale hyperpriors and bounded coordinate-multiplier mixtures inherit these limits through the near-zero exponent of their SD-scale density. For heavier-tailed or sparse priors, that exponent still classifies the common global-scale component, while local-scale tails, model-size priors, or allocation priors can also affect risk.
What carries the argument
The near-zero exponent of the SD-scale density, which determines how much prior mass is allocated near the zero-scale boundary and thereby controls first-order shrinkage risk.
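The page never writes down the near-zero exponent. Assuming it is the power a in p(σ) ~ c·σ^a as σ → 0, where p is the prior density on the standard-deviation scale (our assumption, not a quotation of the paper), the two flat benchmarks differ by exactly one unit in a. The change of variables τ = σ² shows this:

```latex
\begin{aligned}
p_{\sigma}(\sigma) &= p_{\tau}(\sigma^{2})\,\Bigl|\tfrac{d\tau}{d\sigma}\Bigr| = 2\sigma\,p_{\tau}(\sigma^{2}), \qquad \tau = \sigma^{2},\\
\text{variance-flat: } p_{\tau}(\tau) \propto 1 &\;\Longrightarrow\; p_{\sigma}(\sigma) \propto \sigma \qquad (a = 1),\\
\text{SD-flat: } p_{\sigma}(\sigma) \propto 1 &\;\Longrightarrow\; p_{\tau}(\tau) \propto \tau^{-1/2} \qquad (a = 0),\\
\int_{0}^{\varepsilon} p_{\sigma}(\sigma)\,d\sigma &\propto \varepsilon^{a+1}.
\end{aligned}
```

So for small ε the variance-flat prior assigns mass of order ε² to scales below ε, while the SD-flat prior assigns mass of order ε. The ratio of the two vanishes as ε → 0.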
If this is right
- Proper single global-scale hyperpriors inherit the risk limits through the near-zero exponent of their SD-scale density.
- Bounded coordinate-multiplier mixtures inherit these limits in the same way.
- For heavier-tailed or sparse priors the exponent continues to classify the common global-scale component.
- Local-scale tails, model-size priors, or allocation priors can additionally affect overall risk.
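One way to place a concrete proper hyperprior in this classification is to estimate its exponent numerically. The sketch below assumes the exponent is the power a in p(σ) ~ c·σ^a as σ → 0 and approximates it by a log-log slope at two small scales. The helper names are hypothetical. The example priors (half-Cauchy on σ; exponential, Gamma(1/2, 1) and inverse-gamma(1, 1) on σ²) are our own choices, not the paper's.

```python
import math

def sd_logpdf_from_variance(logpdf_var):
    """Hypothetical helper: turn log p_tau (tau = sigma^2) into log p_sigma.
    Change of variables: p_sigma(s) = p_tau(s**2) * 2 * s."""
    return lambda s: logpdf_var(s * s) + math.log(2.0 * s)

def near_zero_exponent(logpdf_sd, s1=1e-6, s2=1e-5):
    """Hypothetical helper: log-log slope of the SD-scale density between two
    small scales, approximating a in p(sigma) ~ c * sigma**a as sigma -> 0."""
    return (logpdf_sd(s2) - logpdf_sd(s1)) / math.log(s2 / s1)

priors = {
    # Half-Cauchy(0, 1) on sigma: p(sigma) = 2 / (pi * (1 + sigma^2))
    "half-Cauchy on sigma": lambda s: math.log(2.0 / math.pi) - math.log1p(s * s),
    # Exponential(1) on the variance: p(tau) = exp(-tau)
    "Exponential(1) on variance": sd_logpdf_from_variance(lambda t: -t),
    # Gamma(shape=1/2, rate=1) on the variance: p(tau) = tau^(-1/2) exp(-tau) / Gamma(1/2)
    "Gamma(1/2, 1) on variance": sd_logpdf_from_variance(
        lambda t: -0.5 * math.log(t) - t - math.lgamma(0.5)),
    # Inverse-gamma(1, 1) on the variance: p(tau) = tau^(-2) exp(-1/tau)
    "Inv-gamma(1, 1) on variance": sd_logpdf_from_variance(
        lambda t: -2.0 * math.log(t) - 1.0 / t),
}

for name, logpdf in priors.items():
    print(f"{name:30s} a ≈ {near_zero_exponent(logpdf):.4g}")
```

Under that assumption, the half-Cauchy and Gamma(1/2, 1) examples fall on the SD-flat exponent 0. The exponential-on-variance example falls on the variance-flat exponent 1. The inverse-gamma density vanishes faster than any power, so its slope diverges. In general, a Gamma(shape k) prior on σ² has SD-scale exponent 2k - 1. The slope is only a finite-scale proxy and can mislead for densities with slowly varying factors such as logarithms.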
Where Pith is reading between the lines
- Default prior recommendations in empirical Bayes shrinkage should therefore favor SD-flatness when signals are expected to be weak or near zero.
- The geometric distinction between variance and SD flatness may appear in other high-dimensional scale estimation problems that use shell-volume arguments.
- Finite-sample simulations with controlled radial-power signals could test how quickly the one-unit advantage emerges.
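A finite-sample check of the kind suggested in the last bullet can be sketched with Monte Carlo. The radial-power benchmark itself is not specified on this page, so the sketch substitutes an assumed model. It uses the conjugate normal-means model x | θ ~ N(θ, I_n), θ | σ ~ N(0, σ² I_n), with an SD-scale power prior p(σ) ∝ σ^a (a = 1 is flat on the variance, a = 0 is flat on the SD). It computes the posterior-mean estimator (1 - E[B | x]) x, with B = 1/(1 + σ²), by one-dimensional quadrature and reports its risk at the origin. The function names are hypothetical. The output shows the direction of the comparison; it does not reproduce the paper's one-unit limit.

```python
import numpy as np

def posterior_mean_B(s, n, a, m=2000):
    """Hypothetical helper: E[B | x] for B = 1 / (1 + sigma^2), given s = ||x||^2,
    under the assumed SD-scale prior p(sigma) ∝ sigma**a
    (a = 1: flat on the variance, a = 0: flat on the SD)."""
    u = (np.arange(m) + 0.5) / m                 # midpoint grid on (0, 1)
    B = 1.0 - u**2                               # substitute B = 1 - u^2 to remove the (1 - B)^(-1/2) singularity
    # posterior in B: B^(n/2 - 2 - (a-1)/2) (1-B)^((a-1)/2) exp(-B s / 2);
    # (1-B)^((a-1)/2) = u^(a-1), times the Jacobian 2u, gives u^a (constant 2 dropped)
    logw = (n / 2 - 2 - (a - 1) / 2) * np.log(B) + a * np.log(u) - np.outer(s, B) / 2
    logw -= logw.max(axis=1, keepdims=True)      # stabilize before exponentiating
    w = np.exp(logw)
    return (w * B).sum(axis=1) / w.sum(axis=1)

def mc_risk(theta_norm, n, a, reps=1000, seed=0):
    """Hypothetical helper: Monte Carlo risk E||theta_hat - theta||^2 at ||theta|| = theta_norm."""
    rng = np.random.default_rng(seed)            # fixed seed: both priors see the same draws
    theta = np.zeros(n)
    theta[0] = theta_norm                        # isotropic model: risk depends on theta only through ||theta||
    x = theta + rng.standard_normal((reps, n))
    b = posterior_mean_B((x**2).sum(axis=1), n, a)
    theta_hat = (1.0 - b)[:, None] * x           # posterior mean (1 - E[B | x]) x
    return ((theta_hat - theta) ** 2).sum(axis=1).mean()

for n in (50, 200, 800):
    r_var, r_sd = mc_risk(0.0, n, a=1), mc_risk(0.0, n, a=0)
    print(f"n={n:4d}  variance-flat={r_var:.3f}  SD-flat={r_sd:.3f}  diff={r_var - r_sd:.3f}")
```

Both priors use the same simulated draws (common random numbers). At θ = 0 the SD-flat risk therefore comes out below the variance-flat risk in every run. The ratio of the SD-flat to the variance-flat posterior is proportional to √(B/(1 - B)), which increases in B, so the SD-flat estimator always shrinks harder. Whether the diff column settles near one as n grows is the question the bullet raises, and this sketch does not answer it.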
Load-bearing premise
The near-zero behavior of the common scale prior has first-order consequences for shrinkage risk in the high-dimensional setting considered.
What would settle it
An explicit calculation of the asymptotic risk difference between the SD-flat and variance-flat benchmarks near the origin that yields a value other than one unit would falsify the central comparison.
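The calculation named here can at least be written down under an assumed model. The page does not give the radial-power benchmark, so what follows uses the conjugate normal-means model x | θ ~ N(θ, I_n), θ | σ ~ N(0, σ² I_n). It takes an SD-scale power prior p(σ) ∝ σ^a (a = 1 flat on the variance, a = 0 flat on the SD), so the prior on τ = σ² is ∝ τ^((a-1)/2), and uses the shrinkage factor B = 1/(1 + σ²). The marginal likelihood is proportional to B^(n/2) e^(-Bs/2), and the Jacobian of τ = (1 - B)/B contributes B^(-2). Together these give:

```latex
\begin{aligned}
\pi_a(B \mid x) &\propto B^{\,n/2 - 2 - (a-1)/2}\,(1-B)^{(a-1)/2}\,e^{-Bs/2}, \qquad 0 < B < 1,\; s = \lVert x\rVert^{2},\\
\hat\theta_a(x) &= \bigl(1 - \mathbb{E}_a[B \mid x]\bigr)\,x,\\
R_a(0) &= \mathbb{E}\bigl[(1 - \mathbb{E}_a[B \mid x])^{2}\, s\bigr], \qquad s \sim \chi^{2}_{n},\\
\Delta_n &= R_{1}(0) - R_{0}(0).
\end{aligned}
```

Under this model, the falsification test asks whether Δ_n tends to 1 as n → ∞. The page does not let us check whether this model coincides with the paper's benchmark.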
Original abstract
We study how the choice of default prior for a common Gaussian scale affects high-dimensional shrinkage risk, highlighting the role played by high-dimensional geometry. Formally, we consider a high-dimensional setting in which the near-zero behavior of the common scale prior has first-order consequences for shrinkage risk, and show that priors that are flat on the variance and those flat on the standard deviation allocate markedly different mass near the zero-scale boundary, leading to distinct shrinkage behavior and informing principled default prior selection. Specifically, under a radial-power benchmark, we establish that the SD-flat benchmark has a one-unit asymptotic risk advantage near the origin, crosses over in the critical regime, and is second-order equivalent to the variance-flat benchmark for strong signals. Proper single global-scale hyperpriors and bounded coordinate-multiplier mixtures inherit these limits through the near-zero exponent of their SD-scale density. For heavier-tailed or sparse priors, that exponent still classifies the common global-scale component, while local-scale tails, model-size priors, or allocation priors can also affect risk.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies the effect of the near-zero behavior of common scale priors (variance-flat vs. SD-flat) on high-dimensional shrinkage risk. Under an explicit radial-power benchmark, it derives that the SD-flat prior yields a one-unit asymptotic risk advantage near the origin, a crossover in the critical regime, and second-order equivalence to the variance-flat prior for strong signals. These limits are inherited by proper global-scale hyperpriors and bounded coordinate-multiplier mixtures through the near-zero exponent of the SD-scale density; the exponent is also used to classify the global-scale component for heavier-tailed or sparse priors.
Significance. If the asymptotic derivations hold, the work supplies a geometrically grounded criterion for default prior selection in high-dimensional Bayesian shrinkage, showing that first-order risk differences arise directly from the near-zero exponent under the stated benchmark. This is a precise, falsifiable contribution to the literature on global-scale hyperpriors.
Minor comments (2)
- The abstract and introduction would benefit from an explicit statement of the radial-power benchmark density (including the range of the power parameter) so that the one-unit advantage claim can be checked without consulting later sections.
- Notation for the SD-scale density and its near-zero exponent should be introduced once and used consistently; the current phrasing mixes “SD-flat benchmark” and “near-zero exponent of their SD-scale density” without a single defining equation.
Simulated Author's Rebuttal
We thank the referee for their positive summary of the manuscript, recognition of its contribution to default prior selection in high-dimensional shrinkage, and recommendation of minor revision. No specific major comments were raised.
Circularity Check
No significant circularity
Full rationale
The paper's central claims derive asymptotic risk comparisons (one-unit advantage near origin, crossover, second-order equivalence) directly from the near-zero exponent of the SD-scale density under an explicit radial-power benchmark and high-dimensional shell geometry. No steps reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the derivations are presented as following from volume arguments and the stated premise on near-zero behavior. The analysis is self-contained against the benchmark without internal reduction to its own inputs.
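The shell geometry invoked here is the standard concentration of Gaussian vectors. For z ~ N(0, I_n), ‖z‖² follows a χ²_n distribution with mean n and variance 2n, so ‖z‖²/n concentrates at 1 with spread √(2/n). The following is a minimal simulation of that fact. It is independent of the paper's own volume arguments, which this page does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(1)
reps = 5000
for n in (10, 100, 1000, 10000):
    # ||z||^2 ~ chi-square(n) for z ~ N(0, I_n); dividing by n gives the squared radius per coordinate
    r2 = rng.chisquare(df=n, size=reps) / n
    print(f"n={n:5d}  mean={r2.mean():.4f}  sd={r2.std():.4f}  sqrt(2/n)={np.sqrt(2 / n):.4f}")
```

For x ~ N(θ, I_n), the same statistic has mean n + ‖θ‖² and variance 2n + 4‖θ‖². The shell radius therefore moves detectably only once ‖θ‖² is at least of order √n. This is consistent with, though not a derivation of, the premise that near-zero prior behavior matters at first order for weak signals.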
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: The near-zero behavior of the common scale prior has first-order consequences for shrinkage risk.