Assumption-Lean Shrinkage and Model Averaging for Spatial Parameters

Harvey Barnhard

arXiv: 2606.12324 · v1 · pith:W5SQGG5Knew · submitted 2026-06-10 · econ.EM

Pith reviewed 2026-06-27 07:28 UTC · model grok-4.3

classification econ.EM

keywords: shrinkage estimation · spatial parameters · SURE · model averaging · economic mobility · neighborhood effects · assumption-lean methods · empirical Bayes

The pith

SURE selection and averaging lets researchers compare spatial shrinkage rules without committing to one true model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that Stein's Unbiased Risk Estimate (SURE) can select among candidate shrinkage estimators for spatial parameters and can choose weights for averaging them. The selected rule performs nearly as well as the single best rule in the candidate class, and the SURE-weighted average performs nearly as well as the best fixed weighted average of the candidates. These guarantees hold under regularity conditions placed directly on the estimator maps rather than on any underlying data-generating process. A sympathetic reader would care because economic and policy decisions often rely on many noisy estimates of neighborhood effects or mobility, and the method reduces the need to defend a specific prior or adjacency structure while still achieving low risk. In the Opportunity Atlas application, covering 20 commuting zones, the SURE average cut SURE-estimated mean squared error by about 27 percent relative to the best non-spatial empirical Bayes benchmark. The approach also extends the comparison to nonlinear shrinkage rules whose fitted values use the full vector of estimates.

Core claim

Under regularity conditions stated directly on the estimator maps, SURE selection performs nearly as well as the best rule in a candidate class. The SURE-chosen weighted average likewise performs nearly as well as the best fixed weighted average of trained candidates, including nonlinear shrinkage rules whose fitted values use the full vector of noisy estimates.
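To make the nonlinear case concrete, here is a minimal Python sketch. It assumes independent Gaussian noise with known variances `sigma2`, which is an illustrative assumption rather than a statement of the paper's setup. The hypothetical rule `js_toward_mean` is positive-part James-Stein shrinkage toward the grand mean, chosen only because every fitted value depends on the whole vector `y`; it is not one of the paper's spatial candidates. The divergence term in SURE is approximated by central finite differences, which presumes the rule is differentiable at `y`.

```python
"""SURE for a rule whose fitted values depend on the whole vector y.

Assumed setting (illustrative): y_i = theta_i + eps_i, eps_i ~ N(0, sigma2_i)
independent, sigma2 known. The divergence sum_i sigma2_i * dg_i/dy_i is
approximated by central differences, so g must be differentiable at y.
"""


def js_toward_mean(y, noise_var=1.0):
    """Hypothetical nonlinear rule: positive-part James-Stein shrinkage toward
    the grand mean (homoskedastic noise_var assumed, len(y) >= 4)."""
    n = len(y)
    m = sum(y) / n
    spread = sum((v - m) ** 2 for v in y)
    factor = max(0.0, 1.0 - (n - 3) * noise_var / spread) if spread > 0 else 0.0
    return [m + factor * (v - m) for v in y]


def sure_nonlinear(rule, y, sigma2, h=1e-5):
    """SURE = ||y - g(y)||^2 + 2 * sum_i sigma2_i * dg_i/dy_i - sum_i sigma2_i."""
    fit = rule(y)
    rss = sum((v - f) ** 2 for v, f in zip(y, fit))
    divergence = 0.0
    for i in range(len(y)):
        up = list(y)
        down = list(y)
        up[i] += h
        down[i] -= h
        # Only the i-th fitted value's sensitivity to y_i enters the trace.
        divergence += sigma2[i] * (rule(up)[i] - rule(down)[i]) / (2.0 * h)
    return rss + 2.0 * divergence - sum(sigma2)


if __name__ == "__main__":
    y = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
    sigma2 = [1.0] * len(y)
    print("raw estimates:", sure_nonlinear(lambda v: list(v), y, sigma2))
    print("James-Stein toward mean:", sure_nonlinear(js_toward_mean, y, sigma2))
```

With unit variances and a positive shrinkage factor, James-Stein toward the mean has the closed-form SURE n - (n - 3)^2 / ||y - mean(y)||^2, which gives a quick check on the finite-difference divergence.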

What carries the argument

Stein's Unbiased Risk Estimate (SURE) applied to maps of candidate shrinkage estimators for spatial parameters, used both for selection and for constructing weighted averages.
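The identity behind SURE can be written out as follows. This sketch assumes Gaussian noise with known covariance, Y = θ + ε with ε ~ N(0, Σ), and a weakly differentiable estimator map θ̂ whose Jacobian Dθ̂(Y) has (j, l) entry ∂θ̂_j/∂Y_l. It is the textbook form of Stein's identity; the paper's own normalization and regularity conditions may differ.

```latex
\begin{aligned}
\mathbb{E}\,\|\hat\theta(Y)-\theta\|_2^2
  &= \mathbb{E}\,\|\hat\theta(Y)-Y\|_2^2
     + 2\,\mathbb{E}\,\langle \varepsilon,\hat\theta(Y)\rangle
     - \operatorname{tr}(\Sigma),\\
\mathbb{E}\,\langle \varepsilon,\hat\theta(Y)\rangle
  &= \sum_{j,l}\Sigma_{jl}\,\mathbb{E}\,\partial_l\hat\theta_j(Y)
   = \mathbb{E}\operatorname{tr}\!\big(\Sigma\,D\hat\theta(Y)\big),\\
\mathrm{SURE}(\hat\theta;Y)
  &= \|Y-\hat\theta(Y)\|_2^2
     + 2\operatorname{tr}\!\big(\Sigma\,D\hat\theta(Y)\big)
     - \operatorname{tr}(\Sigma).
\end{aligned}
```

For a linear rule θ̂(Y) = SY the Jacobian is S, so the penalty is 2 tr(ΣS); with independent noise of variances σ_i² it reduces to 2 Σ_i σ_i² S_ii. Selection picks the candidate with the smallest SURE, and averaging minimizes the SURE of the weighted combination over the weights.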

If this is right

Different candidate definitions of spatial relatedness can be compared and averaged without designating any one as the true model.
Nonlinear shrinkage rules that depend on the entire vector of noisy estimates are included in the candidate class and can be averaged.
The method yields lower estimated mean squared error than a standard non-spatial empirical Bayes benchmark in spatial mobility data.
Selection and averaging each achieve performance close to the best fixed rule within the candidate class.
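The selection and averaging steps listed above can be sketched for the simplest case: fixed linear smoothers and independent Gaussian noise with known variances. These are illustrative assumptions; the paper's candidates are trained rules, and the helper names (`sure_linear`, `select`, `average_two`, `toward_mean`, `line_neighbors`) are hypothetical. Because SURE of a two-candidate mixture is a convex quadratic in the weight, the constrained optimum is the stationary point clipped to [0, 1].

```python
"""SURE selection and two-candidate averaging for fixed linear smoothers.

Illustrative assumptions (not the paper's candidate set): y_i = theta_i + eps_i
with independent Gaussian noise of known variance sigma2_i, and each candidate
is a fixed matrix S (list of rows) with fitted values S @ y.
"""


def apply(S, y):
    return [sum(s * v for s, v in zip(row, y)) for row in S]


def trace_term(S, sigma2):
    # tr(Sigma S) for diagonal Sigma: sum_i sigma2_i * S_ii
    return sum(s2 * S[i][i] for i, s2 in enumerate(sigma2))


def sure_linear(S, y, sigma2):
    fit = apply(S, y)
    rss = sum((v - f) ** 2 for v, f in zip(y, fit))
    return rss + 2.0 * trace_term(S, sigma2) - sum(sigma2)


def select(candidates, y, sigma2):
    """Return (name, SURE) of the candidate with the smallest SURE."""
    scores = {name: sure_linear(S, y, sigma2) for name, S in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


def average_two(S_a, S_b, y, sigma2):
    """SURE-minimizing w in [0, 1] for the rule w * S_a + (1 - w) * S_b."""
    a, b = apply(S_a, y), apply(S_b, y)
    diff = [p - q for p, q in zip(a, b)]
    curvature = sum(d * d for d in diff)
    slope_gap = trace_term(S_a, sigma2) - trace_term(S_b, sigma2)
    if curvature == 0.0:  # identical fits: SURE is linear in w
        return 1.0 if slope_gap <= 0.0 else 0.0
    resid_b = [v - q for v, q in zip(y, b)]
    w = (sum(r * d for r, d in zip(resid_b, diff)) - slope_gap) / curvature
    return min(1.0, max(0.0, w))  # clip the stationary point of the quadratic


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def toward_mean(n, lam):
    """Non-spatial candidate: (1 - lam) * y_i + lam * mean(y)."""
    return [[(1.0 - lam) * (1.0 if i == j else 0.0) + lam / n for j in range(n)]
            for i in range(n)]


def line_neighbors(n, lam):
    """Hypothetical spatial candidate: units on a line pool with adjacent units."""
    S = identity(n)
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        S[i][i] = 1.0 - lam
        for j in nbrs:
            S[i][j] = lam / len(nbrs)
    return S


if __name__ == "__main__":
    y = [1.0, 1.4, 2.1, 2.5, 3.2, 3.0]
    sigma2 = [0.5, 0.4, 0.6, 0.5, 0.4, 0.5]
    n = len(y)
    cands = {"raw": identity(n), "mean": toward_mean(n, 0.5),
             "line": line_neighbors(n, 0.5)}
    print("selected:", select(cands, y, sigma2))
    print("weight on 'line' vs 'mean':",
          average_two(cands["line"], cands["mean"], y, sigma2))
```

With more than two candidates the same objective is a quadratic over the simplex, which would call for a small constrained solver rather than a clip.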

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be tested on non-spatial panel data where units share unobserved factors other than geography.
In practice, analysts would still need to verify or relax the regularity conditions for the specific estimators they include.
Policy applications that pool estimates across many units could reduce over-reliance on any single adjacency or covariance specification.

Load-bearing premise

The regularity conditions placed directly on the estimator maps hold for the shrinkage rules being compared.

What would settle it

An empirical example or simulation in which the SURE-selected or SURE-averaged estimator exhibits substantially higher risk than the oracle best candidate from the same class.
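A simulation of this kind can be sketched in a few lines. The design below is a toy assumption, not the paper's: θ is drawn once from N(0, 1) and held fixed, Y = θ + ε with unit-variance Gaussian noise, and the candidates are θ̂ = λY on a small grid. The function compares the average loss of the SURE-selected rule with that of the best fixed candidate chosen with hindsight.

```python
"""Toy Monte Carlo: risk of the SURE-selected rule versus the best fixed rule.

Assumed design (illustrative only, not the paper's): theta fixed after one draw
from N(0, 1), y = theta + eps with eps ~ N(0, 1), candidates theta_hat = lam * y.
"""
import random


def simulate(n=200, reps=200, lams=(0.0, 0.25, 0.5, 0.75, 1.0), seed=0):
    """Return (mean loss of SURE-selected rule, mean loss of best fixed rule)."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) for _ in range(n)]  # held fixed across replications
    fixed_total = {lam: 0.0 for lam in lams}
    selected_total = 0.0
    for _ in range(reps):
        y = [t + rng.gauss(0.0, 1.0) for t in theta]
        ss = sum(v * v for v in y)
        # SURE for lam * y with unit variances: (1 - lam)^2 ||y||^2 + 2 n lam - n
        sure = {lam: (1.0 - lam) ** 2 * ss + 2.0 * n * lam - n for lam in lams}
        loss = {lam: sum((lam * v - t) ** 2 for v, t in zip(y, theta)) / n
                for lam in lams}
        for lam in lams:
            fixed_total[lam] += loss[lam]
        selected_total += loss[min(sure, key=sure.get)]
    oracle = min(fixed_total.values()) / reps  # best candidate chosen with hindsight
    return selected_total / reps, oracle


if __name__ == "__main__":
    sel, orc = simulate()
    print(f"SURE-selected risk {sel:.3f}, oracle risk {orc:.3f}, ratio {sel / orc:.3f}")
```

In this easy design the ratio should sit close to one. A design using the paper's spatial candidates in which the ratio stays well above one would be the kind of evidence described here.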

Economic decisions often depend on many noisy estimates of neighborhood effects, school quality, and hospital performance. Shrinkage estimation can reduce this noise by pooling information across related units. When units are related through geography, adjacency, or shared characteristics, the main challenge is not only how much to shrink, but which relationships should guide pooling. We use Stein's Unbiased Risk Estimate (SURE) to select among and average over flexible shrinkage estimators, allowing researchers to compare candidate definitions of relatedness without treating any one prior, covariance model, or adjacency rule as the true model for the latent parameters. Under regularity conditions stated directly on the estimator maps, SURE selection performs nearly as well as the best rule in a candidate class. The SURE-chosen weighted average likewise performs nearly as well as the best fixed weighted average of trained candidates, including nonlinear shrinkage rules whose fitted values use the full vector of noisy estimates. In an application to Opportunity Atlas economic mobility data from 20 commuting zones, the best individual spatial specification varies across zones, and the SURE-chosen average reduces reported SURE-estimated mean squared error by about 27% relative to the best-performing non-spatial empirical Bayes benchmark.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

SURE selection and averaging over spatial shrinkage rules is a practical idea, but the near-oracle claims rest on regularity conditions that the abstract does not verify for the nonlinear candidates.

The core contribution is a SURE-based procedure that selects among and averages over a menu of spatial shrinkage estimators, treating different adjacency or covariance definitions as candidates rather than assuming one is correct. This lets the data guide how much pooling happens across neighborhoods or zones without committing to a single model of relatedness.

What the paper does well is apply this to Opportunity Atlas mobility data across 20 commuting zones. The best single spatial specification changes by zone, and the SURE-chosen average cuts the reported SURE-estimated MSE by about 27 percent relative to the best non-spatial empirical Bayes benchmark. Extending the approach to nonlinear shrinkage rules whose fits use the full vector of estimates is also a clear step beyond standard linear SURE applications.

The soft spot is the theoretical guarantee. The abstract states that, under regularity conditions placed directly on the estimator maps, SURE selection performs nearly as well as the best single rule and the SURE-chosen weighted average performs nearly as well as the best fixed weighted average. The abstract gives no explicit verification that the nonlinear spatial candidates satisfy the needed properties, such as uniform Lipschitz continuity or bounded moments. Without that check, the near-oracle result does not automatically carry over to the application or to new datasets.

This paper is aimed at applied economists who estimate place-based effects and want a data-driven way to handle uncertainty over dependence structures. A reader working with school, hospital, or neighborhood data would get immediate value from the method and the empirical illustration. The work shows clear engagement with the SURE literature and the spatial application, so it deserves serious refereeing even if the regularity conditions need more explicit treatment in revision.

Referee Report

1 major / 3 minor

Summary. The paper develops an assumption-lean framework for shrinkage estimation and model averaging of spatial parameters by using Stein's Unbiased Risk Estimate (SURE) to select among and average candidate shrinkage estimators (including nonlinear rules) without committing to a single prior, covariance model, or adjacency structure. Under regularity conditions placed directly on the estimator maps, it claims that SURE selection achieves near-oracle performance relative to the best rule in the class and that the SURE-chosen weighted average performs nearly as well as the best fixed weighted average. An application to Opportunity Atlas economic mobility data across 20 commuting zones reports that the SURE-chosen average reduces estimated MSE by about 27% relative to the best non-spatial empirical Bayes benchmark, with the best individual spatial specification varying across zones.

Significance. If the regularity conditions hold and the near-oracle guarantees transfer, the approach provides a practical, assumption-lean tool for applied researchers facing model uncertainty over spatial relatedness in settings with many noisy unit-level estimates. The explicit allowance for nonlinear shrinkage rules within the averaging procedure and the reproducible application to real economic mobility data are strengths that could influence practice in empirical economics and econometrics.

major comments (1)

[Abstract and theoretical results] The near-oracle performance claims for SURE selection and averaging rest on regularity conditions stated directly on the estimator maps (e.g., uniform Lipschitz continuity, bounded moments, differentiability). The manuscript invokes these conditions to justify the guarantees but provides no explicit verification or diagnostic checks that the candidate maps, including the nonlinear spatial shrinkage rules, satisfy them in the relevant regimes. This verification is load-bearing for transferring the theoretical results to the Opportunity Atlas application and the proposed averaging procedure.

minor comments (3)

The abstract reports a 27% MSE reduction but supplies no error-bar information, standard errors, or sensitivity checks on the SURE estimates themselves.
Derivation details for how SURE is computed and optimized over the candidate class (especially for nonlinear rules that use the full vector of noisy estimates) are not summarized in the abstract and would benefit from a short self-contained sketch or reference to the relevant proposition.
[Application section] A table or figure presenting the zone-by-zone best specifications and the SURE weights would clarify the claim that the best individual spatial specification varies across zones.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment. We agree that explicit verification of the regularity conditions would strengthen the transfer of the theoretical results to the application and propose to add this in revision.

Referee: Abstract and theoretical results section: The near-oracle performance claims for SURE selection and averaging rest on regularity conditions stated directly on the estimator maps (e.g., uniform Lipschitz continuity, bounded moments, differentiability). The manuscript invokes these conditions to justify the guarantees but provides no explicit verification or diagnostic checks that the candidate maps—including the nonlinear spatial shrinkage rules—satisfy them in the relevant regimes. This verification is load-bearing for transferring the theoretical results to the Opportunity Atlas application and the proposed averaging procedure.

Authors: We agree that providing explicit verification would improve the manuscript. The regularity conditions are deliberately placed on the estimator maps themselves rather than on the data-generating process, which is central to the assumption-lean framing. For the linear candidates these conditions hold under the boundedness assumptions already maintained in the paper. For the nonlinear spatial shrinkage rules, which are trained on the full vector of estimates, we will add an appendix containing numerical diagnostics: empirical estimates of the Lipschitz constants over the observed range of the data, checks on moment bounds via bootstrap resampling of the noisy estimates, and verification of differentiability at the fitted points. These checks will be reported for the specific candidate maps and sample sizes appearing in the Opportunity Atlas application. We view this addition as a straightforward and useful revision. revision: yes
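One of the diagnostics proposed in the simulated rebuttal, an empirical Lipschitz estimate, might look like the sketch below. The function name `empirical_lipschitz` and its perturbation scheme (Gaussian perturbations of size `scale` around the observed vector) are assumptions for illustration, not the authors' procedure. The largest observed ratio only lower-bounds the local Lipschitz constant, so it can flag a problem but cannot certify a uniform bound; moment and differentiability checks would need separate code.

```python
"""Empirical (lower-bound) Lipschitz diagnostic for a shrinkage map g."""
import math
import random


def empirical_lipschitz(rule, y, scale=0.1, draws=200, seed=0):
    """Largest observed ||g(y + d) - g(y)|| / ||d|| over random perturbations d."""
    rng = random.Random(seed)
    base = rule(y)
    worst = 0.0
    for _ in range(draws):
        d = [rng.gauss(0.0, scale) for _ in y]
        moved = rule([v + dv for v, dv in zip(y, d)])
        num = math.sqrt(sum((p - q) ** 2 for p, q in zip(moved, base)))
        den = math.sqrt(sum(dv * dv for dv in d))
        if den > 0.0:
            worst = max(worst, num / den)
    return worst


def toward_mean(y, lam=0.5):
    """Hypothetical linear candidate: (1 - lam) * y_i + lam * mean(y)."""
    m = sum(y) / len(y)
    return [(1.0 - lam) * v + lam * m for v in y]


if __name__ == "__main__":
    y = [0.3, -1.2, 0.8, 2.0, -0.4]
    print("estimate:", empirical_lipschitz(toward_mean, y))
```

For `toward_mean` with `lam=0.5` the true Lipschitz constant is 1, attained only along the direction that shifts every unit equally, so random perturbations will report a value between 0.5 and 1. That gap is why such a number should be read as a lower bound.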

Circularity Check

0 steps flagged

No significant circularity; SURE applied as external benchmark to candidate maps

The derivation relies on standard SURE applied to a class of estimator maps under explicitly stated regularity conditions (uniform Lipschitz continuity, bounded moments, differentiability). These conditions are external to the fitted values and are not shown to be satisfied by construction via the paper's own equations. No self-citation chains, fitted-input-renamed-as-prediction, or ansatz smuggling appear in the provided text; the Opportunity Atlas application is presented as an out-of-sample illustration rather than a definitional reduction. The central near-oracle claim therefore remains independent of the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The abstract supplies almost no explicit free parameters or invented entities; the central theoretical claims rest on regularity conditions whose precise content is not given.

axioms (1)

domain assumption Regularity conditions stated directly on the estimator maps
Invoked to guarantee that SURE selection and averaging perform nearly as well as the best candidate.

pith-pipeline@v0.9.1-grok · 5729 in / 1319 out tokens · 20622 ms · 2026-06-27T07:28:18.438207+00:00 · methodology
