Nonparametric Bayesian Policy Learning

Haonan Ye

arxiv: 2605.17068 · v1 · pith:2IDOMJ7Inew · submitted 2026-05-16 · 💰 econ.EM

Pith reviewed 2026-05-20 15:17 UTC · model grok-4.3

classification 💰 econ.EM

keywords nonparametric Bayesian inference · policy learning · treatment choice · welfare maximization · Dirichlet process prior · minimax regret · posterior consistency · reduced-form distribution

The pith

Placing a Dirichlet process prior on the reduced-form distribution lets a decision maker select welfare-maximizing treatments with uncertainty quantified at the minimax-optimal regret rate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Nonparametric Bayesian Policy Learning as a way for a decision maker to choose treatments that maximize expected welfare while properly accounting for uncertainty. The central move is to treat all welfare-relevant uncertainty as coming from uncertainty about the reduced-form distribution of outcomes and covariates, then place a nonparametric Dirichlet process prior on that distribution. The resulting posterior is used to infer optimal assignments, optimal welfare levels, and which policy class is best. Two key guarantees follow: the posterior welfare regret converges at the minimax-optimal rate, and posterior probabilities correctly rank policy classes as sample size grows. This matters for applied work because the method remains tractable through the Bayesian bootstrap and requires no parametric restrictions on the underlying distribution.

Core claim

For a fixed welfare criterion and policy class, all uncertainty about welfare-relevant objects is induced solely by uncertainty about the reduced-form distribution. Placing a nonparametric Dirichlet process prior on this reduced-form parameter and updating to the posterior delivers inference on optimal treatment rules, optimal welfare, and comparisons across policy classes. Posterior welfare regret under this procedure converges at the minimax-optimal rate, and posterior model comparison across policy classes is pointwise consistent.
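
For concreteness, the objects in this claim can be written out for the binary-treatment, utilitarian case with a known propensity score $e(X)$. This is a sketch: the score inside the expectation follows the function $f(D; G)$ that appears in the appendix fragments extracted further down this page, while the last line is an assumed reading of "welfare regret" and is not quoted from the paper.

```latex
\begin{align*}
W(P; G) &= E_P\!\left[\left(\frac{Y T}{e(X)} - \frac{Y (1 - T)}{1 - e(X)}\right) 1\{X \in G\}\right], \\
W^\star_{\mathcal{G}}(P) &= \sup_{G \in \mathcal{G}} W(P; G), \\
\text{regret of } G \text{ at } P_0 &= W^\star_{\mathcal{G}}(P_0) - W(P_0; G).
\end{align*}
```

Every quantity on the right-hand side is a functional of the reduced-form distribution $P$ of $(Y, T, X)$, so a posterior over $P$ induces a posterior over each of them.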

What carries the argument

The Dirichlet process prior placed directly on the reduced-form distribution, which induces the posterior over welfare quantities and optimal treatment rules.

If this is right

Treatment rules selected from the posterior achieve the best possible rate of welfare regret without parametric assumptions on the data distribution.
Posterior probabilities over policy classes become reliable for ranking which class contains the highest-welfare rule.
The Bayesian bootstrap delivers a computationally simple way to sample from the posterior and obtain uncertainty statements for any welfare criterion.
The same posterior can be reused to compare entirely different policy classes without re-estimating the underlying distribution.
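
The Bayesian bootstrap step mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the simulated data, the threshold policy class, the constant propensity score of 0.5, and all function names are assumptions made for the example. Each posterior draw reweights the sample with normalized Exp(1) weights and maximizes weighted welfare over a finite set of rules.

```python
import random

def ipw_gain(t, y, e=0.5):
    """IPW treatment-gain score for one unit, with known propensity e (assumed constant)."""
    return y * t / e - y * (1 - t) / (1 - e)

def threshold_rule(c):
    """Hypothetical policy: treat iff the scalar covariate x is at least c."""
    return lambda x: 1 if x >= c else 0

def bayesian_bootstrap_draws(data, rules, draws=300, seed=0):
    """Return one (optimal welfare gain, index of optimal rule) pair per posterior draw."""
    rng = random.Random(seed)
    # scores[k][i] is the welfare contribution of unit i under rule k
    scores = [[ipw_gain(t, y) * rule(x) for x, t, y in data] for rule in rules]
    out = []
    for _ in range(draws):
        g = [rng.expovariate(1.0) for _ in data]  # normalized, these are Dirichlet(1, ..., 1) weights
        total = sum(g)
        welfare = [sum(gi * si for gi, si in zip(g, s)) / total for s in scores]
        best = max(range(len(rules)), key=lambda k: welfare[k])
        out.append((welfare[best], best))
    return out

# Simulated experiment (an assumption for illustration): the gain from treatment is x - 0.5.
rng = random.Random(42)
data = []
for _ in range(300):
    x = rng.random()
    t = 1 if rng.random() < 0.5 else 0
    y = t * (x - 0.5) + rng.gauss(0.0, 0.5)
    data.append((x, t, y))

cutoffs = [0.0, 0.25, 0.5, 0.75, 1.0]  # c = 1.0 treats nobody because x < 1
rules = [threshold_rule(c) for c in cutoffs]
posterior = bayesian_bootstrap_draws(data, rules)
welfare_draws = sorted(w for w, _ in posterior)
print("posterior median of optimal welfare gain:", welfare_draws[len(welfare_draws) // 2])
```

Because the class includes a rule that treats nobody, every draw of the optimal welfare gain is nonnegative. The list of optimal rule indices gives a posterior over which rule is best.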

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be extended to sequential treatment decisions by updating the reduced-form posterior after each period.
Because only the reduced-form distribution is modeled nonparametrically, the method may serve as a benchmark for checking whether structural assumptions in more complex models change policy recommendations.
High-dimensional covariate settings could be tested to see whether the Dirichlet process prior still yields practical posterior concentration when many characteristics are available.
Large-sample equivalence with frequentist policy-learning estimators might be established by showing that the Bayesian bootstrap intervals match the corresponding frequentist confidence sets.

Load-bearing premise

That every source of uncertainty relevant to welfare maximization is fully captured by uncertainty in the reduced-form distribution alone.

What would settle it

A Monte Carlo experiment in which the posterior mean regret fails to shrink at the known minimax rate for the given policy class as sample size increases, or in which posterior odds between two policy classes converge to the wrong limit.
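
A small version of such an experiment might look like the sketch below. The design is hypothetical (uniform covariate, randomized treatment with probability 0.5, treatment gain x - 0.5, threshold rules on a grid), and "posterior mean regret" is taken here to mean the true regret of the rule that is optimal under each Bayesian bootstrap draw, averaged over draws. That reading is an assumption, and the sketch only checks whether regret shrinks with n, not the exact rate.

```python
import random

THRESHOLDS = [i / 8 for i in range(9)]  # candidate rules: treat iff x >= c

def simulate(n, rng):
    """Draw n units (x, t, y) from the hypothetical design."""
    data = []
    for _ in range(n):
        x = rng.random()
        t = 1 if rng.random() < 0.5 else 0
        y = t * (x - 0.5) + rng.gauss(0.0, 0.5)
        data.append((x, t, y))
    return data

def true_regret(c):
    """Under this design W(P0; c) = 0.125 - (c - 0.5)^2 / 2, which is maximized at c = 0.5."""
    return (c - 0.5) ** 2 / 2

def weighted_gain(weights, gains, xs, c):
    """Unnormalized weighted welfare gain of the rule 'treat iff x >= c'."""
    return sum(w * g for w, g, x in zip(weights, gains, xs) if x >= c)

def posterior_mean_regret(data, rng, draws=20):
    xs = [x for x, _, _ in data]
    gains = [2 * y * t - 2 * y * (1 - t) for _, t, y in data]  # IPW score with e = 0.5
    total = 0.0
    for _ in range(draws):
        w = [rng.expovariate(1.0) for _ in data]  # normalization does not change the argmax
        best = max(THRESHOLDS, key=lambda c: weighted_gain(w, gains, xs, c))
        total += true_regret(best)
    return total / draws

rng = random.Random(0)
sample_sizes = [50, 200, 800]
regrets = []
for n in sample_sizes:
    reps = [posterior_mean_regret(simulate(n, rng), rng) for _ in range(10)]
    regrets.append(sum(reps) / len(reps))

for n, r in zip(sample_sizes, regrets):
    print(n, round(r, 4))
```

A real check would use many more replications and a policy class for which the minimax rate is known, then compare the decay of the printed values with that rate.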

Figures

Figures reproduced from arXiv: 2605.17068 by Haonan Ye.

**Figure 1.** Posterior distribution of optimal welfare under NBPL in the absence of treatment costs. Relative to Kitagawa and Tetenov (2018), the analysis additionally considers decision-tree rules and supports posterior model comparison across policy classes. The left panel plots the empirical cumulative distribution functions (CDFs) of $W^\star_{\mathcal{G}_{\mathrm{lin}}}(P)$ and $W^\star_{\mathcal{G}_{\mathrm{tree},2}}(P)$ across posterior draws. The distributio…

**Figure 2.** Posterior Welfare Comparison (No Capacity Constraint): Linear vs. Tree Policies

**Figure 3.** Posterior Welfare Comparison (70% Capacity Constraint): Linear vs. Tree Policies

**Figure 4.** Posterior distribution of optimal welfare under a $774 treatment cost per individual. As in …

**Figure 5.** EWM Linear Rules (Treatment Regions)

**Figure 6.** EWM (depth ≤ 2 decision-tree) optimal treatment rules for the JTPA data, conditioning on years of education (edu) and pre-program annual earnings (prevearn). Top: no treatment cost (…

**Figure 7.** EWM (depth ≤ 2 decision-tree) optimal treatment rules for the Bhattacharya and Dupas (2012) data, conditioning on number of children under 10 (young_child), bank account ownership (bank_account), and log household wealth per capita (logwealth). Top: no capacity constraint (…

Original abstract

I propose Nonparametric Bayesian Policy Learning (NBPL) as a framework for uncertainty-aware treatment choice. I consider a decision-maker (DM) seeking to select an expected welfare-maximizing treatment rule using observable characteristics. A key observation is that, for a given welfare criterion and policy class, uncertainty about welfare-relevant objects is entirely induced by uncertainty about a reduced-form distribution. I assume the DM places a nonparametric Dirichlet process prior on this reduced-form parameter and uses the resulting posterior to conduct inference on optimal treatment assignments, optimal welfare, and comparisons across policy classes. The NBPL framework is flexible, and its implementation via the Bayesian bootstrap is highly tractable. I establish two main theoretical properties of NBPL. First, posterior welfare regret under NBPL converges at the minimax-optimal rate. Second, posterior model comparison across policy classes is pointwise consistent. I illustrate NBPL in two empirical applications: the bednet subsidy experiment of Bhattacharya and Dupas (2012) and the JTPA experiment studied by Kitagawa and Tetenov (2018).
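
The comparison across policy classes described in the abstract can be illustrated with the same Bayesian bootstrap weights applied to two classes at once. This is a sketch under assumptions: the data are simulated, both classes are finite and hypothetical (constant rules versus constant rules plus thresholds), and summarizing the comparison by the posterior probability that the welfare gap exceeds a margin `eps` is an illustrative choice, not necessarily the paper's criterion.

```python
import random

def ipw_gain(t, y, e=0.5):
    """IPW treatment-gain score with known propensity e (assumed constant)."""
    return y * t / e - y * (1 - t) / (1 - e)

def optimal_welfare(weights, gains, xs, rules):
    """Largest weighted welfare gain attainable within a finite rule class."""
    return max(
        sum(w * g for w, g, x in zip(weights, gains, xs) if rule(x))
        for rule in rules
    )

def welfare_gap_draws(data, class_a, class_b, draws=300, seed=1):
    """Posterior draws of W*_A(P) - W*_B(P), using one weight vector per draw for both classes."""
    rng = random.Random(seed)
    xs = [x for x, _, _ in data]
    gains = [ipw_gain(t, y) for _, t, y in data]
    gaps = []
    for _ in range(draws):
        g = [rng.expovariate(1.0) for _ in data]
        total = sum(g)
        w = [gi / total for gi in g]
        gaps.append(optimal_welfare(w, gains, xs, class_a) - optimal_welfare(w, gains, xs, class_b))
    return gaps

# Simulated data (an assumption for illustration): the gain from treatment is x - 0.5.
rng = random.Random(7)
data = []
for _ in range(300):
    x = rng.random()
    t = 1 if rng.random() < 0.5 else 0
    y = t * (x - 0.5) + rng.gauss(0.0, 0.5)
    data.append((x, t, y))

constant_rules = [lambda x: True, lambda x: False]  # treat everyone, treat no one
threshold_rules = constant_rules + [lambda x, c=c: x >= c for c in (0.25, 0.5, 0.75)]

gaps = welfare_gap_draws(data, threshold_rules, constant_rules)
eps = 0.02  # hypothetical margin for a meaningful welfare difference
prob_a_better = sum(gap > eps for gap in gaps) / len(gaps)
print("posterior probability that thresholds beat constants by more than eps:", prob_a_better)
```

Because the threshold class nests the constant class here, every gap is nonnegative by construction, which is why a margin is used instead of a strict inequality. Reusing one weight vector for both classes is what makes each gap a draw from a single posterior over the reduced-form distribution.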

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper gives a clean Bayesian nonparametric route to treatment choice that reduces everything to a Dirichlet process on the reduced form and claims minimax-optimal posterior regret.

The main thing to know is that Ye puts a Dirichlet process prior on the reduced-form distribution and then uses the posterior to pick policies, compare classes, and quantify welfare uncertainty. The two headline results are that posterior regret converges at the minimax rate and that posterior model comparison is pointwise consistent. Both follow from standard nonparametric Bayesian contraction arguments once the problem is reduced to functionals of the distribution, so the claims are plausible on their face.

Referee Report

2 major / 2 minor

Summary. The paper proposes Nonparametric Bayesian Policy Learning (NBPL) as a framework for uncertainty-aware treatment choice. It observes that, for a given welfare criterion and policy class, uncertainty about welfare-relevant objects is entirely induced by uncertainty about a reduced-form distribution. The decision-maker places a nonparametric Dirichlet process prior on this reduced-form parameter and uses the resulting posterior to conduct inference on optimal treatment assignments, optimal welfare, and comparisons across policy classes. The framework is implemented via the Bayesian bootstrap. Two main theoretical results are established: posterior welfare regret under NBPL converges at the minimax-optimal rate, and posterior model comparison across policy classes is pointwise consistent. The method is illustrated in applications to the bednet subsidy experiment of Bhattacharya and Dupas (2012) and the JTPA experiment of Kitagawa and Tetenov (2018).

Significance. If the central claims hold, NBPL contributes a tractable Bayesian nonparametric method for policy learning that directly incorporates posterior uncertainty over the reduced-form distribution. The minimax-optimal convergence of posterior welfare regret and the pointwise consistency of posterior model comparison are strengths, as they leverage standard Dirichlet process contraction properties for smooth functionals while remaining computationally feasible via the Bayesian bootstrap. This approach offers a coherent way to quantify uncertainty in treatment choice problems.

major comments (2)

[§2] The key modeling assumption that all welfare uncertainty is induced solely by reduced-form uncertainty (stated in the abstract and §2) is load-bearing for reducing the problem to a standard nonparametric Bayesian setup; the manuscript should explicitly verify that this holds for the welfare criteria and policy classes considered, including any cases where the welfare functional depends on conditional distributions.
[Theorem 1] Theorem 1 (posterior welfare regret convergence): the claim of minimax optimality requires explicit conditions on the smoothness of the welfare functional and the entropy of the policy class; without these, it is unclear whether the rate is exactly minimax or includes extra logarithmic factors from the Dirichlet process posterior.

minor comments (2)

[§5] In the empirical sections, report the specific values of the Dirichlet process concentration parameter used in the Bayesian bootstrap implementations.
[§2] Notation for the reduced-form distribution and the welfare functional should be introduced with a single consistent symbol early in §2 to avoid later ambiguity.
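
On the concentration parameter raised in the first minor comment, the appendix fragment extracted further down this page gives enough to see its role. There, $T \sim \mathrm{Gamma}(M, 1)$ and $S = \sum_{i=1}^n J_i \sim \mathrm{Gamma}(n, 1)$ are independent, $V_n := T/(T+S) \sim \mathrm{Beta}(M, n)$, and the posterior draw of $Ph$ has the stated representation. Splitting that representation into a prior part and a data part is an algebraic rearrangement added here, with $M$ read as the total mass of the base measure $\alpha$:

```latex
\begin{align*}
Ph &\overset{d}{=} \frac{G_\alpha h + \sum_{i=1}^n J_i h(D_i)}{T + S}
   = V_n \cdot \frac{G_\alpha h}{T} + (1 - V_n) \cdot \frac{\sum_{i=1}^n J_i h(D_i)}{S}, \\
E[V_n] &= \frac{M}{M + n}.
\end{align*}
```

The second term is a Bayesian bootstrap draw, since the normalized $J_i$ are Dirichlet$(1, \dots, 1)$ weights. The prior term carries expected weight $M/(M+n)$, so the reported value of $M$ matters mainly when $n$ is small relative to it.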

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the recommendation for minor revision. We address each major comment below and outline the revisions we will implement to clarify the manuscript.

Referee: [§2] The key modeling assumption that all welfare uncertainty is induced solely by reduced-form uncertainty (stated in the abstract and §2) is load-bearing for reducing the problem to a standard nonparametric Bayesian setup; the manuscript should explicitly verify that this holds for the welfare criteria and policy classes considered, including any cases where the welfare functional depends on conditional distributions.

Authors: We agree that explicit verification of this assumption is warranted to strengthen the exposition. In the revised manuscript we will add a short subsection in §2 that formally states the assumption and verifies it for the welfare criteria and policy classes used in the theoretical results and the two empirical applications. The verification will explicitly address welfare functionals that depend on conditional distributions by showing that they remain well-defined maps from the reduced-form distribution alone.

Revision: yes

Referee: [Theorem 1] Theorem 1 (posterior welfare regret convergence): the claim of minimax optimality requires explicit conditions on the smoothness of the welfare functional and the entropy of the policy class; without these, it is unclear whether the rate is exactly minimax or includes extra logarithmic factors from the Dirichlet process posterior.

Authors: We thank the referee for this observation. The current proof of Theorem 1 invokes standard Dirichlet-process contraction rates for smooth functionals, which already deliver the minimax rate under appropriate regularity. To address the concern directly, we will revise the statement of Theorem 1 and the accompanying proof to list the required conditions explicitly: Hölder smoothness of the welfare functional and polynomial covering entropy of the policy class. Under these conditions the posterior welfare regret converges at the minimax-optimal rate without additional logarithmic factors beyond those inherent to the nonparametric Bayesian setup.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper explicitly states the key modeling assumption that uncertainty about welfare-relevant objects is induced solely by uncertainty in the reduced-form distribution, then places a standard Dirichlet process prior on that distribution. The claimed convergence of posterior welfare regret at the minimax-optimal rate and pointwise consistency of model comparison follow from known contraction and consistency results for nonparametric Bayesian procedures applied to smooth functionals; these are not derived by construction from fitted inputs within the paper itself. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations appear in the derivation chain. The framework is self-contained against external benchmarks in Bayesian nonparametrics.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on a nonparametric prior and the reduced-form uncertainty assumption; no explicit free parameters or invented entities are detailed in the abstract.

free parameters (1)

Dirichlet process concentration parameter
Standard hyperparameter in the nonparametric prior setup, value not specified in abstract.

axioms (1)

domain assumption Uncertainty about welfare-relevant objects is entirely induced by uncertainty about a reduced-form distribution for a given welfare criterion and policy class.
Key observation stated directly in the abstract as foundational to the framework.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: "A key observation is that, for a given welfare criterion and policy class, uncertainty about welfare-relevant objects is entirely induced by uncertainty about a reduced-form distribution. I assume the DM places a nonparametric Dirichlet process prior on this reduced-form parameter"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "posterior welfare regret under NBPL converges at the minimax-optimal rate"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 1 internal anchor

[1] Abdulkadiroğlu, Atila, Joshua D Angrist, Susan M Dynarski, Thomas J Kane, and Parag A Pathak, “Accountability and flexibility in public schools: Evidence from Boston’s charters and pilots,” The Quarterly Journal of Economics, 2011, 126(2), 699–748. , , Yusuke Narita, and Parag Pathak, “Breaking ties: Regression discontinuity design meets market design,” Econ...
[2] An, Sungbae and Frank Schorfheide, “Bayesian analysis of DSGE models,” Econometric Reviews, 2007, 26(2-4), 113–172. Andrews, Isaiah and Jesse M Shapiro, “Communicating scientific uncertainty via approximate posteriors,” (forthcoming) Econometrica,
[3] Angelopoulos, Anastasios N, Stephen Bates, Clara Fannjiang, Michael I Jordan, and Tijana Zrnic, “Prediction-powered inference,” Science, 2023, 382(6671), 669–674. Angrist, Joshua D., “Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records,” The American Economic Review, 1990, 80(3), 313–336. Angrist, Joshua D...
[4] , Niall Keleher, and Jann Spiess, “Machine learning who to nudge: causal vs predictive targeting in a field experiment on student financial aid renewal,” Journal of Econometrics, 2025, 249, 105945. , Raj Chetty, and Guido Imbens, “Combining experimental and observational data to estimate treatment effects on long term outcomes,” arXiv preprint arXiv:2006.09676, 2025,4...
[5] Christensen, Timothy, Hyungsik Roger Moon, and Frank Schorfheide, “Optimal decision rules when payoffs are partially identified,” arXiv preprint arXiv:2204.11748,
[6] Dehejia, Rajeev H, “Program evaluation as a decision problem,” Journal of Econometrics, 2005, 125(1-2), 141–173. Dupas, Pascaline, “What Matters (and What Does Not) in Households’ Decision to Invest in Malaria Prevention?,” American Economic Review, May 2009, 99(2), 224–30. Fang, Ethan X, Zhaoran Wang, and Lan Wang, “Fairness-oriented learning for optimal in...
[7] , Jayanta K. Ghosh, and Aad W. van der Vaart, “Convergence rates of posterior distributions,” The Annals of Statistics, 2000, 28(2), 500 –
[8] Giacomini, Raffaella and Toru Kitagawa, “Robust Bayesian inference for set-identified models,” Econometrica, 2021, 89(4), 1519–1556. Goller, Daniel, Michael Lechner, Tamara Pongratz, and Joachim Wolff, “Active labor market policies for the long-term unemployed: New evidence from causal machine learning,” Labour Economics, 2025, 94, 102729. Hahn, P Richard, J...
[9] Hirano, Keisuke and Jack R Porter, “Asymptotics for statistical treatment rules,” Econometrica, 2009, 77(5), 1683–1701. and , “Impossibility results for nondifferentiable functionals,” Econometrica, 2012, 80(4), 1769–1790. Hoeffding, Wassily, “Probability inequalities for sums of bounded random variables,” Journal of the American Statistical Association, 1963,...
[10] Kallus, Nathan and Angela Zhou, “Confounding-robust policy improvement,” Advances in Neural Information Processing Systems, 2018, 31. Kato, Kengo, “Lecture notes on empirical process theory,” Lecture notes available from https://sites.google.com/site/kkatostat/home,
[11] Katz, Lawrence F, Jeffrey R Kling, and Jeffrey B Liebman, “Moving to opportunity in Boston: Early results of a randomized mobility experiment,” The Quarterly Journal of Economics, 2001, 116(2), 607–654. Kenya National Bureau of Statistics (KNBS) and ICF, Kenya Demographic and Health Survey 2022: Key Indicators Report, Nairobi, Kenya and Rockville, Maryland,...
[12] Kido, Daido, “Distributionally robust policy learning with wasserstein distance,” arXiv preprint arXiv:2205.04637,
[13] Kitagawa, Toru and Aleksey Tetenov, “Who should be treated? empirical welfare maximization methods for treatment choice,” Econometrica, 2018, 86(2), 591–616. and , “Equality-minded treatment choice,” Journal of Business & Economic Statistics, 2021, 39(2), 561–574. , Hugo Lopez, and Jeff Rowley, “Stochastic treatment choice with empirical welfare updating,” a...
[14] , Sokbae Lee, and Chen Qiu, “Leave No One Undermined: Policy Targeting with Regret Aversion,” arXiv preprint arXiv:2506.16430,
[15] Kline, Brendan and Elie Tamer, “Bayesian inference in a class of partially identified models,” Quantitative Economics, 2016, 7(2), 329–366. Kosorok, Michael R and Eric B Laber, “Precision medicine,” Annual Review of Statistics and its Application, 2019, 6(1), 263–286. Li, Fan, Peng Ding, and Fabrizia Mealli, “Bayesian causal inference: a critical review,”...
[16] Lyddon, Simon P, Chris C Holmes, and Stephen G Walker, “General Bayesian updating and the loss-likelihood bootstrap,” Biometrika, 2019, 106(2), 465–478. Manski, Charles F, “Statistical treatment rules for heterogeneous populations,” Econometrica, 2004, 72(4), 1221–1246. , Identification for prediction and decision, Harvard University Press,
[17] , “Communicating uncertainty in policy analysis,” Proceedings of the National Academy of Sciences, 2019, 116(16), 7634–7641. , “Econometrics for decision making: Building foundations sketched by Haavelmo and Wald,” Econometrica, 2021, 89(6), 2827–2853. , “Discourse on social planning under uncertainty,” Cambridge Books,
[18] Mbakop, Eric and Max Tabord-Meehan, “Model selection for treatment choice: Penalized welfare maximization,” Econometrica, 2021, 89(2), 825–848. Mo, Weibin, Zhengling Qi, and Yufeng Liu, “Learning optimal distributionally robust individualized treatment rules,” Journal of the American Statistical Association, 2021, 116(534), 659–674. Moon, Hyungsik Roger and...
[19] Müller, Ulrich K, “Risk of Bayesian inference in misspecified models, and the sandwich covariance matrix,” Econometrica, 2013, 81(5), 1805–1849. Norets, Andriy and Xun Tang, “Semiparametric inference in dynamic binary choice models,” Review of Economic Studies, 2014, 81(3), 1229–1262. O’Hagan, Sean and Veronika Ročková, “AI-Powered Bayesian Inference,” arXiv preprint...
[20] Olea, José Luis Montiel, Chen Qiu, and Jörg Stoye, “Decision Theory for Treatment Choice Problems with Partial Identification,” arXiv preprint arXiv:2312.17623,
[21] Ponomarev, Kirill and Vira Semenova, “On the Lower Confidence Band for the Optimal Welfare in Policy Learning,” arXiv preprint arXiv:2410.07443,
[22] Qi, Zhengling, Jong-Shi Pang, and Yufeng Liu, “On robustness of individualized decision rules,” Journal of the American Statistical Association, 2023, 118(543), 2143–2157. Qian, Min and Susan A Murphy, “Performance guarantees for individualized treatment rules,” Annals of Statistics, 2011, 39(2),
[23] Ray, Kolyan and Aad van der Vaart, “Semiparametric Bayesian Causal Inference,” Annals of Statistics, 2020, 48(5). and Aad van der Vaart, “On the Bernstein-von Mises theorem for the Dirichlet process,” Electronic Journal of Statistics, 2021, 15(1). Ročková, Veronika and Stéphanie van der Pas, “Posterior concentration for Bayesian regression trees and forests,...
[24] Rubin, Donald B, “The bayesian bootstrap,” The Annals of Statistics, 1981, pp. 130–134. Sims, Christopher, “On an example of Larry Wasserman,” online manuscript, available from http://sims.princeton.edu/yftp/WassermanExmpl/WassermanComment.pdf, 2006, 2(10). Stoye, Jörg, “Minimax regret treatment choice with finite samples,” Journal of Econometrics, 2009, 151(...
[25] Viviano, Davide, “Policy targeting under network interference,” Review of Economic Studies, 2025, 92(2), 1257–1292. Walker, Christopher D, “Parametrization, prior independence, and the semiparametric Bernstein-von Mises theorem for the partially linear model,” Bernoulli, 2026, 32(2), 1503–1522. , “Semiparametric Bayesian Inference for a Conditional Moment E...
[26] Wang, Lan, Yu Zhou, Rui Song, and Ben Sherwood, “Quantile-optimal treatment regimes,” Journal of the American Statistical Association, 2018, 113(523), 1243–1254. Whitehouse, Justin, Morgane Austern, and Vasilis Syrgkanis, “Inference on optimal policy values and other irregular functionals via smoothing,” arXiv preprint arXiv:2507.11780,
[27] Zhang, Fengshuo and Chao Gao, “Convergence rates of variational posterior distributions,” The Annals of Statistics, 2020, 48(4), 2180 –
[28] Zhou, Zhengyuan, Susan Athey, and Stefan Wager, “Offline multi-action policy learning: Generalization and optimization,” Operations Research, 2023, 71(1), 148–183. A Extensions. A.1 Alternative Welfare Criteria. The main text focuses on utilitarian welfare. More generally, alternative welfare criteria differ in the reduced-form information they require. Cr...
[29] This criterion targets the lower tail of the realized outcome distribution directly, rather than aggregating welfare over the full distribution. A.1.3 Fairness-constrained Welfare. Fang et al. (2023) study welfare maximization subject to a lower bound on a specified quantile of the realized outcome distribution: $\max_{G \in \mathcal{G}} W(P_0^\star; G)$ subject to $Q_{P_0^\star, G}(\tau)$...
[30] (b) (Unconfoundedness) $(Y(0), Y(1), \dots, Y(J)) \perp\!\!\!\perp T \mid X$. (c) (Outcome Moments) $E_{Q_0}|Y|^{2+\delta} < \infty$ for some $\delta > 0$. (d) (Generalized Overlap) There exists $\kappa \in (0, 1/(J+1))$ such that the generalized propensity scores $e_j(x) := E_{Q_0}[1\{T = j\} \mid X = x]$ satisfy $e_j(x) \ge \kappa$ for all $j \in \{0, \dots, J\}$ and $Q_0$-almost every $x \in \mathcal{X}$. Moreover, the propensity scores are known. Under Assumption 3, welfa...
[31] $\ldots + \rho_{\mathcal{G}_2}(P, P_0) \ge \varepsilon\} \subseteq \bigcup_{i=1}^{2} \{P : \rho_{\mathcal{G}_i}(P, P_0) \ge \varepsilon/2\}$. Again applying the union bound and (A.19) yields $\Pi(P : |\Delta(P)| \ge \varepsilon \mid D_n) \xrightarrow{P_0} 0$, or equivalently, $\Pi(P : |W^\star_{\mathcal{G}_1}(P) - W^\star_{\mathcal{G}_2}(P)| < \varepsilon \mid D_n) \xrightarrow{P_0} 1$. D.7 Proof of Proposition 1. Since $W^\star_{\mathcal{G}}(P)$ does not depend on $G$, minimizing posterior expected loss is equivalent to maximizing posterior mean welfare: arg ...
[32] $:= \sup_{G \in \mathcal{G}} |W(Q_1; G) - W(Q_2; G)|$ for two probability measures $Q_1$ and $Q_2$. Proof of Lemma 2. Note that $\rho_{\mathcal{G}}(P_0, P_n) = \|P_n - P_0\|_{\mathcal{F}}$, where $\mathcal{F} := \{f(\cdot; G) : G \in \mathcal{G}\}$ and $f(D; G) := \left(\frac{YT}{e(X)} - \frac{Y(1-T)}{1-e(X)}\right) 1\{X \in G\}$. By Lemma A.1 of Kitagawa and Tetenov (2018), $\mathcal{F}$ is a VC-subgraph class with VC dimension at most $v := \mathrm{VC}(\mathcal{G})$. Step 1: Expected supremum bound. Since $\mathcal{F}$ is VC-subgraph with f...
[33] $\ldots = \mathrm{Exp}(1)$; $G_\alpha$ and $\{J_i\}_{i=1}^n$ are mutually independent. Step 2: Algebraic Decomposition. Let $T := G_\alpha(D)$ and $S := \sum_{i=1}^n J_i$. From the properties of the Gamma distribution: $T \sim \mathrm{Gamma}(M, 1)$, $S \sim \mathrm{Gamma}(n, 1)$, $T \perp\!\!\!\perp S$. Define $V_n := T/(T+S)$. By the relationship between Gamma and Beta distributions, $V_n \sim \mathrm{Beta}(M, n)$. I then rewrite the integral $Ph$ as: $Ph \overset{d}{=} \frac{G_\alpha h + \sum_{i=1}^n J_i h(D_i)}{T+S}$ ...
[34] with $\tilde{Z}_i = \varepsilon_i \delta_{D_i}$, $\xi_i = |\tilde{N}_1|$, and $(D_1, \dots, D_n)$ fixed yields: EbZ bGn,k F ≤ r n k ∥ ˜Ni∥2,1 ·max 1≤k≤n Eε,R 1√ k k ∑ i=1 εiδDRi F , where $R_1, \dots, R_k$ are i.i.d. indices drawn uniformly from $\{1, \dots, n\}$. Notice that for a fixed $k$, the expectation over the indices $R$ represents the average over all possible subsamples of size $k$ from $\{D_1, \dots, D_n\}$. It can b...
[35] From the above, I conclude that: $E\left[\exp\left(\left(\sup_{k \ge 9} \frac{|X_k|}{\sqrt{6 \log k}}\right)^2\right)\right] = \int_0^\infty P\left(\exp\left(\left(\sup_{k \ge 9} \frac{|X_k|}{\sqrt{6 \log k}}\right)^2\right) > t\right) dt \le \frac{3}{2} + \int_{3/2}^\infty \frac{1}{4t^2}\, dt < 2$. This concludes the proof. Proof of Lemma 13. The proof follows from Theorem 6 of Chapter 2 in Kato (2019). Step 1: Chaining Arguments. I first prove the inequality (A.10). Without loss of generality, I...

[1] [1]

Accountability and flexibility in public schools: Evidence from Boston’s charters and pilots,

Abdulkadiroğlu, Atila, Joshua D Angrist, Susan M Dynarski, Thomas J Kane, and Parag A Pathak, “Accountability and flexibility in public schools: Evidence from Boston’s charters and pilots,”The Quarterly Journal of Economics, 2011,126(2), 699–748. , , Yusuke Narita, and Parag Pathak, “Breaking ties: Regression discontinuity design meets market design,”Econ...

work page arXiv 2011

[2] [2]

BayesiananalysisofDSGEmodels,

An,SungbaeandFrankSchorfheide,“BayesiananalysisofDSGEmodels,”EconometricReviews, 2007,26(2-4), 113–172. Andrews, Isaiah and Jesse M Shapiro, “Communicating scientific uncertainty via approximate posteriors,”(forthcoming) Econometrica,

work page 2007

[3] [3]

Prediction-powered inference,

Angelopoulos, Anastasios N, Stephen Bates, Clara Fannjiang, Michael I Jordan, and Tijana Zrnic, “Prediction-powered inference,”Science, 2023,382(6671), 669–674. Angrist, Joshua D., “Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records,”The American Economic Review, 1990,80(3), 313–336. Angrist, Joshua D...

work page 2023

[4] [4]

arXiv preprint arXiv:2006.09676 , year=

,NiallKeleher,andJannSpiess,“Machinelearningwhotonudge: causalvspredictivetargeting in a field experiment on student financial aid renewal,”Journal of Econometrics, 2025,249, 105945. 19 , Raj Chetty, and Guido Imbens, “Combining experimental and observational data to estimate treatment effects on long term outcomes,”arXiv preprint arXiv:2006.09676, 2025,4...

work page arXiv 2025

Christensen, Timothy, Hyungsik Roger Moon, and Frank Schorfheide, “Optimal decision rules when payoffs are partially identified,” arXiv preprint arXiv:2204.11748,

Dehejia, Rajeev H, “Program evaluation as a decision problem,” Journal of Econometrics, 2005, 125(1-2), 141–173.
Dupas, Pascaline, “What Matters (and What Does Not) in Households’ Decision to Invest in Malaria Prevention?,” American Economic Review, May 2009, 99(2), 224–30.
Fang, Ethan X, Zhaoran Wang, and Lan Wang, “Fairness-oriented learning for optimal in...

, Jayanta K. Ghosh, and Aad W. van der Vaart, “Convergence rates of posterior distributions,” The Annals of Statistics, 2000, 28(2), 500 –

Giacomini, Raffaella and Toru Kitagawa, “Robust Bayesian inference for set-identified models,” Econometrica, 2021, 89(4), 1519–1556.
Goller, Daniel, Michael Lechner, Tamara Pongratz, and Joachim Wolff, “Active labor market policies for the long-term unemployed: New evidence from causal machine learning,” Labour Economics, 2025, 94, 102729.
Hahn, P Richard, J...

Hirano, Keisuke and Jack R Porter, “Asymptotics for statistical treatment rules,” Econometrica, 2009, 77(5), 1683–1701.
and , “Impossibility results for nondifferentiable functionals,” Econometrica, 2012, 80(4), 1769–1790.
Hoeffding, Wassily, “Probability inequalities for sums of bounded random variables,” Journal of the American Statistical Association, 1963,...

Kallus, Nathan and Angela Zhou, “Confounding-robust policy improvement,” Advances in Neural Information Processing Systems, 2018, 31.
Kato, Kengo, “Lecture notes on empirical process theory,” Lecture notes available from https://sites.google.com/site/kkatostat/home,

Katz, Lawrence F, Jeffrey R Kling, and Jeffrey B Liebman, “Moving to opportunity in Boston: Early results of a randomized mobility experiment,” The Quarterly Journal of Economics, 2001, 116(2), 607–654.
Kenya National Bureau of Statistics (KNBS) and ICF, Kenya Demographic and Health Survey 2022: Key Indicators Report, Nairobi, Kenya and Rockville, Maryland,...

Kido, Daido, “Distributionally robust policy learning with Wasserstein distance,” arXiv preprint arXiv:2205.04637,

Kitagawa, Toru and Aleksey Tetenov, “Who should be treated? empirical welfare maximization methods for treatment choice,” Econometrica, 2018, 86(2), 591–616.
and , “Equality-minded treatment choice,” Journal of Business & Economic Statistics, 2021, 39(2), 561–574.
, Hugo Lopez, and Jeff Rowley, “Stochastic treatment choice with empirical welfare updating,” a...

, Sokbae Lee, and Chen Qiu, “Leave No One Undermined: Policy Targeting with Regret Aversion,” arXiv preprint arXiv:2506.16430,

Kline, Brendan and Elie Tamer, “Bayesian inference in a class of partially identified models,” Quantitative Economics, 2016, 7(2), 329–366.
Kosorok, Michael R and Eric B Laber, “Precision medicine,” Annual Review of Statistics and its Application, 2019, 6(1), 263–286.
Li, Fan, Peng Ding, and Fabrizia Mealli, “Bayesian causal inference: a critical review,”...

Lyddon, Simon P, Chris C Holmes, and Stephen G Walker, “General Bayesian updating and the loss-likelihood bootstrap,” Biometrika, 2019, 106(2), 465–478.
Manski, Charles F, “Statistical treatment rules for heterogeneous populations,” Econometrica, 2004, 72(4), 1221–1246.
, Identification for prediction and decision, Harvard University Press,

, “Communicating uncertainty in policy analysis,” Proceedings of the National Academy of Sciences, 2019, 116(16), 7634–7641.
, “Econometrics for decision making: Building foundations sketched by Haavelmo and Wald,” Econometrica, 2021, 89(6), 2827–2853.
, “Discourse on social planning under uncertainty,” Cambridge Books,

Mbakop, Eric and Max Tabord-Meehan, “Model selection for treatment choice: Penalized welfare maximization,” Econometrica, 2021, 89(2), 825–848.
Mo, Weibin, Zhengling Qi, and Yufeng Liu, “Learning optimal distributionally robust individualized treatment rules,” Journal of the American Statistical Association, 2021, 116(534), 659–674.
Moon, Hyungsik Roger and...

Müller, Ulrich K, “Risk of Bayesian inference in misspecified models, and the sandwich covariance matrix,” Econometrica, 2013, 81(5), 1805–1849.
Norets, Andriy and Xun Tang, “Semiparametric inference in dynamic binary choice models,” Review of Economic Studies, 2014, 81(3), 1229–1262.
O’Hagan, Sean and Veronika Ročková, “AI-Powered Bayesian Inference,” arXiv preprint...

Olea, José Luis Montiel, Chen Qiu, and Jörg Stoye, “Decision Theory for Treatment Choice Problems with Partial Identification,” arXiv preprint arXiv:2312.17623,

Ponomarev, Kirill and Vira Semenova, “On the Lower Confidence Band for the Optimal Welfare in Policy Learning,” arXiv preprint arXiv:2410.07443,

Qi, Zhengling, Jong-Shi Pang, and Yufeng Liu, “On robustness of individualized decision rules,” Journal of the American Statistical Association, 2023, 118(543), 2143–2157.
Qian, Min and Susan A Murphy, “Performance guarantees for individualized treatment rules,” Annals of Statistics, 2011, 39(2),

Ray, Kolyan and Aad van der Vaart, “Semiparametric Bayesian Causal Inference,” Annals of Statistics, 2020, 48(5).
and Aad van der Vaart, “On the Bernstein-von Mises theorem for the Dirichlet process,” Electronic Journal of Statistics, 2021, 15(1).
Ročková, Veronika and Stéphanie van der Pas, “Posterior concentration for Bayesian regression trees and forests,...

Rubin, Donald B, “The Bayesian bootstrap,” The Annals of Statistics, 1981, pp. 130–134.
Sims, Christopher, “On an example of Larry Wasserman,” online manuscript, available from http://sims.princeton.edu/yftp/WassermanExmpl/WassermanComment.pdf, 2006, 2(10).
Stoye, Jörg, “Minimax regret treatment choice with finite samples,” Journal of Econometrics, 2009, 151(...

Viviano, Davide, “Policy targeting under network interference,” Review of Economic Studies, 2025, 92(2), 1257–1292.
Walker, Christopher D, “Parametrization, prior independence, and the semiparametric Bernstein-von Mises theorem for the partially linear model,” Bernoulli, 2026, 32(2), 1503–1522.
, “Semiparametric Bayesian Inference for a Conditional Moment E...

Wang, Lan, Yu Zhou, Rui Song, and Ben Sherwood, “Quantile-optimal treatment regimes,” Journal of the American Statistical Association, 2018, 113(523), 1243–1254.
Whitehouse, Justin, Morgane Austern, and Vasilis Syrgkanis, “Inference on optimal policy values and other irregular functionals via smoothing,” arXiv preprint arXiv:2507.11780,

Zhang, Fengshuo and Chao Gao, “Convergence rates of variational posterior distributions,” The Annals of Statistics, 2020, 48(4), 2180 –

Zhou, Zhengyuan, Susan Athey, and Stefan Wager, “Offline multi-action policy learning: Generalization and optimization,” Operations Research, 2023, 71(1), 148–183.

A Extensions

A.1 Alternative Welfare Criteria

The main text focuses on utilitarian welfare. More generally, alternative welfare criteria differ in the reduced-form information they require. Cr...

This criterion targets the lower tail of the realized outcome distribution directly, rather than aggregating welfare over the full distribution.

A.1.3 Fairness-constrained Welfare

Fang et al. (2023) study welfare maximization subject to a lower bound on a specified quantile of the realized outcome distribution: $\max_{G \in \mathcal{G}} W(P_0^\star; G)$ subject to $Q_{P_0^\star, G}(\tau)$...

(b) (Unconfoundedness) $(Y(0), Y(1), \ldots, Y(J)) \perp\!\!\!\perp T \mid X$.
(c) (Outcome Moments) $E_{Q_0}|Y|^{2+\delta} < \infty$ for some $\delta > 0$.
(d) (Generalized Overlap) There exists $\kappa \in (0, 1/(J+1))$ such that the generalized propensity scores $e_j(x) := E_{Q_0}[1\{T = j\} \mid X = x]$ satisfy $e_j(x) \ge \kappa$ for all $j \in \{0, \ldots, J\}$ and $Q_0$-almost every $x \in \mathcal{X}$. Moreover, the propensity scores are known.

Under Assumption 3, welfa...

$+\, \rho_{\mathcal{G}_2}(P, P_0) \ge \varepsilon \bigr\} \subseteq \bigcup_{i=1}^{2} \bigl\{ P : \rho_{\mathcal{G}_i}(P, P_0) \ge \tfrac{\varepsilon}{2} \bigr\}$. Again applying the union bound and (A.19) yields $\Pi(P : |\Delta(P)| \ge \varepsilon \mid D_n) \xrightarrow{P_0} 0$, or equivalently, $\Pi\bigl(P : |W^\star_{\mathcal{G}_1}(P) - W^\star_{\mathcal{G}_2}(P)| < \varepsilon \mid D_n\bigr) \xrightarrow{P_0} 1$.

D.7 Proof of Proposition 1

Proof of Proposition 1. Since $W^\star_{\mathcal{G}}(P)$ does not depend on $G$, minimizing posterior expected loss is equivalent to maximizing posterior mean welfare: arg ...

$:= \sup_{G \in \mathcal{G}} |W(Q_1; G) - W(Q_2; G)|$ for two probability measures $Q_1$ and $Q_2$.

Proof of Lemma 2. Note that $\rho_{\mathcal{G}}(P_0, P_n) = \|P_n - P_0\|_{\mathcal{F}}$, where
$$\mathcal{F} := \{ f(\cdot; G) : G \in \mathcal{G} \}, \qquad f(D; G) := \left[ \frac{YT}{e(X)} - \frac{Y(1-T)}{1 - e(X)} \right] 1\{X \in G\}.$$
By Lemma A.1 of Kitagawa and Tetenov (2018), $\mathcal{F}$ is a VC-subgraph class with VC dimension at most $v := \mathrm{VC}(\mathcal{G})$.

Step 1: Expected supremum bound. Since $\mathcal{F}$ is VC-subgraph with f...
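As a rough illustration of the function $f(D; G)$ used in the proof of Lemma 2, the sketch below evaluates its sample average for a single candidate policy. The simulated data, the constant propensity score of 0.5, the threshold policy, and the names `ipw_score` and `empirical_welfare_gain` are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def ipw_score(y, t, e, in_g):
    """Per-observation f(D; G) = [Y T / e(X) - Y (1 - T) / (1 - e(X))] * 1{X in G}."""
    return (y * t / e - y * (1 - t) / (1 - e)) * in_g

def empirical_welfare_gain(y, t, e, in_g):
    """Sample average of f(D; G) over the observations."""
    return float(np.mean(ipw_score(y, t, e, in_g)))

# Hypothetical simulated experiment: known propensity 0.5,
# treatment effect equal to 1 when x > 0 and 0 otherwise.
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(-1.0, 1.0, size=n)
t = rng.binomial(1, 0.5, size=n)
y = t * (x > 0) + rng.normal(0.0, 0.1, size=n)
e = np.full(n, 0.5)

treat_positive = (x > 0).astype(float)  # indicator of G = {x : x > 0}
treat_nobody = np.zeros(n)              # indicator of the empty set

gain_positive = empirical_welfare_gain(y, t, e, treat_positive)
gain_nobody = empirical_welfare_gain(y, t, e, treat_nobody)
```

In this simulated design the population value of the gain from treating $\{x > 0\}$ is 0.5, so `gain_positive` should land close to that value, while `gain_nobody` is exactly zero because the indicator vanishes for every observation.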

$= \mathrm{Exp}(1)$,
• $G_\alpha$ and $\{J_i\}_{i=1}^n$ are mutually independent.

Step 2: Algebraic Decomposition. Let $T := G_\alpha(D)$ and $S := \sum_{i=1}^n J_i$. From the properties of the Gamma distribution:
$$T \sim \mathrm{Gamma}(M, 1), \qquad S \sim \mathrm{Gamma}(n, 1), \qquad T \perp\!\!\!\perp S.$$
Define $V_n := T/(T+S)$. By the relationship between Gamma and Beta distributions, $V_n \sim \mathrm{Beta}(M, n)$. I then rewrite the integral $Ph$ as: $Ph \overset{d}{=} \dfrac{G_\alpha h + \sum_{i=1}^n J_i h(D_i)}{T + S}$ ...
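The Gamma-to-Beta step in the algebraic decomposition can be checked numerically. The sketch below draws $T \sim \mathrm{Gamma}(M, 1)$ and $S$ as a sum of $n$ independent $\mathrm{Exp}(1)$ weights, forms $V_n = T/(T+S)$, and compares its simulated mean and variance with those of $\mathrm{Beta}(M, n)$. The values $M = 2$ and $n = 50$, the seed, the number of Monte Carlo draws, and all variable names are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
M, n, draws = 2.0, 50, 50_000  # hypothetical prior mass, sample size, Monte Carlo size

T = rng.gamma(shape=M, scale=1.0, size=draws)     # T ~ Gamma(M, 1)
J = rng.exponential(scale=1.0, size=(draws, n))   # J_i ~ Exp(1), i = 1, ..., n
S = J.sum(axis=1)                                 # S ~ Gamma(n, 1), independent of T
V = T / (T + S)                                   # V_n = T / (T + S)

# Theoretical moments of Beta(M, n).
beta_mean = M / (M + n)
beta_var = M * n / ((M + n) ** 2 * (M + n + 1))
sim_mean, sim_var = V.mean(), V.var()

# Normalised exponential weights: each row sums to one.
weights = J / S[:, None]
```

The rows of `weights` are the normalised exponential weights $J_i / S$, which are the weights used by the Bayesian bootstrap of Rubin (1981). The factor $V_n$ is then the share of posterior mass attached to the prior component, which shrinks as $n$ grows for fixed $M$.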

with $\tilde{Z}_i = \varepsilon_i \delta_{D_i}$, $\xi_i = |\tilde{N}_1|$, and $(D_1, \ldots, D_n)$ fixed yields:
$$E_{\hat{Z}} \bigl\| \hat{G}_{n,k} \bigr\|_{\mathcal{F}} \le \sqrt{\frac{n}{k}}\, \|\tilde{N}_i\|_{2,1} \cdot \max_{1 \le k \le n} E_{\varepsilon, R} \left\| \frac{1}{\sqrt{k}} \sum_{i=1}^{k} \varepsilon_i \delta_{D_{R_i}} \right\|_{\mathcal{F}},$$
where $R_1, \ldots, R_k$ are i.i.d. indices drawn uniformly from $\{1, \ldots, n\}$. Notice that for a fixed $k$, the expectation over the indices $R$ represents the average over all possible subsamples of size $k$ from $\{D_1, \ldots, D_n\}$. It can b...
