Leave No One Undermined: Policy Targeting with Regret Aversion
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
While the importance of personalized policymaking is widely recognized, fully personalized implementation remains rare in practice, often due to legal, fairness, or cost concerns. We study the problem of policy targeting for a regret-averse planner when the training data provide a rich set of observables but the assignment rules can depend only on a subset of them. Our regret-averse criterion reflects a planner's concern about regret inequality across the population. This, in general, leads to a fractional optimal rule due to treatment effect heterogeneity beyond the average treatment effects conditional on the subset of observables. We propose a debiased empirical risk minimization approach to learn the optimal rule from data and establish new, favorable upper and lower bounds for the excess risk, indicating a convergence rate of 1/n and asymptotic efficiency in certain cases. We apply our approach to the National JTPA Study and the International Stroke Trial.
fields: econ.EM (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
- Nonparametric Bayesian Policy Learning
NBPL uses a nonparametric Dirichlet process prior on the reduced-form distribution for posterior inference on optimal treatment assignments and welfare, with minimax-optimal regret convergence and pointwise consistent policy class comparisons.