Leave No One Undermined: Policy Targeting with Regret Aversion

Chen Qiu; Sokbae Lee; Toru Kitagawa

arxiv: 2506.16430 · v2 · submitted 2025-06-19 · 💰 econ.EM

Leave No One Undermined: Policy Targeting with Regret Aversion

Toru Kitagawa , Sokbae Lee , Chen Qiu This is my paper

Pith reviewed 2026-05-19 08:59 UTC · model grok-4.3

classification 💰 econ.EM

keywords policy targetingregret aversionfractional assignmentdebiased estimationempirical risk minimizationtreatment effect heterogeneityexcess risk boundsasymptotic efficiency

0 comments

The pith

A regret-averse planner's optimal targeting rule is fractional when limited to a subset of observables and can be learned at rate 1/n from data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how a regret-averse policy planner should assign treatment when rich data exists but rules are restricted to a smaller set of observed characteristics. This regret concern, which penalizes unequal regret across individuals, produces an optimal rule that is fractional rather than all-or-nothing because treatment effects vary beyond what the limited observables capture. The authors develop a debiased empirical risk minimization procedure to estimate the rule and derive new upper and lower bounds on excess risk that establish a 1/n convergence rate and asymptotic efficiency in some cases. The results matter for real-world targeting where legal or cost limits prevent full personalization yet fairness in individual outcomes still counts, as shown in applications to job training and stroke treatment data.

Core claim

A regret-averse planner's optimal policy rule that can depend only on a subset of available covariates is generally fractional due to treatment effect heterogeneity beyond the conditional average treatment effects on that subset. The paper proposes a debiased empirical risk minimization approach to learn this rule from data and establishes new upper and lower bounds on the excess risk that deliver a parametric convergence rate of 1/n with asymptotic efficiency under suitable conditions. The method is illustrated on the National JTPA Study and the International Stroke Trial.

What carries the argument

The regret-averse criterion that penalizes inequality in individual regrets and produces fractional optimal rules when treatment effects are heterogeneous beyond the averages conditional on the limited observables.

If this is right

The learned rule converges to the optimal fractional assignment at the rate of 1/n.
Asymptotic efficiency is attained under the paper's stated conditions on the data distribution and model.
The procedure applies directly to existing randomized trial datasets to produce regret-aware targeting rules.
Policy rules remain feasible under constraints that limit the use of full individual-level information.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same debiased estimation strategy could be adapted to other planner objectives that trade off average outcomes against outcome inequality.
In practice the fractional rules may offer a transparent way to implement partial randomization without violating assignment constraints.
Extensions to dynamic or sequential policy settings could test whether the 1/n rate persists when rules are updated over time.

Load-bearing premise

The planner's objective is accurately captured by the regret-averse criterion, which creates fractional rules precisely when treatment effect heterogeneity remains after conditioning on the subset of observables used for assignment.

What would settle it

If the estimated rule on the JTPA or stroke trial data is always all-or-nothing (probabilities of exactly zero or one) for every group or if the excess risk in finite samples declines slower than order 1/n, the central claims on fractional rules and convergence rates would be refuted.

read the original abstract

While the importance of personalized policymaking is widely recognized, fully personalized implementation remains rare in practice, often due to legal, fairness or cost concerns. We study the problem of policy targeting for a regret-averse planner when training data gives a rich set of observables while the assignment rules can only depend on its subset. Our regret-averse criterion reflects a planner's concern about regret inequality across the population. This, in general, leads to a fractional optimal rule due to treatment effect heterogeneity beyond the average treatment effects conditional on the subset of observables. We propose a debiased empirical risk minimization approach to learn the optimal rule from data and establish favorable, new upper and lower bounds for the excess risk, indicating a convergence rate of 1/n and asymptotic efficiency in certain cases. We apply our approach to the National JTPA Study and the International Stroke Trial.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Regret aversion produces fractional rules for constrained policy targeting and the debiased ERM claims a 1/n excess-risk rate that needs strong smoothness on unobserved heterogeneity to hold.

read the letter

The paper's main move is to replace standard welfare or regret minimization with a regret-aversion criterion that penalizes inequality in individual regrets. When the assignment rule is restricted to a subset of covariates, this criterion generally yields fractional treatment probabilities because of treatment-effect variation that the limited observables do not capture. They then estimate the optimal fractional rule with a debiased empirical risk minimization procedure and derive new upper and lower bounds on excess risk that deliver a 1/n rate and asymptotic efficiency under certain conditions. The two empirical applications to the JTPA study and the stroke trial are straightforward and show the method can be run on real data without obvious implementation problems.

Referee Report

1 major / 2 minor

Summary. The manuscript studies policy targeting under a regret-averse planner who must restrict assignment rules to a subset of rich observables. Regret aversion combined with treatment-effect heterogeneity beyond the observed subset generally produces a fractional optimal rule. The authors propose a debiased empirical risk minimization estimator for this rule and derive new upper and lower bounds on its excess risk that achieve a 1/n convergence rate and asymptotic efficiency in certain cases. The method is illustrated on the National JTPA Study and the International Stroke Trial.

Significance. If the claimed 1/n excess-risk bounds are valid under appropriate conditions, the paper would advance the policy-learning literature by delivering a faster rate than standard semiparametric results through debiasing, while incorporating a regret-inequality criterion. The two empirical applications demonstrate relevance to labor and medical policy settings.

major comments (1)

[Abstract and theoretical development] Abstract and theoretical development: The headline result of new upper and lower bounds yielding a 1/n excess-risk rate for the debiased ERM estimator is load-bearing for the contribution. This rate requires that the debiased objective eliminates first-order bias while the remainder is controlled by higher-order smoothness of the conditional average treatment effect outside the policy-relevant covariates. No explicit smoothness class (e.g., Hölder index β > 1 or Sobolev index) is stated. Under only Lipschitz smoothness the remainder is typically O(n^{-1/2}), which would prevent the claimed rate. Please state the precise smoothness assumption and show that it delivers the o(n^{-1}) remainder needed for the 1/n bound.

minor comments (2)

[Setup] The notation for the regret function and the distinction between the population criterion and the sample objective could be clarified with an explicit display of the population risk functional early in the setup.
[Empirical applications] In the empirical section, report the estimated fractional assignment probabilities and the implied regret-inequality measure for the learned rule to make the practical output more transparent.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. The observation regarding the need for an explicit smoothness class to support the 1/n excess-risk rate is valid, and we address it by strengthening the theoretical statements in the revision.

read point-by-point responses

Referee: [Abstract and theoretical development] Abstract and theoretical development: The headline result of new upper and lower bounds yielding a 1/n excess-risk rate for the debiased ERM estimator is load-bearing for the contribution. This rate requires that the debiased objective eliminates first-order bias while the remainder is controlled by higher-order smoothness of the conditional average treatment effect outside the policy-relevant covariates. No explicit smoothness class (e.g., Hölder index β > 1 or Sobolev index) is stated. Under only Lipschitz smoothness the remainder is typically O(n^{-1/2}), which would prevent the claimed rate. Please state the precise smoothness assumption and show that it delivers the o(n^{-1}) remainder needed for the 1/n bound.

Authors: We agree that the manuscript did not state the smoothness class with the required precision. In the revised version we add Assumption 3.2, which requires that the CATE function outside the policy-relevant covariates belongs to a Hölder ball of radius R with index β > 1. Under this condition the approximation error of the debiased objective after first-order bias removal is o_p(n^{-1}), which is shown in the new Lemma A.3 in the appendix by combining the standard Hölder approximation rate with the n^{-1} scaling of the debiasing correction. The proof of Theorem 3.1 is updated to invoke this lemma explicitly, and the abstract is revised to note the strengthened assumption. We also include a brief remark explaining why β = 1 would only deliver the slower rate the referee correctly identifies. revision: yes

Circularity Check

0 steps flagged

Debiased ERM for regret-averse policy targeting yields independent excess-risk bounds

full rationale

The paper defines a regret-averse population criterion that produces a fractional optimal rule due to treatment-effect heterogeneity beyond the observed subset. It then proposes a debiased empirical risk minimization procedure to target that external criterion and derives new upper and lower bounds on excess risk. No step in the abstract or setup reduces the claimed 1/n rate or asymptotic efficiency to a fitted quantity by construction, nor does any load-bearing claim rest on a self-citation chain whose content is itself unverified within the paper. The derivation therefore remains self-contained against the stated population objective.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; cannot enumerate specific free parameters or axioms without the full derivation. Likely relies on standard econometric assumptions such as i.i.d. sampling and correct specification of the regret function, but these are not detailed here.

pith-pipeline@v0.9.0 · 5673 in / 1072 out tokens · 25958 ms · 2026-05-19T08:59:24.258966+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose a debiased empirical risk minimization approach... convergence rate of 1/n... weighted least squares problem... L(δ,τ)=E[τ²(X)(1{τ(X)≥0}-δ(W))²]
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

regret-averse criterion... fractional optimal rule due to treatment effect heterogeneity beyond... CATE(W)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Nonparametric Bayesian Policy Learning
econ.EM 2026-05 unverdicted novelty 7.0

NBPL uses a nonparametric Dirichlet process prior on the reduced-form distribution for posterior inference on optimal treatment assignments and welfare, with minimax-optimal regret convergence and pointwise consistent...