Predicting the Distribution of Treatment Effects: A Covariate-Adjustment Approach

Bruno Fava

arxiv: 2407.14635 · v4 · pith:TX2UM7ZMnew · submitted 2024-07-19 · 💰 econ.EM

Pith reviewed 2026-05-25 08:37 UTC · model grok-4.3

classification 💰 econ.EM

keywords distribution of treatment effects · covariate adjustment · sample splitting · finite-sample inference · microcredit · heterogeneous effects · counterfactual prediction

The pith

Covariate adjustment using predicted counterfactuals delivers finite-sample valid inference on points of the treatment effect distribution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an approach to estimate specific points on the distribution of treatment effects when individual counterfactual outcomes remain unobserved. It does so by predicting those counterfactuals from observed covariates and then constructing inference that remains valid in finite samples through sample splitting. Asymptotically valid inference follows from cross-fitting under weak conditions on the prediction step. The method is applied to five randomized microcredit trials that previously reported zero average effects, revealing that credit access helps some participants while harming others. This matters because many policy questions turn on whether gains and losses are concentrated or diffuse rather than on the mean alone.

Core claim

The central claim is that predicted counterfactuals obtained via covariate adjustment permit construction of valid confidence sets for points of the treatment effect distribution, with finite-sample valid coverage guaranteed by sample splitting and asymptotic validity available via cross-fitting under arguably weak conditions.

What carries the argument

Covariate-adjusted prediction of individual counterfactual outcomes, paired with sample splitting to separate prediction and inference steps.
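
The separation of prediction and inference can be illustrated with a minimal sketch. It assumes, for illustration only, that the point of interest is the share of units harmed, P(Y(1) < Y(0)), and that its lower bound is estimated as the largest gap between the empirical CDFs of covariate-adjusted outcomes in the two arms (a Makarov-type bound). The data-generating process, the linear predictor, and the names `simulate`, `fit_shift`, and `cdf_gap` are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    """Hypothetical RCT: the mean effect is zero, yet half the units are harmed."""
    x = rng.normal(size=n)
    y0 = 2.0 * x + 0.3 * rng.normal(size=n)
    tau = rng.choice([-1.0, 1.0], size=n)       # individual effects of +1 or -1
    d = rng.permutation(n) < n // 2             # complete randomization
    y = np.where(d, y0 + tau, y0)               # only one potential outcome is observed
    return x, d, y

def fit_shift(x, y):
    """Least-squares line used as the prediction (shift) function s(x)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return lambda x_new: intercept + slope * x_new

def cdf_gap(y, d, shift):
    """max_t [F1_hat(t) - F0_hat(t)] for the adjusted outcomes y - shift."""
    r1 = np.sort((y - shift)[d])
    r0 = np.sort((y - shift)[~d])
    grid = np.concatenate([r1, r0])             # empirical CDFs only jump at data points
    f1 = np.searchsorted(r1, grid, side="right") / r1.size
    f0 = np.searchsorted(r0, grid, side="right") / r0.size
    return max(float(np.max(f1 - f0)), 0.0)

# Sample splitting: the auxiliary sample trains s, the main sample is used for estimation.
x_aux, d_aux, y_aux = simulate(2000)
x_main, d_main, y_main = simulate(2000)

s_hat = fit_shift(x_aux[~d_aux], y_aux[~d_aux])  # fit on auxiliary controls only (an assumption)
adjusted = cdf_gap(y_main, d_main, s_hat(x_main))
unadjusted = cdf_gap(y_main, d_main, np.zeros_like(y_main))

print(f"lower bound on share harmed, unadjusted: {unadjusted:.3f}")
print(f"lower bound on share harmed, adjusted:   {adjusted:.3f}")
```

In this simulated design the true share harmed is 0.5. The unadjusted bound is close to uninformative because the covariate drives most of the outcome variation, while subtracting the prediction removes that variation and moves the bound much closer to the truth.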

If this is right

Applied researchers can recover evidence of heterogeneous effects in existing randomized trials that reported only average null results.
In the microcredit setting the distribution of effects is non-degenerate even when the mean is zero, implying both winners and losers from expanded credit access.
The same procedure applies to any randomized experiment possessing baseline covariates that are predictive of potential outcomes.
Inference remains valid without requiring correct specification of the outcome model or knowledge of the full joint distribution of potential outcomes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be combined with flexible machine-learning predictors to improve precision when the covariate dimension is high.
Similar logic might extend to observational data if the covariates satisfy a conditional independence condition that justifies the counterfactual predictions.
Policymakers could use the resulting distributional estimates to design compensation or targeting rules rather than uniform program expansion.

Load-bearing premise

The covariates must be sufficiently rich and the prediction model sufficiently accurate that the predicted counterfactuals introduce no systematic bias into the estimated points of the treatment effect distribution.

What would settle it

A Monte Carlo experiment in which the proposed intervals exhibit coverage rates materially below the nominal level in finite samples despite correct sample splitting would falsify the finite-sample validity claim.
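
A coverage check of this kind might be sketched as follows. The sketch assumes a conservative one-sided confidence bound built from the Dvoretzky-Kiefer-Wolfowitz inequality applied to each arm of the main sample; this interval, the simulated design, and the names `simulate`, `cdf_gap`, and `lower_confidence_bound` are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
ALPHA = 0.10        # nominal miscoverage level
TRUE_SHARE = 0.5    # P(Y(1) < Y(0)) in the simulated design below

def simulate(n):
    """Hypothetical RCT with effects of +1 or -1, each with probability 1/2."""
    x = rng.normal(size=n)
    y0 = 2.0 * x + 0.3 * rng.normal(size=n)
    tau = rng.choice([-1.0, 1.0], size=n)
    d = rng.permutation(n) < n // 2             # complete randomization
    return x, d, np.where(d, y0 + tau, y0)

def cdf_gap(r1, r0):
    """Largest gap between the empirical CDFs of treated and control residuals."""
    r1, r0 = np.sort(r1), np.sort(r0)
    grid = np.concatenate([r1, r0])
    f1 = np.searchsorted(r1, grid, side="right") / r1.size
    f0 = np.searchsorted(r0, grid, side="right") / r0.size
    return max(float(np.max(f1 - f0)), 0.0)

def lower_confidence_bound(n_aux, n_main):
    # Prediction step: uses the auxiliary sample only.
    x_a, d_a, y_a = simulate(n_aux)
    slope, intercept = np.polyfit(x_a[~d_a], y_a[~d_a], deg=1)
    # Inference step: uses the main sample only, treating the fitted line as fixed.
    x_m, d_m, y_m = simulate(n_main)
    resid = y_m - (intercept + slope * x_m)
    r1, r0 = resid[d_m], resid[~d_m]
    # DKW radius per arm, each at level ALPHA / 2 (union bound over the two arms).
    eps = sum(np.sqrt(np.log(4 / ALPHA) / (2 * r.size)) for r in (r1, r0))
    return cdf_gap(r1, r0) - eps

bounds = np.array([lower_confidence_bound(200, 200) for _ in range(500)])
coverage = float(np.mean(bounds <= TRUE_SHARE))
print(f"empirical coverage of the lower confidence bound: {coverage:.3f}")
```

Under these assumptions, an empirical coverage clearly below `1 - ALPHA` would count as evidence against finite-sample validity. Because the DKW radius is conservative, coverage well above the nominal level is the expected outcome here.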

Figures

Figures reproduced from arXiv: 2407.14635 by Bruno Fava.

**Figure 2.** Example of data generating process that satisfies A2(iii).

**Figure 3.** Distribution of Estimators for Lower and Upper Bounds on

Original abstract

Important questions for impact evaluation require knowledge not only of average effects, but of the distribution of treatment effects. The inability to observe individual counterfactuals makes answering these empirical questions challenging. I propose an inference approach for points of the distribution of treatment effects that uses predicted counterfactuals through covariate adjustment. I provide finite-sample valid inference using sample-splitting and asymptotically valid inference using cross-fitting under arguably weak conditions. Revisiting five randomized controlled trials on microcredit that reported null average effects, I find important distributional impacts, with some individuals helped and others harmed by the increased credit access.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper claims finite-sample valid inference on points of the treatment effect distribution via covariate-adjusted counterfactual predictions and sample splitting, but that guarantee looks sensitive to misspecification in the adjustment step.

The main takeaway is a procedure that predicts individual counterfactuals from covariates, then uses sample splitting to deliver finite-sample coverage for points on the distribution of treatment effects, with cross-fitting for asymptotic results. It applies this to five microcredit RCTs that had null average effects and reports meaningful shares of people helped and harmed by the loans.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a covariate-adjustment approach that uses predicted counterfactuals to estimate and conduct inference on points of the distribution of treatment effects (DTE). It claims finite-sample valid inference for these points via sample-splitting and asymptotically valid inference via cross-fitting under weak conditions. The method is applied to five microcredit RCTs that previously reported null average treatment effects, revealing evidence of both positive and negative individual-level impacts.

Significance. If the finite-sample validity claim holds, the approach would provide a practical tool for distributional impact evaluation in randomized settings where individual counterfactuals are unobserved. The reanalysis of the microcredit trials illustrates potential empirical value by uncovering heterogeneity missed by average effects. The paper's emphasis on weak conditions and cross-fitting is a strength if substantiated, but the absence of explicit derivation details for the validity guarantees limits assessment of its contribution relative to existing methods for heterogeneous effects.

major comments (2)

[Abstract] The finite-sample validity claim for DTE points via sample-splitting and covariate-adjusted predictions is asserted without any derivation, proof sketch, or explicit conditions on approximation error from the prediction model. This is load-bearing because, as noted in the stress-test, misspecification (e.g., omitted nonlinear heterogeneity) can introduce systematic shifts in the estimated DTE points that sample-splitting does not automatically correct.
[Abstract] No specification is given for how prediction error from the covariate adjustment is incorporated into the finite-sample coverage guarantee or the form of the error bars. Without this, it is impossible to verify whether the procedure remains valid when the prediction model is imperfect, which directly affects the central inference claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on the manuscript. The points raised highlight the need for greater clarity in presenting the finite-sample validity results, and we will revise accordingly to include additional details.

Referee: [Abstract] The finite-sample validity claim for DTE points via sample-splitting and covariate-adjusted predictions is asserted without any derivation, proof sketch, or explicit conditions on approximation error from the prediction model. This is load-bearing because, as noted in the stress-test, misspecification (e.g., omitted nonlinear heterogeneity) can introduce systematic shifts in the estimated DTE points that sample-splitting does not automatically correct.

Authors: The abstract is necessarily brief and does not contain derivations. The main text (Section 3, Theorem 1) states the finite-sample coverage result under sample-splitting, with conditions that the prediction model is trained on an independent subsample. We acknowledge that explicit discussion of approximation error bounds and the impact of misspecification (such as omitted nonlinear terms) is limited. The procedure delivers valid inference for the DTE points induced by the chosen prediction model; misspecification alters the target estimand rather than invalidating coverage for that estimand. We will add a proof sketch and a subsection clarifying the role of approximation error and misspecification in the revised version. revision: yes
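
The statement that misspecification changes the estimand without breaking validity can be made concrete with a short worked inequality. The derivation below is a sketch for one illustrative point of the distribution, the share of units harmed, and uses notation that is assumed here: $s$ is any fixed prediction function, $D$ is the treatment indicator, and $\theta_L(s)$ is the bound induced by $s$. It is not a reproduction of the paper's Theorem 1.

```latex
% For any fixed function s and any threshold t, pointwise:
\mathbf{1}\{Y(1) \le s(X) + t\} - \mathbf{1}\{Y(0) \le s(X) + t\}
  \;\le\; \mathbf{1}\{Y(1) < Y(0)\}.
% Taking expectations, using random assignment to identify each term,
% and then the supremum over t:
\theta_L(s) \;:=\; \sup_{t}\Big[
  P\big(Y - s(X) \le t \mid D = 1\big) - P\big(Y - s(X) \le t \mid D = 0\big)
\Big] \;\le\; P\big(Y(1) < Y(0)\big).
```

The inequality holds for every choice of $s$. A poorly fitting $s$ yields a smaller, less informative $\theta_L(s)$, but $\theta_L(s)$ remains a valid lower bound on the share harmed.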

Referee: [Abstract] No specification is given for how prediction error from the covariate adjustment is incorporated into the finite-sample coverage guarantee or the form of the error bars. Without this, it is impossible to verify whether the procedure remains valid when the prediction model is imperfect, which directly affects the central inference claim.

Authors: We agree the abstract provides no such specification. The coverage guarantee in Theorem 1 relies on sample-splitting to make the prediction model independent of the evaluation sample, so that standard concentration bounds apply directly to the adjusted outcomes without further adjustment for prediction error. The error bars are formed from the empirical distribution of the split-sample adjusted effects. To address the concern, we will move an explicit description of this mechanism and the precise form of the error bars into the main text of the revision. revision: yes
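
One way such a concentration argument could run is sketched below. It assumes that the bound of interest is a supremum of a difference of CDFs of adjusted outcomes, that the fitted function $\hat{s}$ is trained on an independent auxiliary sample, and that main-sample observations are i.i.d. within each arm $j \in \{0, 1\}$ with $n_j$ units. The notation and the specific radius are assumptions made for illustration; the paper's own construction may be sharper.

```latex
% Conditional on \hat{s}, the adjusted outcomes Y_i - \hat{s}(X_i) in arm j are i.i.d.
% with CDF F_j, so the Dvoretzky--Kiefer--Wolfowitz inequality (with Massart's constant) gives
P\Big(\sup_t \big|\hat{F}_j(t) - F_j(t)\big| > \varepsilon_j\Big)
  \;\le\; 2\exp\!\big(-2 n_j \varepsilon_j^2\big),
\qquad
\varepsilon_j \;=\; \sqrt{\frac{\log(4/\alpha)}{2 n_j}} .
% Each arm then fails with probability at most \alpha/2. By a union bound,
% with probability at least 1 - \alpha,
\sup_t\big[\hat{F}_1(t) - \hat{F}_0(t)\big] - \varepsilon_1 - \varepsilon_0
  \;\le\; \sup_t\big[F_1(t) - F_0(t)\big]
  \;\le\; P\big(Y(1) < Y(0)\big).
```

No correction for prediction error appears because $\hat{s}$ is treated as fixed once the auxiliary sample is set aside. Its quality affects how large the middle term is, not whether the first inequality holds.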

Circularity Check

0 steps flagged

No circularity: new inference procedure independent of fitted inputs

The paper proposes a covariate-adjustment method for finite-sample valid inference on points of the distribution of treatment effects via sample splitting and cross-fitting. The abstract and description present this as a methodological contribution whose validity claims rest on the splitting procedure and weak conditions, not on any quantity defined in terms of the same in-sample fits or on self-citation chains. No load-bearing step reduces by construction to its inputs, and the assessed score of 2 is consistent with the absence of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on standard causal inference assumptions for randomized experiments plus the adequacy of the covariate set for counterfactual prediction; no new entities are introduced and no free parameters are explicitly named in the abstract.

axioms (1)

domain assumption: Random assignment in the RCTs ensures that treatment is independent of potential outcomes conditional on covariates.
Implicit foundation for using covariate adjustment to recover counterfactuals in the described setting.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Trust Me, I'm a Doctor?
stat.AP 2026-05 unverdicted novelty 7.0

Sharp bounds are derived on the proportion of physicians whose personal strategies perform at least as well as the trial's better average treatment, using nested randomized and observational data from the same population.
Trust Me, I'm a Doctor?
stat.AP 2026-05 unverdicted novelty 5.0

Using nested randomized and observational data, the paper derives sharp bounds on the proportion of physicians whose personal strategies perform at least as well as the trial's better-performing treatment.
