
arxiv: 2606.11013 · v1 · pith:SDXXW7KTnew · submitted 2026-06-09 · 📊 stat.ME

Empirical stratification for treatment effect heterogeneity with post-treatment variables

Pith reviewed 2026-06-27 12:16 UTC · model grok-4.3

classification 📊 stat.ME
keywords treatment effect heterogeneity · post-treatment variables · principal stratification · empirical stratification · causal inference · semiparametric estimation · heterogeneous treatment effects

The pith

Empirical scores from predicted potential post-treatment responses define observable subgroups with identifiable treatment effects under standard assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an empirical stratification framework to study how post-treatment variables modify treatment effects on a primary outcome. It constructs empirical scores by predicting potential post-treatment variable responses from baseline covariates alone, then forms subgroups using these scores. These subgroups allow estimation of empirical-stratum treatment effects that avoid the selection bias that arises from directly conditioning on the observed post-treatment variable. The approach connects to principal stratification, recovering its effects under principal ignorability while remaining useful when that assumption fails. Efficient estimators and projected curves are also developed for practical use.

Core claim

This paper develops an assumption-lean empirical stratification framework for characterizing treatment effect heterogeneity with respect to post-treatment variables. We define empirical scores using the predicted potential PV responses based on baseline covariates, and use the empirical scores to construct empirically accessible subgroups. The resulting empirical-stratum treatment effects (ETEs) are identifiable under standard causal assumptions. We connect the proposed framework to principal stratification by showing that the average ETE recovers principal causal effects under the principal ignorability assumption, but remains informative under violations of this assumption.

What carries the argument

Empirical scores defined from predicted potential post-treatment variable responses based on baseline covariates, used to form observable subgroups for estimating empirical-stratum treatment effects (ETEs).
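
A minimal sketch of this construction, assuming a randomized trial with one-sided noncompliance and a single discrete baseline covariate, so that the prediction step reduces to treated-arm frequencies. The data-generating process, the names `P_TAKE`, `simulate`, `empirical_scores`, and `stratum_effects`, and the plain difference-in-means estimator are illustrative assumptions and are not taken from the paper.

```python
import random
from collections import defaultdict

random.seed(0)

# Hypothetical truth: P(S(1) = 1 | X = x) for a three-level baseline covariate.
P_TAKE = {0: 0.2, 1: 0.5, 2: 0.8}

def simulate(n):
    """Randomized trial in which the PV (treatment uptake) is only possible under Z = 1."""
    rows = []
    for _ in range(n):
        x = random.choice([0, 1, 2])                    # baseline covariate
        z = random.randint(0, 1)                        # randomized assignment
        s1 = 1 if random.random() < P_TAKE[x] else 0    # potential PV under treatment
        s = s1 if z == 1 else 0                         # observed PV
        y = 0.5 * x + 2.0 * s + random.gauss(0.0, 1.0)  # outcome
        rows.append((x, z, s, y))
    return rows

def empirical_scores(rows):
    """Estimate P(S = 1 | Z = 1, X = x) using treated units only."""
    takers, treated = defaultdict(int), defaultdict(int)
    for x, z, s, _ in rows:
        if z == 1:
            treated[x] += 1
            takers[x] += s
    return {x: takers[x] / treated[x] for x in treated}

def stratum_effects(rows, scores):
    """Difference in arm means within each score-defined subgroup."""
    total = defaultdict(lambda: [0.0, 0.0])
    count = defaultdict(lambda: [0, 0])
    for x, z, _, y in rows:
        key = scores[x]  # the subgroup depends on baseline covariates only
        total[key][z] += y
        count[key][z] += 1
    return {k: total[k][1] / count[k][1] - total[k][0] / count[k][0] for k in total}

rows = simulate(30000)
scores = empirical_scores(rows)
effects = stratum_effects(rows, scores)
for x in sorted(scores):
    print(f"x={x}  score={scores[x]:.3f}  stratum effect={effects[scores[x]]:.3f}")
```

Because every unit is assigned to a subgroup through its baseline covariate rather than its observed PV, both arms contribute to each subgroup and randomization is preserved within it. In this toy setup the subgroup effect is 2.0 times the true uptake probability.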

If this is right

  • ETEs are identifiable under standard causal assumptions without additional restrictions.
  • The average ETE recovers principal causal effects when principal ignorability holds.
  • ETEs remain informative about treatment effect heterogeneity even when principal ignorability is violated.
  • Projected ETE curves provide a way to visualize and summarize the heterogeneity.
  • Efficient influence function-based estimators support semiparametric inference for the ETEs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework may allow practitioners to approximate principal strata effects using only baseline data in settings where full principal stratification is infeasible.
  • Extending the prediction of potential PVs to include more flexible models could improve subgroup definition in complex data.
  • Applying this to time-varying post-treatment variables might require adapting the empirical score construction.

Load-bearing premise

The empirical scores from predicted potential PV responses based on baseline covariates create subgroups without inducing endogenous selection bias when estimating treatment effects.

What would settle it

In a simulation with known principal strata and imposed principal ignorability, the average ETE should equal the principal causal effect; a significant difference would falsify the recovery result.
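
A sketch of what such a check could look like, under loud assumptions. The "average ETE" is taken here to be the score-weighted average of subgroup effects, which mirrors the standard principal-score weighting and may differ from the paper's definition. The design (one-sided noncompliance, discrete covariate) and the names `simulate`, `average_ete`, `true_pce`, `P_TAKE`, `TAU`, and `delta` are hypothetical. Setting `delta = 0` makes both potential outcomes independent of the latent stratum given the covariate; a nonzero `delta` breaks that.

```python
import random

random.seed(1)

P_TAKE = {0: 0.2, 1: 0.5, 2: 0.8}  # assumed P(S(1) = 1 | X = x)
TAU = {0: 0.5, 1: 1.0, 2: 2.0}     # assumed treatment effect given X = x

def simulate(n, delta):
    """delta = 0 imposes principal ignorability; delta != 0 violates it."""
    rows = []
    for _ in range(n):
        x = random.choice([0, 1, 2])
        z = random.randint(0, 1)
        c = 1 if random.random() < P_TAKE[x] else 0  # latent stratum: takes treatment if offered
        y0 = 0.5 * x + random.gauss(0.0, 1.0)
        y1 = y0 + TAU[x] + delta * c
        rows.append({"x": x, "z": z, "c": c, "s": c * z,
                     "y": y1 if z else y0, "effect": y1 - y0})
    return rows

def average_ete(rows):
    """Score-weighted average of subgroup effects (assumed definition)."""
    num = den = 0.0
    for x in (0, 1, 2):
        grp = [r for r in rows if r["x"] == x]
        trt = [r for r in grp if r["z"] == 1]
        ctl = [r for r in grp if r["z"] == 0]
        score = sum(r["s"] for r in trt) / len(trt)
        effect = (sum(r["y"] for r in trt) / len(trt)
                  - sum(r["y"] for r in ctl) / len(ctl))
        num += len(grp) * score * effect
        den += len(grp) * score
    return num / den

def true_pce(rows):
    """Mean individual effect among latent takers, known only in simulation."""
    takers = [r["effect"] for r in rows if r["c"] == 1]
    return sum(takers) / len(takers)

pi_rows = simulate(30000, delta=0.0)
gap_pi = abs(average_ete(pi_rows) - true_pce(pi_rows))

violation_rows = simulate(30000, delta=2.0)
gap_violation = abs(average_ete(violation_rows) - true_pce(violation_rows))

print(f"gap under principal ignorability: {gap_pi:.3f}")
print(f"gap under violation:              {gap_violation:.3f}")
```

The first gap should be within sampling noise of zero, while the second should be clearly nonzero, which is the pattern the falsification test relies on.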

Figures

Figures reproduced from arXiv: 2606.11013 by Chao Cheng, Rui Wang, Yichi Zhang.

Figure 1: Directed acyclic graph for depicting the causal relationships among …
Figure 2: Illustration of two distributional cases for the empirical score …
Figure 3: Projected ETE curves in the WHO-LARES study. Panel (A) shows the EIF …
Figure 4: Projected ETE curves in the National Job Corps study. Panel (A) shows the …
Abstract

Post-treatment variables (PVs), such as treatment noncompliance, behavioral responses, and intercurrent events, often modify the ultimate treatment effect on the primary outcome. However, existing methods provide limited tools for studying treatment effect heterogeneity with respect to PVs. Conventional heterogeneous treatment effect estimands condition on baseline covariates, but similarly conditioning on the observed PV can induce endogenous selection bias in treatment effect estimation. Principal stratification offers a rigorous framework for studying principal causal effects across principal strata, but principal strata are latent and their identification often requires stringent assumptions. This paper develops an assumption-lean empirical stratification framework for characterizing treatment effect heterogeneity with respect to PVs. We define empirical scores using the predicted potential PV responses based on baseline covariates, and use the empirical scores to construct empirically accessible subgroups. The resulting empirical-stratum treatment effects (ETEs) are identifiable under standard causal assumptions. We connect the proposed framework to principal stratification by showing that the average ETE recovers principal causal effects under the principal ignorability assumption, but remains informative under violations of this assumption. We further introduce projected ETE curves and develop efficient influence function-based estimators for semiparametric inference. We illustrate the proposed framework with two real-world applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper proposes an empirical stratification framework for treatment effect heterogeneity involving post-treatment variables (PVs). It defines empirical scores from predicted potential PV responses based on baseline covariates to form empirically accessible subgroups, yielding empirical-stratum treatment effects (ETEs) identifiable under standard causal assumptions (consistency, positivity, no unmeasured confounding). The average ETE recovers principal causal effects under principal ignorability but remains informative otherwise; the paper also develops projected ETE curves and efficient influence function (EIF) estimators, with illustrations from two real-world applications.

Significance. If the identification arguments and EIF estimators hold, the framework provides an assumption-lean alternative to direct conditioning on observed PVs (which risks endogenous selection bias) and to principal stratification (which often requires strong assumptions like principal ignorability). It enables subgroup analyses of treatment effect heterogeneity with respect to PVs while connecting to latent strata, potentially useful in noncompliance or intercurrent event settings.

minor comments (3)
  1. [§3] §3 (or wherever the EIF derivation appears): clarify whether the nuisance estimation of the empirical score function requires cross-fitting or sample splitting to achieve the stated semiparametric efficiency, and state the rate conditions explicitly.
  2. [§2] The definition of 'empirical scores' and 'projected ETE curves' would benefit from a small numerical example or simulation in the main text to illustrate how the prediction step separates from the observed PV.
  3. [§5] In the real-data applications, report the predictive performance (e.g., R² or AUC) of the models used to construct the empirical scores, as this directly affects the quality of the resulting strata.
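
Comments 1 and 3 can be illustrated together with a small sketch: out-of-fold (cross-fitted) score predictions among treated units, and an AUC computed on those held-out predictions. Everything here is an assumption for illustration: the simulated data, the frequency-table "model", and the helper names `auc` and `cross_fitted_scores`. The paper's actual nuisance estimators and rate conditions are not reproduced.

```python
import random

random.seed(2)

P_TAKE = {0: 0.2, 1: 0.5, 2: 0.8}  # assumed P(S = 1 | Z = 1, X = x)

def auc(labels, scores):
    """Rank-based AUC (Mann-Whitney statistic), using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def cross_fitted_scores(xs, ss, n_folds=5):
    """Predict each unit's score from a frequency table fit on the other folds."""
    n = len(xs)
    fold = [i % n_folds for i in range(n)]
    out = [0.0] * n
    for f in range(n_folds):
        hits, total = {}, {}
        for i in range(n):
            if fold[i] != f:  # training folds only
                hits[xs[i]] = hits.get(xs[i], 0) + ss[i]
                total[xs[i]] = total.get(xs[i], 0) + 1
        for i in range(n):
            if fold[i] == f:  # held-out fold gets the prediction
                out[i] = hits[xs[i]] / total[xs[i]]
    return out

# Simulated treated arm: baseline covariate and observed PV.
xs = [random.choice([0, 1, 2]) for _ in range(6000)]
ss = [1 if random.random() < P_TAKE[x] else 0 for x in xs]

oof_scores = cross_fitted_scores(xs, ss)
oof_auc = auc(ss, oof_scores)
print(f"out-of-fold AUC of the empirical score model: {oof_auc:.3f}")
```

Reporting the held-out AUC rather than the in-sample value avoids overstating how well the baseline covariates separate the strata, which matters more once flexible learners replace the frequency table used here.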

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed summary of our empirical stratification framework and for the positive assessment of its potential as an assumption-lean approach connecting to principal stratification. The recommendation for minor revision is noted.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation defines empirical scores as functions of predicted potential post-treatment variable responses given only baseline covariates, then constructs ETEs as identifiable quantities under standard causal assumptions (consistency, positivity, no unmeasured confounding). This separation from direct conditioning on the observed post-treatment variable prevents the target estimand from reducing to its own fitted inputs by construction. The link to principal stratification invokes the external principal ignorability assumption and shows recovery only under that assumption while remaining informative otherwise; no self-citation chain, self-definitional loop, or fitted-input-renamed-as-prediction appears in the identifiability argument or EIF estimators. The framework is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

Framework rests on standard causal identification assumptions and introduces new empirical constructs whose validity depends on the quality of the baseline prediction model for post-treatment responses.

axioms (1)
  • [domain assumption] Standard causal assumptions (consistency, positivity, no unmeasured confounding) suffice for identification of the empirical-stratum treatment effects.
    Abstract states that ETEs are identifiable under standard causal assumptions.
invented entities (2)
  • Empirical scores [no independent evidence]
    purpose: Predicted potential post-treatment variable responses used to form observable subgroups.
    Newly defined quantity central to the stratification procedure.
  • Empirical-stratum treatment effects (ETEs) [no independent evidence]
    purpose: Treatment effect estimands within the empirically constructed subgroups.
    Central new estimand of the paper.

pith-pipeline@v0.9.1-grok · 5738 in / 1374 out tokens · 26620 ms · 2026-06-27T12:16:56.413725+00:00 · methodology

