arxiv: 2509.09773 · v2 · submitted 2025-09-11 · 📊 stat.ME · math.ST · stat.TH

Optimal Inference of the Mean Outcome under Optimal Treatment Regime

Pith reviewed 2026-05-18 17:14 UTC · model grok-4.3

classification 📊 stat.ME · math.ST · stat.TH
keywords optimal treatment regime · mean outcome inference · asymptotic variance lower bound · adaptive smoothing · nonregularity · semiparametric efficiency · robust estimation · causal inference

The pith

Adaptive smoothing produces an estimator that reaches the asymptotic variance lower bound for inferring the mean outcome under an optimal treatment regime.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a procedure for estimating and inferring the mean outcome when an optimal treatment regime is applied. It uses adaptive smoothing on the estimated regime to avoid bias from potential non-uniqueness of the regime. The method is shown to be optimal by establishing a lower bound on the asymptotic variance of any robust asymptotically linear unbiased estimator and demonstrating that the proposed estimator attains this bound. This framework supports efficiency results in nonregular settings and is applied to reanalyze data from a clinical trial.

Core claim

By employing adaptive smoothing over the estimated optimal treatment regime (OTR), the authors construct a robust asymptotically linear unbiased estimator of the mean outcome under the OTR. They derive a lower bound on the asymptotic variance of estimators in this class and show that their estimator attains it, thereby establishing optimality whether or not the OTR is unique.
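
For orientation, the standard single-stage setup behind this claim can be written out as below. The notation is an assumption made for illustration: covariates $X$, binary treatment $A \in \{0,1\}$, outcome $Y$ with larger values preferred, and the usual identification conditions under which the regression $Q$ has a causal reading. Only $V_0$ and $\tau$ appear in the figure captions on this page, and the paper's own definitions may differ in detail.

```latex
\begin{align*}
Q(x,a) &= E[\,Y \mid X = x,\ A = a\,], \qquad \tau(x) = Q(x,1) - Q(x,0), \\
d^{\mathrm{opt}}(x) &= \mathbf{1}\{\tau(x) > 0\}, \\
V_0 &= E\bigl[\,Q\bigl(X, d^{\mathrm{opt}}(X)\bigr)\bigr]
     = E\bigl[\max\{Q(X,1),\, Q(X,0)\}\bigr].
\end{align*}
```

In this notation the OTR is non-unique, which is the nonregular case, when $P(\tau(X) = 0) > 0$: on the set $\{\tau = 0\}$ either treatment yields the same $V_0$, so the indicator defining $d^{\mathrm{opt}}$ is not pinned down there.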

What carries the argument

The adaptive smoothing mechanism applied to the estimated optimal treatment regime, which places the proposed estimator inside a general class of robust asymptotically linear unbiased estimators and lets it attain that class's variance lower bound.
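
The bias that smoothing targets can be seen in a toy simulation. The sketch below is not the paper's estimator: it assumes a two-arm trial with no covariates, `n` units per arm and unit-variance Gaussian outcomes, and it swaps in a hypothetical logistic weight with a fixed bandwidth in place of the paper's adaptive rule. The names `simulate_once`, `scaled_bias` and `bandwidth` are illustrative only.

```python
import math
import random
import statistics


def simulate_once(n, mu0, mu1, bandwidth, rng):
    """Return (plug-in, smoothed) value estimates from one simulated two-arm trial."""
    # The mean of n unit-variance Gaussian outcomes is N(mu, 1/n), so the arm
    # means are drawn directly instead of simulating individual outcomes.
    se = 1.0 / math.sqrt(n)
    m0 = rng.gauss(mu0, se)
    m1 = rng.gauss(mu1, se)
    tau_hat = m1 - m0

    # Plug-in: act as if the estimated best arm were the true best arm.
    plug_in = max(m0, m1)

    # Hypothetical smoothing (an assumption, not the paper's rule): a logistic
    # weight replaces the hard indicator 1{tau_hat > 0}.
    weight = 1.0 / (1.0 + math.exp(-tau_hat / bandwidth))
    smoothed = weight * m1 + (1.0 - weight) * m0
    return plug_in, smoothed


def scaled_bias(n, mu0, mu1, bandwidth, reps=20000, seed=0):
    """Monte Carlo estimate of sqrt(n) * bias for both estimators."""
    rng = random.Random(seed)
    truth = max(mu0, mu1)
    draws = [simulate_once(n, mu0, mu1, bandwidth, rng) for _ in range(reps)]
    plug = statistics.fmean(d[0] for d in draws)
    smooth = statistics.fmean(d[1] for d in draws)
    root_n = math.sqrt(n)
    return root_n * (plug - truth), root_n * (smooth - truth)


if __name__ == "__main__":
    n = 400
    # Nonregular case: both arms have the same mean, so the best arm is not unique.
    plug, smooth = scaled_bias(n, mu0=0.0, mu1=0.0, bandwidth=n ** -0.25)
    print(f"sqrt(n) * bias, plug-in:  {plug:.3f}")
    print(f"sqrt(n) * bias, smoothed: {smooth:.3f}")
```

In this toy setup with equal arm means, the plug-in value has expectation exceeding the truth by $1/\sqrt{n\pi}$, so its $\sqrt{n}$-scaled bias stays near $1/\sqrt{\pi} \approx 0.56$ however large `n` is, while the smoothed version shrinks it. A fixed bandwidth can introduce bias of its own when the arms differ, which is one reason a data-adaptive choice is attractive; that adaptive choice is not reproduced here.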

If this is right

  • The inference remains valid and unbiased even if the optimal treatment regime is not uniquely identified.
  • The estimator attains the minimal possible asymptotic variance within the class of robust asymptotically linear unbiased estimators.
  • This paves the way for establishing semiparametric efficiency bounds in more general nonregular OTR inference problems.
  • The method is applicable to real-world data, as shown by the re-analysis of the ACTG 175 trial.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Techniques like this could be adapted for inference on other nonregular causal effects in observational studies.
  • Future work might explore how this bound behaves under different data distributions or with high-dimensional covariates.
  • The optimality result suggests that smoothing can be a general strategy for handling discontinuities in treatment effect models.

Load-bearing premise

The proof depends on the assumption that there exists a broad class of robust asymptotically linear unbiased estimators whose asymptotic properties can be characterized without depending on the details of the smoothing procedure.
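
To make the premise concrete, the textbook notion of asymptotic linearity is sketched below. The symbols $\hat V_n$ (an estimator of $V_0$ from $n$ observations $O_1, \dots, O_n$) and $\varphi$ (its influence function) are assumptions made for illustration; the additional requirements that the paper labels "robust" and "unbiased" are not spelled out on this page and are not reproduced here.

```latex
\begin{gather*}
\sqrt{n}\,\bigl(\hat V_n - V_0\bigr)
  = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \varphi(O_i) + o_P(1),
  \qquad E[\varphi(O)] = 0, \quad E[\varphi(O)^2] < \infty, \\
\sqrt{n}\,\bigl(\hat V_n - V_0\bigr) \xrightarrow{d} N\bigl(0,\ E[\varphi(O)^2]\bigr).
\end{gather*}
```

Under this reading, a variance lower bound for the class is a lower bound on $E[\varphi(O)^2]$ over all influence functions the class admits, and the premise amounts to saying that this set of influence functions does not change with the smoothing procedure.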

What would settle it

Finding an alternative estimator within the robust class that has a smaller asymptotic variance while remaining unbiased and asymptotically linear would falsify the claim that the proposed method achieves the lower bound.

Figures

Figures reproduced from arXiv: 2509.09773 by Shuoxun Xu (1, 2), Xinzhou Guo (1) ((1) Department of Mathematics, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China; (2) Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA).

Figure 1. Illustration of the proposed smoothing decision function.
Figure 2. The boxplot of the plug-in estimator in Eq. (…
Figure 3. The left panel is τ̂(male), the middle panel is d̂(male), and the right panel is d_s(male) as sample size increases in Example 1.
Figure 4. Illustration of the proposed adaptive smoothing decision function with different propensity scores.
Figure 5. Relationships between efficient regular estimators, regular estimators and the proposed robust asymptot…

Original abstract

When an optimal treatment regime (OTR) is considered, we need to evaluate the OTR in a valid and efficient way. The classical inference applied to the mean outcome under OTR, assuming the OTR is the same as the estimated OTR, might be biased when the regularity assumption that OTR is unique is violated. Although several methods have been proposed to allow nonregularity in such inference, its optimality is unclear due to challenges in deriving semiparametric efficiency bounds under potential nonregularity. In this paper, we address the bias issue via adaptive smoothing over the estimated OTR and develop a valid inference procedure on the mean outcome under OTR regardless of whether regularity is satisfied. We establish the optimality of the proposed method by deriving a lower bound of the asymptotic variance for the robust asymptotically linear unbiased estimator to the mean outcome under OTR and showing that our proposed estimator achieves the variance lower bound. The considered estimator class is general and the derived variance lower bound paves a novel way to establish efficiency optimality theories for OTR in a more general scenario allowing nonregularity. The merit of the proposed method is demonstrated by re-analyzing the ACTG 175 trial.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes an adaptive smoothing procedure over the estimated optimal treatment regime (OTR) to obtain valid inference for the mean outcome under the OTR, even when the OTR is non-unique. It derives a lower bound on the asymptotic variance that applies to the general class of robust asymptotically linear unbiased estimators for this functional and shows that the proposed estimator attains the bound, thereby establishing optimality. The method is illustrated via re-analysis of the ACTG 175 trial.

Significance. If the lower-bound derivation and attainment result are correct, the work supplies a concrete efficiency benchmark for nonregular OTR inference that has been missing from the literature; this could serve as a template for establishing optimality in other nonregular causal functionals.

major comments (2)
  1. [Abstract and lower-bound derivation section] Abstract and the section deriving the lower bound: the claim that the asymptotic variance lower bound holds for a general class of robust asymptotically linear unbiased estimators whose behavior can be characterized independently of the smoothing procedure is load-bearing for the optimality conclusion. The adaptive smoothing step (introduced to restore regularity when the OTR is non-unique) selects a data-dependent bandwidth that may enter the influence function; without an explicit uniformity argument showing that this dependence does not alter the class of allowable influence functions, the bound may not automatically apply to the proposed estimator.
  2. [Attainment / optimality section] The section establishing attainment of the bound: the proof that the adaptive-smoothing estimator belongs to the class and achieves the variance lower bound must verify that the smoothing parameter choice preserves asymptotic linearity uniformly over the neighborhoods where non-uniqueness occurs. If the linearity expansion depends on the rate at which the smoothing parameter vanishes, the attainment step requires additional technical conditions that are not yet visible in the abstract.
minor comments (2)
  1. [Method section] The notation distinguishing the estimated OTR from the true OTR could be made more consistent across the bias-correction and smoothing steps.
  2. [Numerical studies] The simulation study would be strengthened by reporting coverage probabilities under both regular and non-regular regimes side-by-side.
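
A side-by-side coverage report of the kind requested in the last minor comment could be produced along the following lines. This is a toy sketch, not the paper's simulation design: it assumes a two-arm trial without covariates, `n` units per arm, unit-variance Gaussian outcomes with known variance, and a naive one-sided 95% lower confidence bound built from the plug-in value. The function name `lower_bound_coverage` is hypothetical.

```python
import math
import random
from statistics import NormalDist

# Standard normal 0.95 quantile, used for a one-sided 95% lower bound.
Z_95 = NormalDist().inv_cdf(0.95)


def lower_bound_coverage(n, mu0, mu1, reps=20000, seed=0):
    """Fraction of replications in which the naive lower bound sits below the truth."""
    rng = random.Random(seed)
    truth = max(mu0, mu1)
    se = 1.0 / math.sqrt(n)  # standard error of each arm mean (variance assumed known)
    hits = 0
    for _ in range(reps):
        # Arm means drawn directly from their sampling distribution N(mu, 1/n).
        m0 = rng.gauss(mu0, se)
        m1 = rng.gauss(mu1, se)
        # Naive bound: treat the selected arm as if it had been fixed in advance.
        lower = max(m0, m1) - Z_95 * se
        hits += lower <= truth
    return hits / reps


if __name__ == "__main__":
    n = 400
    regular = lower_bound_coverage(n, mu0=0.0, mu1=1.0)     # unique best arm
    nonregular = lower_bound_coverage(n, mu0=0.0, mu1=0.0)  # tie between arms
    print(f"regular:    {regular:.3f}")
    print(f"nonregular: {nonregular:.3f}")
```

In this toy the naive bound covers at close to the nominal 95% when one arm is clearly better, and at roughly $0.95^2 \approx 0.90$ when the arms tie, because the maximum of two independent arm means is selected upward. Reporting both columns for each method under comparison would show at a glance whether a procedure holds its level in the nonregular case.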

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments. We provide point-by-point responses to the major comments below, and we plan to incorporate clarifications and additional technical details in a revised version of the manuscript.

Point-by-point responses
  1. Referee: [Abstract and lower-bound derivation section] Abstract and the section deriving the lower bound: the claim that the asymptotic variance lower bound holds for a general class of robust asymptotically linear unbiased estimators whose behavior can be characterized independently of the smoothing procedure is load-bearing for the optimality conclusion. The adaptive smoothing step (introduced to restore regularity when the OTR is non-unique) selects a data-dependent bandwidth that may enter the influence function; without an explicit uniformity argument showing that this dependence does not alter the class of allowable influence functions, the bound may not automatically apply to the proposed estimator.

    Authors: We are grateful to the referee for identifying this key point regarding the applicability of the lower bound to our estimator. The class of robust asymptotically linear unbiased estimators is defined such that their influence functions can be characterized independently of any particular smoothing procedure. Our adaptive smoothing method is constructed to ensure that the data-dependent bandwidth selection does not affect the limiting influence function, as the bandwidth converges to zero at an appropriate rate. To strengthen the argument, we will add an explicit uniformity argument in the revised manuscript to demonstrate that the dependence on the smoothing procedure does not alter the class of allowable influence functions. This will be detailed in the section deriving the lower bound. revision: yes

  2. Referee: [Attainment / optimality section] The section establishing attainment of the bound: the proof that the adaptive-smoothing estimator belongs to the class and achieves the variance lower bound must verify that the smoothing parameter choice preserves asymptotic linearity uniformly over the neighborhoods where non-uniqueness occurs. If the linearity expansion depends on the rate at which the smoothing parameter vanishes, the attainment step requires additional technical conditions that are not yet visible in the abstract.

    Authors: We acknowledge the referee's concern about the need for uniform preservation of asymptotic linearity in the attainment proof. Our current development shows that the smoothing parameter vanishes at a rate sufficient to maintain the asymptotic expansion uniformly over the relevant neighborhoods. We agree that making the dependence on the vanishing rate explicit would improve clarity. In the revised manuscript, we will include additional technical conditions on the rate at which the smoothing parameter vanishes and provide a detailed uniform asymptotic linearity expansion in the attainment section to confirm that the proposed estimator achieves the variance lower bound. revision: yes

Circularity Check

0 steps flagged

Variance lower bound derived for general estimator class independently of specific adaptive smoothing

full rationale

The paper derives a lower bound on the asymptotic variance for the broad class of robust asymptotically linear unbiased estimators of the mean outcome under the OTR, then separately verifies that the proposed adaptive-smoothing estimator belongs to this class and attains the bound. The abstract states that the estimator class is general; it does not say explicitly that the class's asymptotic behavior can be characterized independently of the specific smoothing procedure, so that independence is taken here as a premise. No equations or steps in the provided description reduce the bound to a fit from the same data, a self-citation chain, or a redefinition of the proposed estimator itself. On that reading, the derivation is self-contained against external benchmarks for the class.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review based on abstract only; the paper implicitly relies on standard semiparametric regularity conditions for OTR estimation and on the existence of a broad class of asymptotically linear estimators, but no explicit free parameters, ad-hoc axioms, or invented entities are described.

axioms (1)
  • domain assumption: Standard semiparametric assumptions allowing consistent estimation of the optimal treatment regime and asymptotic linearity of estimators.
    Required for deriving efficiency bounds and for the adaptive smoothing to be valid under both regular and nonregular cases.

pith-pipeline@v0.9.0 · 5794 in / 1297 out tokens · 47831 ms · 2026-05-18T17:14:02.336487+00:00 · methodology
