Optimal Inference of the Mean Outcome under Optimal Treatment Regime
Pith reviewed 2026-05-18 17:14 UTC · model grok-4.3
The pith
Adaptive smoothing produces an estimator that reaches the asymptotic variance lower bound for inferring the mean outcome under an optimal treatment regime.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By adaptively smoothing over the estimated optimal treatment regime (OTR), the authors construct an estimator of the mean outcome under the OTR that belongs to the class of robust asymptotically linear unbiased estimators. They derive a lower bound on the asymptotic variance for this class and show that their estimator attains it, which establishes optimality whether or not the OTR is unique.
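For orientation, the quantities in this claim can be written out in standard notation. This is a sketch for a single-stage, binary-treatment setting under the usual identification assumptions; the symbols (X for covariates, A for treatment, Y for outcome, O = (X, A, Y), \tau, d^{opt}, V, \varphi) are assumed here and are not taken from the paper.

```latex
\begin{align*}
\tau(x) &= E[Y \mid X = x, A = 1] - E[Y \mid X = x, A = 0]
  && \text{(conditional treatment effect)} \\
d^{\mathrm{opt}}(x) &= \mathbf{1}\{\tau(x) > 0\}
  && \text{(an optimal treatment regime)} \\
V &= E\Bigl[\max_{a \in \{0,1\}} E[Y \mid X, A = a]\Bigr]
  && \text{(mean outcome under the OTR)} \\
\sqrt{n}\,(\hat V - V) &= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \varphi(O_i) + o_p(1),
  \quad E[\varphi(O)] = 0
  && \text{(asymptotic linearity)}
\end{align*}
```

In this notation the asymptotic variance of \hat V is Var{\varphi(O)}, and the nonregular case is P(\tau(X) = 0) > 0, where any choice of treatment on the set {\tau(X) = 0} is optimal and the OTR is not unique. Attaining the bound means that Var{\varphi(O)} for the proposed estimator equals the smallest value achievable within the estimator class.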
What carries the argument
The argument rests on adaptive smoothing applied to the estimated optimal treatment regime, which places the proposed estimator in a general class of robust asymptotically linear unbiased estimators and lets it attain the variance lower bound for that class.
If this is right
- The inference remains valid and unbiased even if the optimal treatment regime is not uniquely identified.
- The estimator attains the minimal possible asymptotic variance within the class of robust asymptotically linear unbiased estimators.
- This paves the way for establishing semiparametric efficiency bounds in more general nonregular OTR inference problems.
- The method is applicable to real-world data, as shown by the re-analysis of the ACTG 175 trial.
Where Pith is reading between the lines
- Techniques like this could be adapted for inference on other nonregular causal effects in observational studies.
- Future work might explore how this bound behaves under different data distributions or with high-dimensional covariates.
- The optimality result suggests that smoothing can be a general strategy for handling discontinuities in treatment effect models.
Load-bearing premise
The proof depends on the assumption that there exists a broad class of robust asymptotically linear unbiased estimators whose asymptotic properties can be characterized without depending on the details of the smoothing procedure.
What would settle it
Finding an alternative estimator within the robust class that has a smaller asymptotic variance while remaining unbiased and asymptotically linear would falsify the claim that the proposed method achieves the lower bound.
Original abstract
When an optimal treatment regime (OTR) is considered, we need to evaluate the OTR in a valid and efficient way. The classical inference applied to the mean outcome under OTR, assuming the OTR is the same as the estimated OTR, might be biased when the regularity assumption that OTR is unique is violated. Although several methods have been proposed to allow nonregularity in such inference, its optimality is unclear due to challenges in deriving semiparametric efficiency bounds under potential nonregularity. In this paper, we address the bias issue via adaptive smoothing over the estimated OTR and develop a valid inference procedure on the mean outcome under OTR regardless of whether regularity is satisfied. We establish the optimality of the proposed method by deriving a lower bound of the asymptotic variance for the robust asymptotically linear unbiased estimator to the mean outcome under OTR and showing that our proposed estimator achieves the variance lower bound. The considered estimator class is general and the derived variance lower bound paves a novel way to establish efficiency optimality theories for OTR in a more general scenario allowing nonregularity. The merit of the proposed method is demonstrated by re-analyzing the ACTG 175 trial.
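The idea of smoothing over an estimated regime can be illustrated with a minimal sketch. This is not the paper's adaptive procedure, whose details are not given here. It assumes a randomized trial with known treatment probability 0.5, replaces the hard rule 1{tau_hat(x) > 0} with the smooth assignment probability Phi(tau_hat(x) / h), and evaluates that smoothed rule by inverse probability weighting. All names (`normal_cdf`, `smoothed_rule`, `ipw_value`, `simulate`), the fixed bandwidth `h = 0.1`, and the stand-in contrast estimate `0.05 * x` are hypothetical.

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def smoothed_rule(tau_hat, h):
    """Probability of assigning treatment 1 under the smoothed rule Phi(tau_hat / h)."""
    return normal_cdf(tau_hat / h)

def ipw_value(data, tau_hat_fn, h, propensity=0.5):
    """IPW estimate of the mean outcome under the smoothed rule, with its standard error."""
    terms = []
    for x, a, y in data:
        p1 = smoothed_rule(tau_hat_fn(x), h)
        # Weight each outcome by rule probability / randomization probability.
        weight = a * p1 / propensity + (1 - a) * (1 - p1) / (1 - propensity)
        terms.append(weight * y)
    n = len(terms)
    mean = sum(terms) / n
    var = sum((t - mean) ** 2 for t in terms) / (n - 1)
    return mean, math.sqrt(var / n)

def simulate(n, tau_fn, seed=0):
    """Randomized trial: X ~ U(-1, 1), A ~ Bernoulli(0.5), Y = 1 + A * tau(X) + noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        a = 1 if rng.random() < 0.5 else 0
        y = 1.0 + a * tau_fn(x) + rng.gauss(0.0, 1.0)
        data.append((x, a, y))
    return data

# Nonregular case: the treatment effect is exactly zero, so every rule is
# optimal and the true mean outcome under any optimal rule is 1.0.
data = simulate(5000, tau_fn=lambda x: 0.0)

# Hypothetical contrast estimate, treated as fitted on an independent sample.
estimate, se = ipw_value(data, tau_hat_fn=lambda x: 0.05 * x, h=0.1)
print(f"estimate = {estimate:.3f}, standard error = {se:.3f}")
```

Because the smoothed rule varies continuously with tau_hat, small changes in the contrast estimate near zero change the assignment probabilities only slightly, whereas the hard rule flips between treatments. How the bandwidth should shrink with the sample size is exactly the point the referee comments below probe, and the sketch does not address it.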
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an adaptive smoothing procedure over the estimated optimal treatment regime (OTR) to obtain valid inference for the mean outcome under the OTR, even when the OTR is non-unique. It derives a lower bound on the asymptotic variance that applies to the general class of robust asymptotically linear unbiased estimators for this functional and shows that the proposed estimator attains the bound, thereby establishing optimality. The method is illustrated via re-analysis of the ACTG 175 trial.
Significance. If the lower-bound derivation and attainment result are correct, the work supplies a concrete efficiency benchmark for nonregular OTR inference that has been missing from the literature; this could serve as a template for establishing optimality in other nonregular causal functionals.
major comments (2)
- [Abstract and lower-bound derivation section] Abstract and the section deriving the lower bound: the claim that the asymptotic variance lower bound holds for a general class of robust asymptotically linear unbiased estimators whose behavior can be characterized independently of the smoothing procedure is load-bearing for the optimality conclusion. The adaptive smoothing step (introduced to restore regularity when the OTR is non-unique) selects a data-dependent bandwidth that may enter the influence function; without an explicit uniformity argument showing that this dependence does not alter the class of allowable influence functions, the bound may not automatically apply to the proposed estimator.
- [Attainment / optimality section] The section establishing attainment of the bound: the proof that the adaptive-smoothing estimator belongs to the class and achieves the variance lower bound must verify that the smoothing parameter choice preserves asymptotic linearity uniformly over the neighborhoods where non-uniqueness occurs. If the linearity expansion depends on the rate at which the smoothing parameter vanishes, the attainment step requires additional technical conditions that are not yet visible in the abstract.
minor comments (2)
- [Method section] The notation distinguishing the estimated OTR from the true OTR could be made more consistent across the bias-correction and smoothing steps.
- [Numerical studies] The simulation study would be strengthened by reporting coverage probabilities under both regular and non-regular regimes side-by-side.
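The side-by-side coverage report suggested in the last comment could be laid out as in the sketch below. It does not use the paper's estimator. It evaluates a naive plug-in Wald interval that picks the better-looking arm in each of several equally weighted strata and then treats those picks as fixed. The function `naive_coverage`, the stratum and sample-size settings, and the simplification that arm-level sample means are drawn directly with known unit outcome variance are all assumptions made for illustration.

```python
import math
import random

def naive_coverage(tau, strata=10, m=50, reps=2000, z=1.96, seed=1):
    """Monte Carlo coverage of a naive Wald interval for the mean outcome
    under the estimated best arm, averaged over equally weighted strata.

    Simplification: arm-level sample means are drawn directly from
    N(mean, 1/m), i.e. unit outcome variance is treated as known.
    """
    rng = random.Random(seed)
    sd_mean = 1.0 / math.sqrt(m)            # sd of one arm's sample mean
    true_value = max(0.0, tau)              # control mean 0, treated mean tau
    half_width = z / math.sqrt(m * strata)  # treats the selected arms as fixed
    hits = 0
    for _ in range(reps):
        total = 0.0
        for _ in range(strata):
            mean0 = rng.gauss(0.0, sd_mean)
            mean1 = rng.gauss(tau, sd_mean)
            total += max(mean0, mean1)      # plug-in: pick the better-looking arm
        estimate = total / strata
        if abs(estimate - true_value) <= half_width:
            hits += 1
    return hits / reps

coverage_regular = naive_coverage(tau=2.0)     # unique optimal arm in every stratum
coverage_nonregular = naive_coverage(tau=0.0)  # both arms optimal in every stratum
print(f"regular: {coverage_regular:.3f}  non-regular: {coverage_nonregular:.3f}")
```

In the regular setting the better arm is selected essentially always, so the interval behaves like an ordinary Wald interval. In the non-regular setting the maximum of two noisy means is biased upward in every stratum, and averaging over strata shrinks the variance but not the bias, so coverage falls below the nominal level. A table with one such row per method would make the comparison the referee asks for directly visible.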
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive comments. We provide point-by-point responses to the major comments below, and we plan to incorporate clarifications and additional technical details in a revised version of the manuscript.
- Referee: [Abstract and lower-bound derivation section] Abstract and the section deriving the lower bound: the claim that the asymptotic variance lower bound holds for a general class of robust asymptotically linear unbiased estimators whose behavior can be characterized independently of the smoothing procedure is load-bearing for the optimality conclusion. The adaptive smoothing step (introduced to restore regularity when the OTR is non-unique) selects a data-dependent bandwidth that may enter the influence function; without an explicit uniformity argument showing that this dependence does not alter the class of allowable influence functions, the bound may not automatically apply to the proposed estimator.
  Authors: We are grateful to the referee for identifying this key point regarding the applicability of the lower bound to our estimator. The class of robust asymptotically linear unbiased estimators is defined such that their influence functions can be characterized independently of any particular smoothing procedure. Our adaptive smoothing method is constructed to ensure that the data-dependent bandwidth selection does not affect the limiting influence function, as the bandwidth converges to zero at an appropriate rate. To strengthen the argument, we will add an explicit uniformity argument in the revised manuscript to demonstrate that the dependence on the smoothing procedure does not alter the class of allowable influence functions. This will be detailed in the section deriving the lower bound.
  Revision: yes
- Referee: [Attainment / optimality section] The section establishing attainment of the bound: the proof that the adaptive-smoothing estimator belongs to the class and achieves the variance lower bound must verify that the smoothing parameter choice preserves asymptotic linearity uniformly over the neighborhoods where non-uniqueness occurs. If the linearity expansion depends on the rate at which the smoothing parameter vanishes, the attainment step requires additional technical conditions that are not yet visible in the abstract.
  Authors: We acknowledge the referee's concern about the need for uniform preservation of asymptotic linearity in the attainment proof. Our current development shows that the smoothing parameter vanishes at a rate sufficient to maintain the asymptotic expansion uniformly over the relevant neighborhoods. We agree that making the dependence on the vanishing rate explicit would improve clarity. In the revised manuscript, we will include additional technical conditions on the rate at which the smoothing parameter vanishes and provide a detailed uniform asymptotic linearity expansion in the attainment section to confirm that the proposed estimator achieves the variance lower bound.
  Revision: yes
Circularity Check
Variance lower bound derived for general estimator class independently of specific adaptive smoothing
The paper derives a lower bound on the asymptotic variance for the broad class of robust asymptotically linear unbiased estimators of the mean outcome under the OTR, then separately verifies that the proposed adaptive-smoothing estimator belongs to this class and attains the bound. The abstract states that the estimator class is general, and nothing in it ties the bound to the specific smoothing procedure. No equations or steps in the provided description reduce the bound to a fit from the same data, a self-citation chain, or a redefinition of the proposed estimator itself. On the available description the derivation is therefore not circular: the bound is a benchmark for the whole class, established independently of the proposed estimator.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Standard semiparametric assumptions that allow consistent estimation of the optimal treatment regime and asymptotic linearity of estimators
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: We establish the optimality of the proposed method by deriving a lower bound of the asymptotic variance for the robust asymptotically linear unbiased estimator... (Theorem 3.1)
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: embed_injective
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: Definition 1 (Robust Asymptotically Linear Unbiased Estimator) and the variational decomposition used to attain the variance bound
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.