Nonparametric inference for sublevel-set probabilities of conditional average treatment effect functions

arXiv: 2605.15373 · v1 · pith:WGRJXBIQ · submitted 2026-05-14 · stat.ME

Anders Munch, Thomas A. Gerds

Pith reviewed 2026-05-19 15:22 UTC · model grok-4.3

keywords: conditional average treatment effect · sublevel sets · treatment heterogeneity · monotone estimation · Grenander estimator · nonparametric inference · causal inference · debiased machine learning

The pith

The probability that a conditional average treatment effect falls below a given threshold produces a monotone curve summarizing treatment heterogeneity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

When individuals respond differently to treatment, the overall average effect can mask important variation. The paper defines the probability that the conditional average treatment effect lies below a chosen threshold as a single interpretable number representing the share of the population meeting that condition. Varying the threshold traces out a univariate monotone curve that visualizes the type and extent of heterogeneity without requiring inspection of a high-dimensional function. The authors prove that this curve is not pathwise differentiable in a nonparametric model, then construct a Grenander-type estimator that combines machine learning with monotone-function techniques and a debiased estimator for its piecewise-linear approximation.

Core claim

We formalize the curve of sublevel-set probabilities of a CATE function as a target parameter. This curve is not pathwise differentiable under a nonparametric model. To address this, we leverage advances in monotone function estimation and develop a Grenander-type estimator that incorporates machine learning. We also show that the best piecewise linear approximation to the curve is pathwise differentiable and develop a debiased machine learning estimator for it. The methods are studied in numerical experiments based on data synthesized from randomized trials and illustrated on a diabetes medication trial.

What carries the argument

The sublevel-set probability of the CATE function, defined as the probability that CATE(X) does not exceed a prespecified threshold, which traces a univariate monotone curve as the threshold varies.
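
A minimal sketch of the plug-in version of this curve, assuming CATE values have already been computed for each individual. The function name `sublevel_curve` and the toy CATE values are hypothetical illustrations, not taken from the paper, and this naive plug-in is not the paper's proposed estimator.

```python
import random


def sublevel_curve(cate_values, thresholds):
    """Share of CATE values at or below each threshold (empirical P(tau(X) <= t))."""
    n = len(cate_values)
    return [sum(1 for v in cate_values if v <= t) / n for t in thresholds]


# Hypothetical toy population: CATE values spread uniformly over [-0.05, 0.05).
random.seed(0)
cate = [0.1 * random.random() - 0.05 for _ in range(2000)]

# Thresholds from -0.05 to 0.05 in steps of 0.01.
thresholds = [-0.05 + 0.01 * k for k in range(11)]
curve = sublevel_curve(cate, thresholds)

for t, p in zip(thresholds, curve):
    print(f"threshold {t:+.2f}: share with CATE at or below it = {p:.3f}")
```

By construction the printed shares are non-decreasing in the threshold, which is the monotonicity the rest of the page relies on.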

If this is right

Varying the threshold produces a univariate monotone curve that visualizes the overall type and degree of heterogeneity in a population.
The curve can be targeted and estimated via monotone function techniques combined with machine learning.
The best piecewise linear approximation to the curve is pathwise differentiable and admits a debiased machine learning estimator.
Finite-sample performance of the estimators can be assessed in numerical studies based on synthesized randomized trial data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sublevel-probability construction could be applied to other causal functionals that lack pathwise differentiability once monotonicity is imposed.
The resulting curve offers a direct way to communicate the fraction of a population expected to benefit or be harmed by treatment at any chosen effect size.
Extensions to observational data would require only that the identification assumptions for the CATE remain valid and that the monotonicity structure is preserved.

Load-bearing premise

The conditional average treatment effect function is identifiable from observed data under randomized treatment assignment.

What would settle it

A simulation or randomized trial in which the true proportion of individuals whose CATE lies below each threshold is known from the data-generating process and the proposed Grenander-type estimator fails to recover that proportion at the rates predicted by the theory.

Figures

Figures reproduced from arXiv: 2605.15373 by Anders Munch, Thomas A. Gerds.

**Figure 2.** The solid curve is the sublevel function …

**Figure 3.** The blue lines show the best piecewise linear approximations to the black curve ( …

**Figure 4.** The pointwise bias and mean squared error (MSE) of the four estimators of …

**Figure 5.** The left panel shows the pointwise coverage for the pointwise confidence intervals pro…

**Figure 6.** Four different estimates of the sublevel function …

**Figure 7.** The true sublevel function (γ) and its best piecewise linear approximation (γ #) based on a fixed set of knot points in the interval [−0.05, 0.1] for each of the three data-generating mechanisms described in Section 5. The dashed vertical line at α = 0.01 denotes the value at which we evaluated the pointwise performance of the suggested estimators.

Original abstract

The average treatment effect can obscure important heterogeneity when individuals respond differently to a treatment. While the conditional average treatment effect (CATE) function captures such heterogeneity, it is difficult to communicate when it depends on many covariates. Sublevel sets of a multivariate CATE function are equally complicated objects, but the probability of a sublevel set of a CATE function is a single number with a simple interpretation as the proportion of individuals whose expected treatment effect does not exceed a prespecified threshold. By varying the threshold, a univariate monotone curve appears which can be used to visualize the overall type and degree of heterogeneity in a population. We formalize this curve as a target parameter and show that it is not pathwise differentiable under a nonparametric model. To address this nonstandard estimation problem, we leverage recent advances in monotone function estimation and develop a Grenander-type estimator that incorporates machine learning. We also show that the best piecewise linear approximation to the curve of interest is a pathwise differentiable parameter, and we develop a debiased machine learning estimator of this approximation. We investigate our proposed estimators' finite sample performance in a sequence of numerical studies based on data synthesized from a randomized trial. The methods are illustrated in data from a randomized trial on diabetes medication.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces a sublevel-set probability curve for the CATE as an interpretable univariate summary of heterogeneity and justifies a Grenander-type estimator by claiming non-pathwise differentiability, though that claim may hinge on unstated continuity conditions.

The punchline is that this paper defines a sublevel-set probability curve for the conditional average treatment effect and shows it is not pathwise differentiable, which justifies their use of a Grenander-type estimator combined with machine learning. What stands out as new is the formalization of θ(t) = P(τ(X) ≤ t) as a target parameter that produces a simple monotone curve for visualizing heterogeneity. This is paired with an estimator that uses ML to estimate the CATE and then applies monotone function techniques, along with a pathwise differentiable piecewise linear approximation that admits a debiased ML estimator. The numerical studies based on synthesized data from a randomized trial provide some evidence on finite-sample performance, and the diabetes trial illustration shows a potential use case.

The paper does well in offering an interpretable summary that avoids the complexity of full multivariate CATE surfaces. The interpretation as the proportion of individuals with treatment effect at or below a threshold is direct and could help in communicating results to non-statisticians in clinical or policy settings. Building on existing work in monotone estimation and debiased machine learning without circularity is a plus.

On the soft spots, the non-differentiability claim is central but the stress-test raises a fair point. If the tangent space perturbations do not adequately handle cases where P(τ(X) = t) > 0, a directional derivative might still exist, which could mean the functional is differentiable in some settings and standard methods could apply. The paper likely specifies conditions to rule this out, but it would be good to see explicit discussion of atoms or discrete components in the CATE distribution. The studies are limited to synthesized data, so real-data performance and sensitivity to ML nuisance estimation errors remain to be seen. Overall these are not load-bearing issues if the main theorems hold under the stated assumptions.

This is for statisticians and causal inference researchers interested in new functionals of the CATE. Readers working on estimation under non-differentiable parameters or monotone methods would get value from the technical developments. It has enough novelty and grounding to deserve a serious referee who can check the proofs and suggest clarifications on the scope of the non-differentiability result. I recommend putting it through peer review.

Referee Report

3 major / 2 minor

Summary. The manuscript formalizes the sublevel-set probability curve θ(t) = P(τ(X) ≤ t) as a univariate monotone summary of treatment-effect heterogeneity for the conditional average treatment effect function τ. It proves that θ(t) is not pathwise differentiable under a nonparametric model, develops a Grenander-type estimator that incorporates machine learning, and constructs a debiased machine-learning estimator for the best piecewise-linear approximation to the curve. Finite-sample performance is examined in numerical studies on synthesized randomized-trial data, and the methods are illustrated on data from a diabetes-medication trial.

Significance. If the non-differentiability result holds, the work supplies a simple, interpretable univariate curve for visualizing the overall type and degree of heterogeneity that is otherwise difficult to communicate from a high-dimensional CATE. The integration of recent monotone-function estimation techniques with modern machine learning is a methodological contribution, and the dual-estimator strategy (Grenander-type for the original parameter and debiased ML for the differentiable approximation) is a pragmatic response to the non-regularity. The numerical studies on synthesized data and the real-data illustration provide concrete evidence of applicability in randomized-trial settings.

major comments (3)

[Section on target parameter and non-differentiability] Section formalizing the target parameter and the non-differentiability result: the tangent-space argument for non-pathwise differentiability should explicitly treat the case in which the distribution of τ(X) places positive mass at the level t. When P(τ(X)=t)>0, a directional derivative may still exist, which would undermine the claim that the functional is non-differentiable and therefore the justification for abandoning standard debiased ML in favor of the Grenander-type estimator.
[Estimator construction] Description of the Grenander-type estimator (presumably §3 or §4): the precise regularity conditions under which the machine-learning plug-in for the CATE is inserted into the monotone estimator, and the resulting asymptotic distribution, are not fully stated. Without these conditions it is difficult to verify that the proposed estimator attains the expected cube-root rate or that the confidence bands are valid.
[Numerical studies] Numerical studies section: the manuscript states that finite-sample performance is investigated, yet no quantitative summaries (bias, MSE, coverage rates, or comparison to oracle estimators) are provided in the text or tables. This omission prevents assessment of whether the estimators behave as predicted by the theory under the synthesized randomized-trial designs.

minor comments (2)

[Abstract] The abstract could explicitly name the non-differentiability result and the two proposed estimators to give readers an immediate overview of the technical contribution.
[Throughout] Notation for the CATE function and the sublevel probability should be introduced once and used consistently; occasional switches between τ(X) and other symbols for the same object reduce readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each of the three major comments in turn below, indicating where revisions will be made to strengthen the manuscript.

Referee: [Section on target parameter and non-differentiability] Section formalizing the target parameter and the non-differentiability result: the tangent-space argument for non-pathwise differentiability should explicitly treat the case in which the distribution of τ(X) places positive mass at the level t. When P(τ(X)=t)>0, a directional derivative may still exist, which would undermine the claim that the functional is non-differentiable and therefore the justification for abandoning standard debiased ML in favor of the Grenander-type estimator.

Authors: We appreciate the referee highlighting the need to treat atoms explicitly. The manuscript's non-differentiability argument relies on the fact that θ(t) is defined via the distribution function of the random variable τ(X), and the tangent-space calculation shows that no linear representation exists for arbitrary perturbations of the law of (X, Y(0), Y(1)). When P(τ(X)=t)>0 the functional has a discontinuity in t, but this does not restore pathwise differentiability under the nonparametric model; suitable score functions can still produce second-order changes that prevent a first-order representation. To make this transparent, we will revise the relevant section to include a separate paragraph (or lemma) that directly addresses the atomic case and confirms that the directional derivative fails to exist in the full nonparametric tangent space.

Revision: yes

Referee: [Estimator construction] Description of the Grenander-type estimator (presumably §3 or §4): the precise regularity conditions under which the machine-learning plug-in for the CATE is inserted into the monotone estimator, and the resulting asymptotic distribution, are not fully stated. Without these conditions it is difficult to verify that the proposed estimator attains the expected cube-root rate or that the confidence bands are valid.

Authors: The referee correctly notes that the regularity conditions and limiting distribution for the Grenander-type estimator are stated at a high level. In the revision we will add an explicit theorem (with numbered assumptions) that lists the required convergence rates for the machine-learning estimator of τ, the smoothness conditions on the density of τ(X), and the resulting cube-root-n asymptotic distribution of the estimator and its associated confidence bands. This will make verification of the cube-root rate and band validity straightforward.

Revision: yes

Referee: [Numerical studies] Numerical studies section: the manuscript states that finite-sample performance is investigated, yet no quantitative summaries (bias, MSE, coverage rates, or comparison to oracle estimators) are provided in the text or tables. This omission prevents assessment of whether the estimators behave as predicted by the theory under the synthesized randomized-trial designs.

Authors: We agree that the numerical studies would be more informative with explicit quantitative summaries. The current version emphasizes visual diagnostics; the revised manuscript will include a table (or set of tables) reporting bias, MSE, coverage probabilities of the confidence bands, and comparisons against oracle estimators that use the true CATE, for each of the simulation designs described in the section.

Revision: yes

Circularity Check

0 steps flagged

No circularity: target parameter and non-differentiability shown via external tangent-space methods

The paper defines the sublevel-set probability θ(t) = P(τ(X) ≤ t) directly from the identifiable CATE function under randomized assignment, then invokes standard nonparametric tangent-space arguments to establish lack of pathwise differentiability. Estimation proceeds by importing Grenander-type monotone estimators and debiased ML from the cited literature on monotone functions and double ML, without any reduction of θ(t) to a fitted parameter or self-referential construction. No self-citation is load-bearing for the core non-differentiability claim, and the piecewise-linear approximation is handled separately with its own influence function. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract, the paper relies on standard causal identification assumptions for randomized trials and on the mathematical property that the sublevel-set probability is monotone; no free parameters or new entities are mentioned.

axioms (2)

domain assumption The conditional average treatment effect function is identifiable from the observed data under randomized treatment assignment.
Required to define and estimate the sublevel-set probabilities from trial data as done in the numerical studies and diabetes illustration.
standard math The sublevel-set probability curve is monotone non-decreasing in the threshold value.
Follows directly from the definition of sublevel sets and underpins the applicability of Grenander-type monotone estimation.
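
To make the role of monotonicity concrete, here is a hedged sketch of a generic Grenander-type construction: estimate the primitive (integrated) curve on a grid, take its greatest convex minorant, and read off the slopes as a monotone estimate. It is an illustration under stated assumptions, not the paper's estimator. The names `lower_convex_hull`, `gcm_left_slopes`, and `primitive_estimate`, the toy data, and the use of noisy pseudo-outcomes `phi` with conditional mean equal to the CATE are all hypothetical choices made for this example.

```python
import random


def lower_convex_hull(xs, ys):
    """Vertices of the greatest convex minorant of the points (xs[i], ys[i]).

    Assumes xs is strictly increasing.
    """
    hull = []
    for p in zip(xs, ys):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # cross <= 0: hull[-1] lies on or above the chord from hull[-2] to p.
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append(p)
    return hull


def gcm_left_slopes(xs, ys):
    """Left-hand slope of the greatest convex minorant at each point of xs[1:]."""
    hull = lower_convex_hull(xs, ys)
    slopes, j = [], 0
    for x in xs[1:]:
        # Move to the hull segment whose right end point is at or beyond x.
        while hull[j + 1][0] < x:
            j += 1
        (x1, y1), (x2, y2) = hull[j], hull[j + 1]
        slopes.append((y2 - y1) / (x2 - x1))
    return slopes


def primitive_estimate(tau_hat, phi, alpha):
    """Sample mean of 1{tau_hat <= alpha} * (alpha - phi)."""
    total = sum(alpha - p for t, p in zip(tau_hat, phi) if t <= alpha)
    return total / len(tau_hat)


# Hypothetical toy data: CATE values on [-0.05, 0.05) and noisy pseudo-outcomes.
random.seed(1)
tau_hat = [0.1 * random.random() - 0.05 for _ in range(500)]
phi = [t + random.gauss(0.0, 0.2) for t in tau_hat]

grid = [-0.06 + 0.01 * k for k in range(13)]  # thresholds from -0.06 to 0.06
gamma_hat = [primitive_estimate(tau_hat, phi, a) for a in grid]

# Monotone estimate of the sublevel curve: convex-minorant slopes, clipped to [0, 1].
curve = [min(1.0, max(0.0, s)) for s in gcm_left_slopes(grid, gamma_hat)]
```

The noisy primitive estimate need not be convex, so its raw finite differences can decrease; the convex minorant step is what forces the resulting slopes to be non-decreasing in the threshold.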

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "We formalize this curve as a target parameter and show that it is not pathwise differentiable under a nonparametric model."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "the efficient influence function of Γ(α) is υ_α(P)(O) = 1{τ(P)(W)≤α}(α−φ(P)(O))−Γ(P)(α)"
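
One way to read the quoted influence function, offered as a hedged sketch. Assume (the passage does not state this) that Γ is the primitive of the sublevel curve γ(u) = P(τ(W) ≤ u), written θ(t) elsewhere on this page, that φ is a pseudo-outcome with E[φ(P)(O) | W] = τ(P)(W), and that τ(W) is integrable. Exchanging expectation and integral then gives the first identity below, and the quoted function has mean zero, as an influence function must.

```latex
\begin{align*}
\Gamma(\alpha)
  &= \int_{-\infty}^{\alpha} \gamma(u)\,\mathrm{d}u
   = \int_{-\infty}^{\alpha} E\bigl[\mathbf{1}\{\tau(W) \le u\}\bigr]\,\mathrm{d}u \\
  &= E\Bigl[\int_{-\infty}^{\alpha} \mathbf{1}\{\tau(W) \le u\}\,\mathrm{d}u\Bigr]
   = E\bigl[\mathbf{1}\{\tau(W) \le \alpha\}\,(\alpha - \tau(W))\bigr], \\
E\bigl[\upsilon_\alpha(P)(O)\bigr]
  &= E\bigl[\mathbf{1}\{\tau(W) \le \alpha\}\,\bigl(\alpha - E[\varphi(P)(O) \mid W]\bigr)\bigr] - \Gamma(\alpha)
   = 0 .
\end{align*}
```

Under the same assumptions Γ is convex, because its derivative γ is non-decreasing.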

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages

[1]

Audibert and A

J.-Y. Audibert and A. B. Tsybakov. Fast learning rates for plug-in classifiers. The Annals of Statistics, 2007

work page 2007
[2]

P. J. Bickel, C. A. Klaassen, Y. Ritov, and J. A. Wellner. Efficient and adaptive estimation for semiparametric models, volume 4. Johns Hopkins University Press Baltimore, 1993

work page 1993
[3]

Bonvini, E

M. Bonvini, E. H. Kennedy, and L. J. Keele. Minimax optimal subgroup identification. arXiv preprint arXiv:2306.17464, 2023

work page arXiv 2023
[4]

L. Breiman. Stacked regressions. Machine learning, 24 0 (1): 0 49--64, 1996

work page 1996
[5]

L. Breiman. Random forests. Machine Learning, 45 0 (1): 0 5--32, 2001. doi:10.1023/A:1010933404324

work page doi:10.1023/a:1010933404324 2001
[6]

Cavalier

L. Cavalier. Nonparametric estimation of regression level sets. Statistics A Journal of Theoretical and Applied Statistics, 29 0 (2): 0 131--160, 1997

work page 1997
[7]

XGBoost: A scalable tree boosting system,

T. Chen and C. Guestrin. XGBoost : A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , KDD '16, pages 785--794, New York, NY, USA, 2016. Association for Computing Machinery. doi:10.1145/2939672.2939785

work page doi:10.1145/2939672.2939785 2016
[8]

Y.-C. Chen, C. R. Genovese, and L. Wasserman. Density level sets: Asymptotics, inference, and visualization. Journal of the American Statistical Association, 112 0 (520): 0 1684--1696, 2017

work page 2017
[9]

Chernozhukov, I

V. Chernozhukov, I. Fernandez-Val, and A. Galichon. Improving point and interval estimators of monotone functions by rearrangement. Biometrika, 96 0 (3): 0 559--575, 2009

work page 2009
[10]

Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch

V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, and J. Robins. Double/debiased machine learning for treatment and structural parameters . The Econometrics Journal, 21 0 (1): 0 C1--C68, 01 2018 a . doi:10.1111/ectj.12097. URL https://doi.org/10.1111/ectj.12097

work page doi:10.1111/ectj.12097 2018
[11]

Chernozhukov, I

V. Chernozhukov, I. Fern \'a ndez-Val, and Y. Luo. The sorted effects method: Discovering heterogeneous effects beyond their averages. Econometrica, 86 0 (6): 0 1911--1938, 2018 b

work page 1911
[12]

Chernozhukov, M

V. Chernozhukov, M. Demirer, E. Duflo, and I. Fern \'a ndez-Val. Fisher-schultz lecture: Generic machine learning inference on heterogenous treatment effects in randomized experiments, with an application to immunization in india. arXiv preprint arXiv:1712.04802, 2023

work page arXiv 2023
[13]

C. De Boor. A practical guide to splines, volume 27. springer New York, 1978

work page 1978
[14]

Devroye, L

L. Devroye, L. Gy \"o rfi, and G. Lugosi. A probabilistic theory of pattern recognition, volume 31. Springer Science & Business Media, 1996

work page 1996
[15]

D. J. Foster and V. Syrgkanis. Orthogonal statistical learning. arXiv preprint arXiv:1901.09036, 2019

work page arXiv 1901
[16]

J. H. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33 0 (1): 0 1--22, 2010. doi:10.18637/jss.v033.i01

work page doi:10.18637/jss.v033.i01 2010
[17]

C. J. Geyer. On the asymptotics of constrained m-estimation. The Annals of statistics, pages 1993--2010, 1994

work page 1993
[18]

R. D. Gill, M. J. Laan, and J. M. Robins. Coarsening at random: Characterizations, conjectures, counter-examples. In Proceedings of the First Seattle Symposium in Biostatistics, pages 255--294. Springer, 1997

work page 1997
[19]

Groeneboom

P. Groeneboom. Estimating a monotone density. In Proceedings of the Berkeley conference in honor of Jerzy Neyman and Jack Kiefer, Vol. II, 1983

work page 1983
[20]

Groeneboom and J

P. Groeneboom and J. A. Wellner. Computing chernoff's distribution. Journal of Computational and Graphical Statistics, 10 0 (2): 0 388--400, 2001

work page 2001
[21]

Hernán and J

M. Hernán and J. Robins. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC, 2020

work page 2020
[22]

J. L. Hill. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics, 20 0 (1): 0 217--240, 2011

work page 2011
[23]

Hines, K

O. Hines, K. Diaz-Ordaz, and S. Vansteelandt. Variable importance measures for heterogeneous causal effects. arXiv preprint arXiv:2204.06030, 2022 a

work page arXiv 2022
[24]

Hines, O

O. Hines, O. Dukes, K. Diaz-Ordaz, and S. Vansteelandt. Demystifying statistical learning based on efficient influence functions. The American Statistician, 76 0 (3): 0 292--304, 2022 b

work page 2022
[25]

E. H. Kennedy. Semiparametric doubly robust targeted double machine learning: a review. arXiv preprint arXiv:2203.06469, 2022

work page arXiv 2022
[26]

E. H. Kennedy. Towards optimal doubly robust estimation of heterogeneous causal effects. Electronic Journal of Statistics, 17 0 (2): 0 3008--3049, 2023

work page 2023
[27]

E. H. Kennedy, S. Balakrishnan, and L. Wasserman. Semiparametric counterfactual density estimation. Biometrika, 110 0 (4): 0 875--896, 2023

work page 2023
[28]

S. R. K \"u nzel, J. S. Sekhon, P. J. Bickel, and B. Yu. Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the national academy of sciences, 116 0 (10): 0 4156--4165, 2019

work page 2019
[29]

J. Levy, M. van der Laan, A. Hubbard, and R. Pirracchio. A fundamental measure of treatment effect heterogeneity. Journal of Causal Inference, 9 0 (1): 0 83--108, 2021

work page 2021
[30]

M. Lu, S. Sadiq, D. J. Feaster, and H. Ishwaran. Estimating individual treatment effect in observational data using random forest methods. Journal of Computational and Graphical Statistics, 27 0 (1): 0 209--219, 2018

work page 2018
[31]

Mammen and W

E. Mammen and W. Polonik. Confidence regions for level sets. Journal of Multivariate Analysis, 122: 0 202--214, 2013

work page 2013
[32]

Mammen and A

E. Mammen and A. B. Tsybakov. Smooth discrimination analysis. The Annals of Statistics, 27 0 (6): 0 1808--1829, 1999

work page 1999
[33]

S. P. Marso, G. H. Daniels, K. Brown-Frandsen, P. Kristensen, J. F. Mann, M. A. Nauck, S. E. Nissen, S. Pocock, N. R. Poulter, L. S. Ravn, et al. Liraglutide and cardiovascular outcomes in type 2 diabetes. New England Journal of Medicine, 375 0 (4): 0 311--322, 2016

work page 2016
[34]

D. M. Mason and W. Polonik. Asymptotic normality of plug-in level set estimates. Annals of Applied Probability, 2009

work page 2009
[35]

J. L. Montiel Olea and M. Plagborg-M ller. Simultaneous confidence bands: Theory, implementation, and an application to svars. Journal of Applied Econometrics, 34 0 (1): 0 1--17, 2019

work page 2019
[36]

J. Neyman. Sur les applications de la th \'e orie des probabilit \'e s aux experiences agricoles: Essai des principes. Roczniki Nauk Rolniczych, 10 0 (1): 0 1--51, 1923

work page 1923
[37]

Nie and S

X. Nie and S. Wager. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108 0 (2): 0 299--319, 2021

work page 2021
[38]

Pfanzagl and W

J. Pfanzagl and W. Wefelmeyer. Contributions to a general asymptotic statistical theory. Springer, 1982

work page 1982
[39]

W. Polonik. Measuring mass concentrations and estimating density contour clusters-an excess mass approach. The annals of Statistics, pages 855--881, 1995

work page 1995
[40]

Qiao and W

W. Qiao and W. Polonik. Nonparametric confidence regions for level sets: Statistical properties and geometry. Electronic Journal of Statistics, 2019

work page 2019
[41]

H. W. Reeve, T. I. Cannings, and R. J. Samworth. Optimal subgroup selection. The Annals of Statistics, 51 0 (6): 0 2342--2365, 2023

work page 2023
[42]

Rigollet and R

P. Rigollet and R. Vert. Optimal rates for plug-in estimators of density level sets. Bernoulli, 2009

work page 2009
[43]

J. Robins. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical modelling, 7 0 (9-12): 0 1393--1512, 1986

work page 1986
[44]

D. B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of educational Psychology, 66 0 (5): 0 688, 1974

work page 1974
[45]

Semenova and V

V. Semenova and V. Chernozhukov. Debiased machine learning of conditional average treatment effects and other causal functions. The Econometrics Journal, 24 0 (2): 0 264--289, 2021

work page 2021
[46]

A. Shapiro. On the asymptotics of constrained local m-estimators. Annals of statistics, pages 948--960, 2000

work page 2000
[47]

Tibshirani, S

J. Tibshirani, S. Athey, E. S. Sverdrup, and S. Wager. grf: Generalized Random Forests, 2024. URL https://CRAN.R-project.org/package=grf. R package version 2.4.0

work page 2024
[48]

A. B. Tsybakov. On nonparametric estimation of density level sets. The Annals of Statistics, 25 0 (3): 0 948--969, 1997

work page 1997
[49]

M. J. van der Laan and A. R. Luedtke. Targeted learning of an optimal dynamic treatment, and statistical inference for its mean outcome. Technical report, Berkeley Division of Biostatistics Working Paper Series, 2014

work page 2014
[50]

M. J. van der Laan and J. M. Robins. Unified methods for censored longitudinal data and causality. Springer Science & Business Media, 2003

work page 2003
[51]

M. J. van der Laan and S. Rose. Targeted learning: causal inference for observational and experimental data. Springer Science & Business Media, 2011

work page 2011
[52]

M. J. van der Laan, E. C. Polley, and A. E. Hubbard. Super learner. Statistical applications in genetics and molecular biology, 6 0 (1), 2007

work page 2007
[53]

A. W. van der Vaart. On differentiable functionals. The Annals of Statistics, pages 178--204, 1991

work page 1991
[54]

A. W. van der Vaart. Asymptotic statistics, volume 3. Cambridge university press, 2000

work page 2000
[55]

A. W. van der Vaart. Semiparametric statistics. In Lectures on probability theory and statistics (Saint-Flour, 1999), pages 331--457. Springer, 2002

work page 1999
[56]

A. W. van der Vaart and M. J. van der Laan. Estimating a survival distribution with current status data and high-dimensional covariates. The International Journal of Biostatistics, 2 0 (1), 2006

work page 2006
[57]

A. W. van der Vaart and J. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Science & Business Media, 1996

work page 1996
[58]

Wager and S

S. Wager and S. Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113 0 (523): 0 1228--1242, 2018

work page 2018
[59]

Wasserman

L. Wasserman. All of nonparametric statistics. Springer Science & Business Media, 2006

work page 2006
[60]

Westling and M

T. Westling and M. Carone. A unified study of nonparametric inference for monotone functions. Annals of statistics, 48 0 (2): 0 1001, 2020

work page 2020
[61]

M. N. Wright and A. Ziegler. ranger: A fast implementation of random forests for high dimensional data in C++ and R . Journal of Statistical Software, 77 0 (1): 0 1--17, 2017. doi:10.18637/jss.v077.i01

work page doi:10.18637/jss.v077.i01 2017
[62]

Yadlowsky, S

S. Yadlowsky, S. Fleming, N. Shah, E. Brunskill, and S. Wager. Evaluating treatment prioritization rules via rank-weighted average treatment effects. Journal of the American Statistical Association, 120 0 (549): 0 38--51, 2025

work page 2025
[63]

S. C. Ziersen and T. Martinussen. Variable importance measures for heterogeneous treatment effects with survival outcome. Scandinavian Journal of Statistics, 2025

[64]

H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2):301--320, 2005

