A comparison of methods for model selection when estimating individual treatment effects

Alejandro Schuler; Michael Baiocchi; Nigam Shah; Robert Tibshirani

arxiv: 1804.05146 · v2 · pith:QUI5EDVQnew · submitted 2018-04-14 · stat.ML · cs.LG

keywords treatment · models · effects · should · effect · estimating · other · data

Practitioners in medicine, business, political science, and other fields are increasingly aware that decisions should be personalized to each patient, customer, or voter. A given treatment (e.g., a drug or advertisement) should be administered only to those who will respond most positively, and certainly not to those who will be harmed by it. Individual-level treatment effects can be estimated with tools adapted from machine learning, but different models can yield contradictory estimates. Unlike risk prediction models, however, treatment effect models cannot be easily evaluated against each other using a held-out test set because the true treatment effect itself is never directly observed. Besides outcome prediction accuracy, several metrics that can leverage held-out data to evaluate treatment effect models have been proposed, but they are not widely used. We provide a didactic framework that elucidates the relationships between the different approaches and compare them all using a variety of simulations of both randomized and observational data. Our results show that researchers estimating heterogeneous treatment effects need not limit themselves to a single model-fitting algorithm. Instead of relying on a single method, multiple models fit by a diverse set of algorithms should be evaluated against each other using an objective function learned from the validation set. The model minimizing that objective should be used for estimating the individual treatment effect for future individuals.
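
The selection procedure described in the abstract (fit several candidate models, score each on held-out data, keep the minimizer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the simulated randomized trial, the two candidate models, and the choice of a transformed-outcome loss as the validation objective are all assumptions made here for the example, and every function name is hypothetical. The transformed outcome Y* = Y(W - p) / (p(1 - p)) has conditional mean equal to the treatment effect when the treatment probability p is known, so it gives a target that can be computed from held-out data even though the true effect is never observed.

```python
import numpy as np

def simulate(n, rng, p=0.5):
    """Simulated randomized trial with known treatment probability p (assumed)."""
    x = rng.normal(size=(n, 2))
    w = rng.binomial(1, p, size=n)
    tau = 1.0 + 2.0 * x[:, 0]  # true effect, available only because this is a simulation
    y = x[:, 1] + w * tau + rng.normal(size=n)
    return x, w, y, tau

def fit_ols(x, y):
    """Least-squares fit with an intercept; returns a prediction function."""
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda x_new: np.column_stack([np.ones(len(x_new)), x_new]) @ beta

def t_learner(x, w, y):
    """Candidate 1: one outcome regression per arm; effect = difference of predictions."""
    mu1 = fit_ols(x[w == 1], y[w == 1])
    mu0 = fit_ols(x[w == 0], y[w == 0])
    return lambda x_new: mu1(x_new) - mu0(x_new)

def constant_effect(x, w, y):
    """Candidate 2: ignore heterogeneity and predict the difference in arm means."""
    ate = y[w == 1].mean() - y[w == 0].mean()
    return lambda x_new: np.full(len(x_new), ate)

def transformed_outcome_loss(tau_hat, w, y, p=0.5):
    """Mean squared error of the effect estimates against the transformed outcome Y*."""
    y_star = y * (w - p) / (p * (1 - p))
    return np.mean((y_star - tau_hat) ** 2)

def select_model(candidates, train, valid):
    """Fit each candidate on the training split and score it on the validation split."""
    x_tr, w_tr, y_tr = train
    x_va, w_va, y_va = valid
    losses = {}
    for name, fit in candidates.items():
        model = fit(x_tr, w_tr, y_tr)
        losses[name] = transformed_outcome_loss(model(x_va), w_va, y_va)
    # Return the name of the candidate with the smallest validation loss.
    return min(losses, key=losses.get), losses

rng = np.random.default_rng(0)
x_tr, w_tr, y_tr, _ = simulate(2000, rng)
x_va, w_va, y_va, _ = simulate(2000, rng)
candidates = {"t_learner": t_learner, "constant": constant_effect}
best, losses = select_model(candidates, (x_tr, w_tr, y_tr), (x_va, w_va, y_va))
print(best, losses)
```

Because the simulated effect varies with the first covariate, the T-learner should score lower than the constant-effect model here. With observational data the treatment probability is not known and would itself have to be estimated, which this sketch does not attempt.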

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation
cs.LG 2026-05 unverdicted novelty 6.0

Counterfactual metrics on semi-simulated benchmarks fail to identify the treatment effect estimators preferred by observable metrics on real datasets, with simple meta-learners outperforming specialized causal models.

Assessing Estimate of CATE from Observational Data via an RCT Study
stat.ME 2026-05 unverdicted novelty 5.0

CAFE assesses the fit of observational CATE estimates by partitioning RCT data via propensity scores and comparing to experimental group averages, with theory and extensions for confounders.