
arxiv: 1804.05146 · v2 · pith:QUI5EDVQnew · submitted 2018-04-14 · 📊 stat.ML · cs.LG

A comparison of methods for model selection when estimating individual treatment effects

keywords: treatment, models, effects, should, effect, estimating, other, data

Practitioners in medicine, business, political science, and other fields are increasingly aware that decisions should be personalized to each patient, customer, or voter. A given treatment (e.g. a drug or advertisement) should be administered only to those who will respond most positively, and certainly not to those who will be harmed by it. Individual-level treatment effects can be estimated with tools adapted from machine learning, but different models can yield contradictory estimates. Unlike risk prediction models, however, treatment effect models cannot be easily evaluated against each other using a held-out test set because the true treatment effect itself is never directly observed. Besides outcome prediction accuracy, several metrics that can leverage held-out data to evaluate treatment effect models have been proposed, but they are not widely used. We provide a didactic framework that elucidates the relationships between the different approaches and compare them all using a variety of simulations of both randomized and observational data. Our results show that researchers estimating heterogeneous treatment effects need not limit themselves to a single model-fitting algorithm. Instead of relying on a single method, multiple models fit by a diverse set of algorithms should be evaluated against each other using an objective function learned from the validation set. The model minimizing that objective should be used for estimating the individual treatment effect for future individuals.
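The selection procedure described in the abstract (fit several candidate models, score each on held-out data, keep the minimizer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the simulated trial, the two candidate learners, and all function names are hypothetical, and the transformed-outcome loss is swapped in as one example of a validation objective that does not require observing the true treatment effect. It assumes a randomized design with a known treatment probability of 0.5.

```python
import random

P_TREAT = 0.5  # assumed known randomization probability (hypothetical setup)


def simulate_rct(n, seed):
    """Hypothetical randomized trial with true effect tau(x) = 2 * x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        w = 1 if rng.random() < P_TREAT else 0
        y = x + w * 2.0 * x + rng.gauss(0.0, 0.5)
        rows.append((x, w, y))
    return rows


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_constant_effect(train):
    """Candidate 1: the same effect for everyone (difference in arm means)."""
    treated = [y for _, w, y in train if w == 1]
    control = [y for _, w, y in train if w == 0]
    ate = sum(treated) / len(treated) - sum(control) / len(control)
    return lambda x: ate


def fit_t_learner(train):
    """Candidate 2: one linear outcome model per arm, effect = their difference."""
    a1, b1 = fit_line([(x, y) for x, w, y in train if w == 1])
    a0, b0 = fit_line([(x, y) for x, w, y in train if w == 0])
    return lambda x: (a1 + b1 * x) - (a0 + b0 * x)


def transformed_outcome_loss(tau_hat, rows, p=P_TREAT):
    """Mean squared error against Y* = Y (W - p) / (p (1 - p)).

    Under randomization with known p, E[Y* | X] equals the treatment effect,
    so this loss ranks effect models without observing the effect itself.
    """
    total = 0.0
    for x, w, y in rows:
        y_star = y * (w - p) / (p * (1.0 - p))
        total += (y_star - tau_hat(x)) ** 2
    return total / len(rows)


train = simulate_rct(2000, seed=1)
validation = simulate_rct(5000, seed=2)

candidates = {
    "constant_effect": fit_constant_effect(train),
    "t_learner_linear": fit_t_learner(train),
}
scores = {name: transformed_outcome_loss(model, validation)
          for name, model in candidates.items()}
best_name = min(scores, key=scores.get)
print(best_name, scores)
```

The chosen model, `candidates[best_name]`, would then be the one applied to future individuals. For observational data the known probability `p` would have to be replaced by an estimated propensity score, which this sketch does not attempt.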



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation

    cs.LG 2026-05 unverdicted novelty 6.0

    Counterfactual metrics on semi-simulated benchmarks fail to identify the treatment effect estimators preferred by observable metrics on real datasets, with simple meta-learners outperforming specialized causal models.

  2. Assessing Estimate of CATE from Observational Data via an RCT Study

    stat.ME 2026-05 unverdicted novelty 5.0

    CAFE assesses the fit of observational CATE estimates by partitioning RCT data via propensity scores and comparing to experimental group averages, with theory and extensions for confounders.