Quantifying Error in the Presence of Confounders for Causal Inference

Amit Sharma; Rathin Desai

arXiv: 1907.04805 · v1 · pith:CERQIXVInew · submitted 2019-07-10 · 💻 cs.LG · stat.ME · stat.ML


Pith reviewed 2026-05-24 23:42 UTC · model grok-4.3

classification 💻 cs.LG · stat.ME · stat.ML

keywords: causal inference, average causal effect, error bounds, representation learning, robust statistics, Huber contamination, inverse propensity weighting, confounding


The pith

Causal inference methods receive general error bounds by recasting them as representation learning or weighting algorithms, even under unobserved confounding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to let users evaluate how far any causal estimator strays from the true average causal effect by turning the estimation problem into one of representation learning. Standard techniques such as back-door adjustment and inverse propensity weighting are shown to fit this framing, which yields uniform bounds on their error. The same approach extends the bounds to settings with hidden confounders by treating the confounding as Huber contamination from robust statistics. The bounds are constructed so they can be estimated from data, illustrated by using them to choose a clipping threshold for propensity scores.
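To make the clipping step mentioned above concrete, here is a minimal sketch of an inverse propensity weighting estimate with clipped propensity scores. It uses the standard Horvitz-Thompson form of IPW and assumes a binary treatment. The function names `clip_propensity` and `ipw_ace` and the symmetric clipping rule are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def clip_propensity(propensity, clip):
    """Clip scores into [clip, 1 - clip] (symmetric rule, an assumption)."""
    return np.clip(np.asarray(propensity, dtype=float), clip, 1.0 - clip)


def ipw_ace(y, t, propensity, clip=0.0):
    """Horvitz-Thompson IPW estimate of the ACE for a binary treatment t."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    e = clip_propensity(propensity, clip)
    # Reweight treated and control outcomes by inverse (clipped) propensity.
    treated_mean = np.mean(t * y / e)
    control_mean = np.mean((1.0 - t) * y / (1.0 - e))
    return treated_mean - control_mean
```

A larger `clip` tames extreme weights at the cost of some bias, which is the trade-off a clipping threshold has to balance.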

Core claim

By framing causal inference as a representation learning problem, many popular methods including back-door adjustments can be viewed as weighting or representation algorithms for which general error bounds against the true average causal effect can be derived. These bounds are further extended to unobserved confounding by modeling it as contamination under the Huber model, and the resulting bounds remain estimable from observed data.

What carries the argument

A representation-learning formulation of causal estimators that supports deriving bounds on their deviation from the true ACE, extended by the Huber contamination model to cover unobserved confounders.
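For readers unfamiliar with the contamination model named above, its standard textbook form is sketched below. The symbols are generic and may not match the paper's notation. How the paper maps confounded units onto the contaminating component is not spelled out on this page, so reading $P_0$ as the data distribution without unobserved confounding is an assumption.

```latex
% Huber's epsilon-contamination model (standard form).
% P_0: clean distribution; Q: arbitrary contaminating distribution.
P_{\varepsilon} = (1 - \varepsilon)\, P_0 + \varepsilon\, Q,
\qquad 0 \le \varepsilon < 1 .
```

Under this model each sample comes from $P_0$ with probability $1 - \varepsilon$ and from $Q$ otherwise, so $\varepsilon$ acts as a single knob for how much of the data is affected.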

If this is right

Back-door and related methods admit error bounds derived from their representation or weighting form.
The bounds quantify loss against true ACE and can be used to select among methods or tune hyperparameters for a given dataset.
The bounds remain valid when unobserved confounding is present if modeled as Huber contamination.
Bounds can be estimated empirically, enabling data-driven choices such as clipping thresholds in IPW.
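The data-driven choice in the last point can be sketched as a simple search over candidate thresholds. The paper's estimable bound is not reproduced on this page, so `placeholder_criterion` below is a hypothetical stand-in (a variance proxy plus a crude bias proxy), and `select_clip_threshold` is an illustrative name. Any callable with the same signature could be swapped in.

```python
import numpy as np


def placeholder_criterion(raw, clipped, t):
    """Hypothetical stand-in for an estimable error bound (NOT the paper's).

    Adds a variance proxy (mean squared weight over n) to a crude bias
    proxy (mean absolute change that clipping makes to the scores).
    """
    weights = np.where(t == 1, 1.0 / clipped, 1.0 / (1.0 - clipped))
    variance_proxy = np.mean(weights ** 2) / len(t)
    bias_proxy = np.mean(np.abs(clipped - raw))
    return variance_proxy + bias_proxy


def select_clip_threshold(propensity, t, thresholds, criterion):
    """Return the threshold with the smallest criterion value, plus all scores."""
    raw = np.asarray(propensity, dtype=float)
    t = np.asarray(t)
    scores = []
    for c in thresholds:
        clipped = np.clip(raw, c, 1.0 - c)
        scores.append(float(criterion(raw, clipped, t)))
    best = thresholds[int(np.argmin(scores))]
    return best, scores
```

Keeping the criterion as a parameter separates the search loop from the bound itself, so the same loop works once the actual bound from the paper is implemented.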

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

New estimators could be constructed by directly minimizing the proposed bounds rather than using existing procedures.
The representation learning perspective may link causal inference to similar bounding techniques in domain adaptation.
On real data without ground truth, the bounds could serve as a practical criterion for comparing competing estimators.

Load-bearing premise

That causal methods can be cast uniformly as representation or weighting algorithms in a way that permits non-vacuous bounds on error to the true ACE, and that unobserved confounding can be modeled as Huber contamination without invalidating those bounds.

What would settle it

A dataset with known true ACE from a randomized trial where the actual estimation error of a method such as IPW exceeds the derived bound, or where the Huber-extended bound fails to cover observed error under controlled confounding.
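A synthetic analogue of this check is sketched below: a simulation in which the true ACE is known by construction and a hidden confounder's strength is controlled by `gamma`. The data-generating process, the name `simulate_ipw_error`, and all parameter values are assumptions made for illustration. Because the paper's bound is not reproduced here, the sketch computes only the actual estimation error that such a bound would be compared against.

```python
import numpy as np


def simulate_ipw_error(gamma, n=20000, tau=1.0, seed=0):
    """Absolute error of IPW (using only observed x) against the true ACE tau."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)   # observed binary confounder
    u = rng.integers(0, 2, size=n)   # hidden binary confounder
    logits = -0.5 + x + gamma * u
    p_treat = 1.0 / (1.0 + np.exp(-logits))
    t = rng.binomial(1, p_treat)
    y = tau * t + x + gamma * u + rng.normal(size=n)

    # Propensity estimated from the observed confounder only:
    # the treated fraction within each stratum of x.
    e = np.empty(n)
    for value in (0, 1):
        mask = x == value
        e[mask] = t[mask].mean()

    ace_hat = np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))
    return abs(ace_hat - tau)
```

With `gamma=0` the hidden variable has no effect and the error should be close to zero; increasing `gamma` introduces bias that adjustment for `x` alone cannot remove.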

Figures

Figures reproduced from arXiv: 1907.04805 by Amit Sharma, Rathin Desai.

**Figure 2.** L1-error bound and IPW estimate for different levels of confounding by W. [PITH_FULL_IMAGE:figures/full_fig_p008_2.png]

**Figure 3.** Choosing the clipping threshold for IPW propensity that minimizes L1-error bound. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png]

Original abstract

Estimating average causal effect (ACE) is useful whenever we want to know the effect of an intervention on a given outcome. In the absence of a randomized experiment, many methods such as stratification and inverse propensity weighting have been proposed to estimate ACE. However, it is hard to know which method is optimal for a given dataset or which hyperparameters to use for a chosen method. To this end, we provide a framework to characterize the loss of a causal inference method against the true ACE, by framing causal inference as a representation learning problem. We show that many popular methods, including back-door methods can be considered as weighting or representation learning algorithms, and provide general error bounds for their causal estimates. In addition, we consider the case when unobserved variables can confound the causal estimate and extend proposed bounds using principles of robust statistics, considering confounding as contamination under the Huber contamination model. These bounds are also estimable; as an example, we provide empirical bounds for the Inverse Propensity Weighting (IPW) estimator and show how the bounds can be used to optimize the threshold of clipping extreme propensity scores. Our work provides a new way to reason about competing estimators, and opens up the potential of deriving new methods by minimizing the proposed error bounds.
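The abstract's statement that back-door methods can be read as weighting algorithms is easiest to see from the standard identities below. They are textbook results rather than the paper's own derivation, and the notation (binary treatment $T$, outcome $Y$, observed confounders $W$, propensity $e$) is assumed here.

```latex
% Assumptions (labeled): binary treatment T, outcome Y, observed confounders W
% satisfying the back-door criterion, and positivity 0 < e(w) < 1.
\begin{align*}
\mathrm{ACE} &= \mathbb{E}[Y \mid do(T=1)] - \mathbb{E}[Y \mid do(T=0)], \\
\mathbb{E}[Y \mid do(T=t)] &= \sum_{w} P(W=w)\, \mathbb{E}[Y \mid T=t, W=w], \\
\mathrm{ACE} &= \mathbb{E}\!\left[ \frac{T\,Y}{e(W)} - \frac{(1-T)\,Y}{1-e(W)} \right],
\qquad e(w) = P(T=1 \mid W=w).
\end{align*}
```

The second line is the back-door (stratification) adjustment and the third is its IPW form: both reweight observed outcomes so that the treated and control groups share the same distribution of $W$.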

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper recasts causal estimators as representation learners to derive error bounds and extends them to Huber contamination for unobserved confounding, with a usable IPW example, but the practical tightness of those bounds is the key open question.


The central contribution is a representation-learning framing that lets the authors write general error bounds for methods like back-door adjustment and IPW, plus an extension that treats unobserved confounding as Huber contamination so the same style of bound still applies. They also show the bounds can be estimated from data and use that to pick a clipping threshold for IPW propensities. That last part is the most concrete piece: it turns the theory into a tuning device without needing the true ACE.

Referee Report

0 major / 2 minor

Summary. The paper frames causal inference methods (including back-door adjustments) as representation learning or weighting algorithms and derives general error bounds on their deviation from the true average causal effect (ACE). It extends the bounds to unobserved confounding by treating it as Huber contamination and shows the bounds are estimable, with a concrete demonstration on the IPW estimator for choosing a propensity-score clipping threshold.

Significance. If the derivations are rigorous and the bounds non-vacuous, the framework supplies a unified, potentially falsifiable way to compare estimators and tune hyperparameters by minimizing the proposed bounds. The link to representation learning plus the robust-statistics treatment of confounding is a substantive conceptual contribution that could influence both method selection and the design of new estimators.

minor comments (2)

The abstract is dense and would benefit from a single sentence that explicitly states the key technical assumptions required for the representation-learning reduction to hold for back-door methods.
In the IPW example, the manuscript should clarify whether the bound estimation uses held-out data or cross-validation to avoid any appearance of circularity with the propensity-score fit.
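The held-out option raised in the second comment can be sketched as cross-fitting: each unit's propensity score comes from a model fitted on the other fold(s), so no score is predicted by a model that saw that unit. This is a generic illustration, not the paper's procedure. The plain gradient-descent logistic regression and the names `fit_logistic`, `predict_propensity`, and `cross_fitted_propensity` are assumptions chosen to keep the example dependency-free.

```python
import numpy as np


def fit_logistic(X, t, steps=500, lr=0.5):
    """Plain gradient-descent logistic regression (intercept added here)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - t) / len(t)
    return w


def predict_propensity(w, X):
    """Predicted P(T=1 | X) under the fitted logistic model."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def cross_fitted_propensity(X, t, n_folds=2, seed=0):
    """Out-of-fold propensity scores: fit on other folds, predict held-out."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(t)) % n_folds  # balanced random fold labels
    e_hat = np.empty(len(t), dtype=float)
    for k in range(n_folds):
        held_out = folds == k
        w = fit_logistic(X[~held_out], t[~held_out])
        e_hat[held_out] = predict_propensity(w, X[held_out])
    return e_hat
```

Any quantity computed from `e_hat`, including an estimated bound, then uses scores that are out-of-sample for every unit.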

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their thoughtful summary of our work, the positive assessment of its potential contributions, and the recommendation of minor revision. No specific major comments were enumerated in the report, so we have no points requiring point-by-point rebuttal at this stage. We remain available to address any minor revisions the editor or referee may suggest.

Circularity Check

0 steps flagged

No significant circularity detected


The paper reframes causal estimators (including back-door methods) as representation learning or weighting procedures and derives general error bounds w.r.t. the true ACE; it then extends the bounds to unobserved confounding via the Huber contamination model and states that the resulting bounds are estimable (with an IPW clipping example). No quoted equation or step reduces a claimed prediction or bound to a fitted parameter by construction, nor does any load-bearing premise collapse to a self-citation chain. The central derivation is presented as independent mathematical content that remains falsifiable against external ACE values. The reader's noted concern about data reuse in bound estimation is a methodological caveat but does not constitute a definitional or self-referential reduction of the claimed results. This is therefore scored as a normal non-circular outcome.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the unproven premise that causal methods admit a uniform representation-learning characterization sufficient for general bounds, plus the modeling choice of Huber contamination for confounding; no free parameters or new entities are explicitly introduced in the abstract.

axioms (2)

domain assumption: Causal inference methods (including back-door) can be cast as representation learning or weighting algorithms permitting general error bounds.
Invoked in the first paragraph of the abstract as the basis for the framework.
domain assumption: Unobserved confounding can be treated as contamination under the Huber model without breaking the bound derivations.
Stated when extending the bounds to the confounding case.

pith-pipeline@v0.9.0 · 5749 in / 1306 out tokens · 19029 ms · 2026-05-24T23:42:11.936275+00:00 · methodology

