A discriminative approach for finding and characterizing positivity violations using decision trees

Ehud Karavani; Peter Bak; Yishai Shimoni

arxiv: 1907.08127 · v1 · pith:QLYWAC4Inew · submitted 2019-07-18 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-24 19:32 UTC · model grok-4.3

keywords: positivity violation, decision trees, causal inference, covariate overlap, common support, random forest, treatment homogeneity

The pith

Decision trees detect positivity violations by partitioning covariates into regions of maximized treatment homogeneity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how decision trees can locate subpopulations where the positivity assumption fails for causal inference. Positivity requires overlap in treatment assignment across all covariate combinations, but checking this becomes hard in high dimensions. The method grows trees to create leaves where one treatment is nearly absent, flagging those as violation regions. A random forest extension then measures how consistently each region shows the violation. This yields both detection and an interpretable description of the problematic subspaces.

Core claim

By dividing the covariate space into mutually exclusive regions, each with maximized homogeneity of treatment groups, decision trees can be used to automatically detect subspaces violating positivity. By augmenting the method with an additional random forest model, we can quantify the robustness of the violation within each subspace. This solution is scalable and provides an interpretable characterization of the subspaces in which violations occur.

What carries the argument

Decision tree grown to maximize treatment homogeneity within leaves, augmented by random forest for robustness scoring.
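
A minimal sketch of that mechanism, assuming a Gini criterion on the binary treatment indicator and a greedy, depth-limited partition. The helper names (`gini`, `best_split`, `grow`), the toy data, and the 0.05 flagging tolerance are illustrative assumptions, not taken from the paper.

```python
def gini(treatments):
    """Gini impurity of 0/1 treatment labels (0.0 means a fully homogeneous group)."""
    p = sum(treatments) / len(treatments)
    return 2.0 * p * (1.0 - p)

def best_split(rows, treatments):
    """Return (feature_index, threshold) minimizing weighted Gini, or None if no split helps."""
    n = len(rows)
    best, best_score = None, gini(treatments)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for r, t in zip(rows, treatments) if r[j] <= thr]
            right = [t for r, t in zip(rows, treatments) if r[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (j, thr), score
    return best

def grow(rows, treatments, depth, rule=()):
    """Recursively partition; return leaves as (rule, n_samples, treated_fraction)."""
    split = best_split(rows, treatments) if depth > 0 else None
    if split is None:
        return [(rule, len(treatments), sum(treatments) / len(treatments))]
    j, thr = split
    leaves = []
    for op, keep in (("<=", lambda v: v <= thr), (">", lambda v: v > thr)):
        part = [(r, t) for r, t in zip(rows, treatments) if keep(r[j])]
        leaves += grow([r for r, _ in part], [t for _, t in part],
                       depth - 1, rule + (f"x{j} {op} {thr:g}",))
    return leaves

# Toy data (assumed): two covariates; everyone with x0 >= 8 is treated,
# so that region has no untreated units and violates positivity.
rows = [(a, b) for a in range(10) for b in range(2)]
treatments = [1 if a >= 8 else (a + b) % 2 for a, b in rows]

leaves = grow(rows, treatments, depth=2)
# Flag leaves in which the rarer treatment group is (almost) absent.
violations = [leaf for leaf in leaves if min(leaf[2], 1.0 - leaf[2]) <= 0.05]
for rule, n, frac in violations:
    print(" and ".join(rule), "| n =", n, "| treated fraction =", frac)
```

Each flagged leaf carries its own stratification rule, which is what makes the output readable as a description of a subpopulation rather than a bare score.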

If this is right

Scalable detection of positivity violations in high-dimensional covariate spaces.
Automatic identification of violating subpopulations with tree-based rules.
Quantification of violation robustness using random forest.
Visualization of stratification rules and violation severity for each subspace.
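
The abstract does not say how the random forest is turned into a robustness score, so the following is only one hypothetical reading: refit a one-split tree on bootstrap resamples and record, for every unit, the share of resamples in which it lands in a near-homogeneous leaf. Bagged single-covariate stumps stand in for a real random forest here, and `robustness_scores`, `eps`, and the toy data are assumptions.

```python
import random

def gini(ts):
    """Gini impurity of 0/1 treatment labels."""
    p = sum(ts) / len(ts)
    return 2.0 * p * (1.0 - p)

def best_threshold(xs, ts):
    """Threshold on one covariate that minimizes weighted Gini, or None if no split helps."""
    n = len(xs)
    values = sorted(set(xs))
    best, best_score = None, gini(ts)
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [t for x, t in zip(xs, ts) if x <= thr]
        right = [t for x, t in zip(xs, ts) if x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score - 1e-12:
            best, best_score = thr, score
    return best

def robustness_scores(xs, ts, n_trees=50, eps=0.05, seed=0):
    """Per-unit share of bootstrap stumps that place the unit in a near-pure leaf."""
    rng = random.Random(seed)
    n = len(xs)
    hits = [0] * n
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        bx = [xs[i] for i in idx]
        bt = [ts[i] for i in idx]
        thr = best_threshold(bx, bt)
        if thr is None:
            continue
        for is_left in (True, False):
            leaf = [t for x, t in zip(bx, bt) if (x <= thr) == is_left]
            p = sum(leaf) / len(leaf)
            if min(p, 1.0 - p) <= eps:  # leaf is near-homogeneous in treatment
                for i, x in enumerate(xs):
                    if (x <= thr) == is_left:
                        hits[i] += 1
    return [h / n_trees for h in hits]

# Toy data (assumed): units with x >= 0.8 are always treated.
xs = [i / 100 for i in range(100)]
ts = [1 if i >= 80 else i % 2 for i in range(100)]
scores = robustness_scores(xs, ts)
print("mean score, x >= 0.8:", sum(scores[80:]) / 20)
print("mean score, x <  0.8:", sum(scores[:80]) / 80)
```

A score near 1 would mean the unit is flagged in almost every resample; a score near 0 would suggest any single-tree flag for that unit is fragile.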

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be integrated into standard causal analysis pipelines to flag data issues before estimation.
Similar tree-based partitioning might help check other causal assumptions like exchangeability in specific strata.
Interactive visualizations could support domain experts in deciding whether to exclude or reweight violating regions.

Load-bearing premise

Maximizing treatment homogeneity within decision-tree leaves reliably isolates positivity violations instead of other data structures or artifacts.

What would settle it

A synthetic dataset with an artificially created positivity violation in a known covariate subspace where the decision tree does not produce a leaf with near-perfect treatment separation for that subspace.
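
One way such a check could be set up, as a sketch under stated assumptions: a single covariate, a violation planted at x >= 0.8, a deterministic alternating treatment pattern elsewhere, and a one-split search. `find_violation`, `min_leaf`, `eps`, and the 0.02 boundary tolerance are hypothetical choices, not the paper's protocol.

```python
def gini_from_counts(treated, n):
    """Gini impurity of a group with `treated` treated units out of `n`."""
    p = treated / n
    return 2.0 * p * (1.0 - p)

def find_violation(xs, ts, min_leaf=10, eps=0.05):
    """Find the best single split; return its near-pure leaf, if any.

    Returns (threshold, side, n_leaf, treated_fraction) or None.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    sx = [xs[i] for i in order]
    st = [ts[i] for i in order]
    n, total = len(sx), sum(st)
    best, best_score = None, float("inf")
    treated_left = 0
    for k in range(n - 1):
        treated_left += st[k]
        n_left, n_right = k + 1, n - k - 1
        if sx[k] == sx[k + 1] or min(n_left, n_right) < min_leaf:
            continue
        score = (n_left * gini_from_counts(treated_left, n_left)
                 + n_right * gini_from_counts(total - treated_left, n_right)) / n
        if score < best_score:
            best_score = score
            best = ((sx[k] + sx[k + 1]) / 2.0, treated_left, n_left, n_right)
    if best is None:
        return None
    thr, t_left, n_left, n_right = best
    for side, n_leaf, treated in (("<=", n_left, t_left), (">", n_right, total - t_left)):
        frac = treated / n_leaf
        if min(frac, 1.0 - frac) <= eps:
            return (thr, side, n_leaf, frac)
    return None

xs = [i / 100 for i in range(100)]
# Planted violation (assumed design): every unit with x >= 0.8 is treated.
planted_ts = [1 if i >= 80 else (i + 1) % 2 for i in range(100)]
# Negative control: same alternating pattern everywhere, so overlap holds.
control_ts = [(i + 1) % 2 for i in range(100)]

hit = find_violation(xs, planted_ts)
recovered = hit is not None and hit[1] == ">" and abs(hit[0] - 0.8) <= 0.02
print("planted violation recovered:", recovered, hit)
print("negative control flagged:", find_violation(xs, control_ts))
# With no minimum leaf size, a one-unit leaf is trivially "pure".
print("control, min_leaf=1:", find_violation(xs, control_ts, min_leaf=1))
```

The last call shows the failure mode the premise has to rule out: without a minimum leaf size, the overlapping control data still yields a flagged leaf, but it contains a single unit.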

Figures

Figures reproduced from arXiv: 1907.08127 by Ehud Karavani, Peter Bak, Yishai Shimoni.

**Figure 2.** Similar to Figure 1 but using real data from the NHEFS studying the effect of smoking

Original abstract

The assumption of positivity in causal inference (also known as common support and covariate overlap) is necessary to obtain valid causal estimates. Therefore, confirming it holds in a given dataset is an important first step of any causal analysis. Most common methods to date are insufficient for discovering non-positivity, as they do not scale for modern high-dimensional covariate spaces, or they cannot pinpoint the subpopulation violating positivity. To overcome these issues, we suggest harnessing decision trees for detecting violations. By dividing the covariate space into mutually exclusive regions, each with maximized homogeneity of treatment groups, decision trees can be used to automatically detect subspaces violating positivity. By augmenting the method with an additional random forest model, we can quantify the robustness of the violation within each subspace. This solution is scalable and provides an interpretable characterization of the subspaces in which violations occur. We provide a visualization of the stratification rules that define each subpopulation, combined with the severity of positivity violation within it. We also provide an interactive version of the visualization that allows a deeper dive into the properties of each subspace.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Decision-tree method flags positivity violations via treatment homogeneity but the partitions may reflect sampling artifacts instead.

The paper's core idea is to grow decision trees that split on covariates to maximize homogeneity of treatment within leaves, thereby surfacing subspaces where positivity is violated, then layer on a random forest to score the robustness of those findings and output visualizations of the rules. This is new as a combined framing for the diagnostic task. It handles the scaling problem in high-dimensional data that standard overlap checks struggle with and gives interpretable subpopulations plus an interactive view, which is genuinely useful for applied causal work. The approach rests on standard tree splitting rather than circular fitting, so that part is clean.

The soft spot is the load-bearing premise that maximized treatment homogeneity in leaves will isolate true positivity violations rather than finite-sample imbalances, spurious correlations, or splitting artifacts. The abstract states the claim but shows no derivation, simulations, or real-data checks to confirm the partitions are reliable, so that concern stands. Without those results it is difficult to know how often the method would flag noise.

The audience is causal inference practitioners who need practical diagnostics on large observational datasets. The paper engages straightforwardly with the positivity problem. Send it to peer review so the experiments and any validation can be examined.

Referee Report

2 major / 2 minor

Summary. The paper claims that decision trees can automatically detect positivity violations in causal inference by partitioning the covariate space into mutually exclusive regions that maximize homogeneity of treatment groups within leaves; augmenting this with a random forest then quantifies the robustness of each detected violation. The approach is presented as scalable to high-dimensional data and interpretable via stratification rules and visualizations (including an interactive version) that characterize the violating subpopulations.

Significance. If the central claim holds, the method would address a practical gap in causal inference by offering a scalable, interpretable alternative to existing positivity checks that struggle with high-dimensional covariates or fail to localize violations. The emphasis on visualization and interactivity is a practical strength for applied work.

major comments (2)

[Abstract / method description] The premise that growing trees to maximize treatment-group homogeneity within leaves will isolate regions where positivity is violated (P(T=1|X)=0 or 1) rather than finite-sample imbalances, spurious correlations, or splitting artifacts is stated without derivation, proof, or simulation evidence distinguishing these cases. This assumption is load-bearing because every subsequent step (characterization, robustness scoring via random forest, visualization) inherits the quality of the initial partitions.
[Abstract] No empirical results, error analysis, or validation on datasets with known positivity violations are supplied, so it is impossible to assess whether the detected subspaces reflect true support violations or other data patterns. This is load-bearing for the claim of reliable detection.

minor comments (2)

[Title / Abstract] The phrase 'discriminative approach' is used in the title but not defined or contrasted with generative alternatives in the provided text.
[Abstract] References to standard positivity literature (e.g., common support diagnostics) are absent, making it harder to situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback highlighting the need for stronger justification of the core assumption and empirical validation. We address each major comment below and commit to revisions that directly respond to these concerns.

Referee: [Abstract / method description] the premise that growing trees to maximize treatment-group homogeneity within leaves will isolate regions where positivity is violated (P(T=1|X)=0 or 1) rather than finite-sample imbalances, spurious correlations, or splitting artifacts is stated without derivation, proof, or simulation evidence distinguishing these cases.

Authors: We agree the manuscript would be strengthened by explicit justification. The motivation is that, in sufficiently large samples, leaves achieving perfect treatment homogeneity must correspond to regions with P(T=1|X) exactly 0 or 1, whereas finite-sample imbalances or spurious splits would not systematically maximize the homogeneity criterion across the tree. However, we acknowledge the absence of a formal derivation or targeted simulations. In revision we will add a dedicated subsection deriving the connection under the positivity violation definition and include simulation experiments that inject known violations versus pure finite-sample imbalance to illustrate the distinction.
Revision: yes

Referee: [Abstract] No empirical results, error analysis, or validation on datasets with known positivity violations are supplied, so it is impossible to assess whether the detected subspaces reflect true support violations or other data patterns.

Authors: The current version emphasizes the methodological contribution, interpretability, and visualization aspects without including empirical validation. We accept that this limits assessment of reliability. The revision will incorporate a new results section containing (i) simulation studies on data with controlled positivity violations, (ii) error analysis measuring detection accuracy, and (iii) application to at least one real dataset where violations are known or can be induced, with quantitative metrics of performance.
Revision: yes
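
The distinction drawn in the first response, between a true violation and a finite-sample imbalance, can be made concrete with a small calculation. This is an illustration only, not the derivation the authors promise: it assumes units in a leaf are independent draws with a common propensity p, and it ignores the fact that a tree picks its leaves adaptively, which makes chance purity more likely than this bound suggests. The function names are hypothetical.

```python
def chance_purity(n, p):
    """Probability that n independent units with P(T=1)=p are all treated or all untreated."""
    return p ** n + (1.0 - p) ** n

def min_leaf_size(p, alpha):
    """Smallest leaf size n at which chance purity under propensity p falls below alpha."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    n = 1
    while chance_purity(n, p) >= alpha:
        n += 1
    return n

print(chance_purity(5, 0.5))     # 0.0625: small pure leaves arise easily by chance
print(min_leaf_size(0.5, 0.05))  # 6
print(min_leaf_size(0.9, 0.05))  # 29: skewed but valid propensities need larger leaves
```

Under these assumptions, a pure leaf of a handful of units says little, while a pure leaf of several dozen units is hard to explain by sampling noise unless the true propensity is already very close to 0 or 1.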

Circularity Check

0 steps flagged

No significant circularity; method applies standard tree partitioning to observed treatment labels

The paper proposes using decision trees to partition covariate space by maximizing treatment-group homogeneity, with pure leaves interpreted as positivity violations. This is a direct application of standard classification-tree splitting (e.g., Gini or entropy on the binary treatment indicator) rather than any derivation that reduces a claimed result to its own fitted inputs or self-citations. No equations, parameters, or uniqueness theorems are presented that would make the output equivalent to the input by construction; the partitions are produced by an external algorithm whose correctness does not depend on redefining positivity in terms of the tree output. The subsequent random-forest robustness step is likewise an independent augmentation. The approach is therefore self-contained against external benchmarks and receives the default non-circularity finding.
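
For reference, the two splitting criteria named above can be written out for a binary treatment indicator. The notation is assumed here and does not come from the paper: \hat{p}_\ell is the observed treated fraction in leaf \ell, n_L and n_R are the sizes of the two children of a candidate split of n units, and 0 \log 0 is taken to be 0.

```latex
\begin{align}
  G(\ell) &= 2\,\hat{p}_\ell\,(1-\hat{p}_\ell)
  && \text{(Gini impurity)} \\
  H(\ell) &= -\hat{p}_\ell \log_2 \hat{p}_\ell - (1-\hat{p}_\ell)\log_2(1-\hat{p}_\ell)
  && \text{(entropy)} \\
  \text{split score} &= \frac{n_L}{n}\,G(L) + \frac{n_R}{n}\,G(R)
  && \text{(minimized over candidate splits)} \\
  G(\ell) = 0 &\iff H(\ell) = 0 \iff \hat{p}_\ell \in \{0, 1\}
\end{align}
```

Both criteria reach zero only when a leaf contains a single treatment group, which is why a pure leaf is the sample analogue of P(T=1|X) being 0 or 1. The sample fraction and the true propensity can still differ in small leaves.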

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the domain assumption that positivity violations manifest as regions of extreme treatment homogeneity and that decision-tree splitting will isolate them; no free parameters or invented entities are named in the abstract.

axioms (2)

domain assumption: Positivity (common support) is required for valid causal estimates and can be checked by examining treatment assignment overlap in covariate space.
Stated in the first sentence of the abstract as the motivation for the method.
domain assumption: Decision trees that maximize treatment homogeneity within leaves will surface subspaces where positivity fails.
Central modeling choice described in the abstract's method paragraph.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 1 internal anchor

[1] Miguel A Hernan and James M Robins. Causal inference. CRC, Boca Raton, FL, 2010.
[2] Daniel Westreich and Stephen R Cole. Invited commentary: positivity in practice. American journal of epidemiology, 171(6):674–677, 2010.
[3] Munmun Biswas and Anil K Ghosh. A nonparametric two-sample test applicable to high dimensional data. Journal of Multivariate Analysis, 123:160–171, 2014.
[4] Ariel Linden. Graphical displays for assessing covariate balance in matching studies. Journal of evaluation in clinical practice, 21(2):242–247, 2015.
[5] Lynne C Messer, J Michael Oakes, and Susan Mason. Effects of socioeconomic and racial residential segregation on preterm birth: a cautionary tale of structural confounding. American journal of epidemiology, 171(6):664–673, 2010.
[6] Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.
[7] Bernhard C Geiger and Gernot Kubin. Relative information loss in the PCA. In Information Theory Workshop (ITW), 2012 IEEE, pages 562–566. IEEE, 2012.
[8] Erich L Lehmann and Joseph P Romano. Testing statistical hypotheses. Springer Science & Business Media, 2006.
[9] David Lopez-Paz and Maxime Oquab. Revisiting classifier two-sample tests. arXiv preprint arXiv:1610.06545, 2016.
[10] Michal Ozery-Flato, Pierre Thodoroff, and Tal El-Hay. Adversarial balancing for causal inference. arXiv preprint arXiv:1810.07406, 2018.
[11] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
[12] J. Ross Quinlan. Induction of decision trees. Machine learning, 1(1):81–106, 1986.
[13] Lior Rokach and Oded Maimon. Decision trees. In Data mining and knowledge discovery handbook, pages 165–192. Springer, 2005.
[14] Lior Rokach and Oded Maimon. Data mining with decision trees: theory and applications, volume 69. World Scientific, 2008.
[15] Ron Kohavi et al. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, volume 14, pages 1137–1145. Montreal, Canada, 1995.
[16] Leo Breiman, Jerome H Friedman, Richard A Olshen, and Charles J Stone. Classification and regression trees. Wadsworth & Brooks/Cole Advanced Books & Software, 1984.
[17] National Center for Health Statistics. NHANES I Epidemiologic Followup Study (NHEFS), 1992.
[18] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of machine learning research, 9(Nov):2579–2605, 2008.
[19] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of machine learning research, 12(Oct):2825–2830, 2011.
[20] J. D. Hunter. Matplotlib: A 2D graphics environment. Computing In Science & Engineering, 9(3):90–95, 2007. doi: 10.1109/MCSE.2007.55
[21] Bokeh Development Team. Bokeh: Python library for interactive visualization, 2014. URL http://www.bokeh.pydata.org
[22] M Hossin and MN Sulaiman. A review on evaluation metrics for data classification evaluations. International Journal of Data Mining & Knowledge Management Process, 5(2):1, 2015.

Supplementary figures. Figure S1: A snapshot of the exact same plot as Figure 1B, only its interactive version, where a box with additional information is shown upon hovering over...
