
arxiv: 1907.08127 · v1 · pith:QLYWAC4Inew · submitted 2019-07-18 · 📊 stat.ML · cs.LG

A discriminative approach for finding and characterizing positivity violations using decision trees

Pith reviewed 2026-05-24 19:32 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords: positivity violation, decision trees, causal inference, covariate overlap, common support, random forest, treatment homogeneity

The pith

Decision trees detect positivity violations by partitioning covariates into regions of maximized treatment homogeneity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how decision trees can locate subpopulations where the positivity assumption fails for causal inference. Positivity requires overlap in treatment assignment across all covariate combinations, but checking this becomes hard in high dimensions. The method grows trees to create leaves where one treatment is nearly absent, flagging those as violation regions. A random forest extension then measures how consistently each region shows the violation. This yields both detection and an interpretable description of the problematic subspaces.

Core claim

By dividing the covariate space into mutually exclusive regions, each with maximized homogeneity of treatment groups, decision trees can be used to automatically detect subspaces violating positivity. By augmenting the method with an additional random forest model, we can quantify the robustness of the violation within each subspace. This solution is scalable and provides an interpretable characterization of the subspaces in which violations occur.

What carries the argument

Decision tree grown to maximize treatment homogeneity within leaves, augmented by random forest for robustness scoring.
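
A minimal sketch of the flagging step, assuming samples have already been assigned to leaves: compute the treated fraction in each leaf and flag leaves where one treatment group is absent or nearly so. The function name `flag_violating_leaves`, the `tol` parameter, and its default of 0.05 are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

def flag_violating_leaves(leaf_ids, treatment, tol=0.05):
    """Return {leaf_id: (n_samples, treated_fraction, is_violation)}.

    A leaf is flagged when the treated fraction is <= tol or >= 1 - tol,
    i.e. one treatment group is (nearly) absent. `tol` is a hypothetical
    tolerance chosen for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # leaf_id -> [n_samples, n_treated]
    for leaf, t in zip(leaf_ids, treatment):
        counts[leaf][0] += 1
        counts[leaf][1] += int(t)
    report = {}
    for leaf, (n, n_treated) in counts.items():
        frac = n_treated / n
        report[leaf] = (n, frac, frac <= tol or frac >= 1 - tol)
    return report

# Toy input: leaf "A" is mixed, leaf "B" contains only treated samples.
leaf_ids = ["A", "A", "A", "A", "B", "B", "B"]
treatment = [0, 1, 1, 0, 1, 1, 1]
report = flag_violating_leaves(leaf_ids, treatment)
```

Here leaf "B" is flagged and leaf "A" is not. In practice the leaf assignments would come from a fitted tree, and the tolerance would need to reflect leaf size.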

If this is right

  • Scalable detection of positivity violations in high-dimensional covariate spaces.
  • Automatic identification of violating subpopulations with tree-based rules.
  • Quantification of violation robustness using random forest.
  • Visualization of stratification rules and violation severity for each subspace.
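
The exact robustness score is not described on this page, so the following is only a stand-in: a bootstrap consistency check that resamples the treatment labels inside one region and records how often the region stays nearly homogeneous. It mirrors the bagging idea behind random forests but is not claimed to be the paper's procedure; `bootstrap_homogeneity_score`, `n_boot`, `tol`, and `seed` are hypothetical names and defaults.

```python
import random

def bootstrap_homogeneity_score(treatment, n_boot=200, tol=0.05, seed=0):
    """Share of bootstrap resamples in which one treatment group is (nearly) absent.

    `treatment` holds the 0/1 treatment labels of the samples inside one region.
    `n_boot`, `tol`, and `seed` are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(treatment)
    hits = 0
    for _ in range(n_boot):
        # Draw n labels with replacement from the region.
        resample = [treatment[rng.randrange(n)] for _ in range(n)]
        frac = sum(resample) / n
        if frac <= tol or frac >= 1 - tol:
            hits += 1
    return hits / n_boot

pure_region = [1] * 30               # only treated samples
mixed_region = [0] * 20 + [1] * 20   # balanced treatment groups
pure_score = bootstrap_homogeneity_score(pure_region)
mixed_score = bootstrap_homogeneity_score(mixed_region)
```

A region containing only treated samples scores 1.0 by construction, while a balanced region should score close to 0.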

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be integrated into standard causal analysis pipelines to flag data issues before estimation.
  • Similar tree-based partitioning might help check other causal assumptions like exchangeability in specific strata.
  • Interactive visualizations could support domain experts in deciding whether to exclude or reweight violating regions.

Load-bearing premise

Maximizing treatment homogeneity within decision-tree leaves reliably isolates positivity violations instead of other data structures or artifacts.

What would settle it

A synthetic dataset with an artificially created positivity violation in a known covariate subspace where the decision tree does not produce a leaf with near-perfect treatment separation for that subspace.
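
One way to make this test concrete is sketched below, under simplifying assumptions: a single covariate, a deterministic treatment pattern instead of random assignment, and one greedy Gini split instead of a full tree. The violation is planted at x >= 0.80, and the helper names `gini` and `best_split` are hypothetical.

```python
def gini(labels):
    """Binary Gini impurity 2p(1-p) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(x, t):
    """Return (threshold, weighted_impurity) of the best single split on x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ts = [t[i] for i in order]
    n = len(xs)
    best = (None, gini(ts))  # start from the unsplit node
    for k in range(1, n):
        if xs[k] == xs[k - 1]:
            continue  # no threshold separates identical values
        left, right = ts[:k], ts[k:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((xs[k - 1] + xs[k]) / 2.0, score)
    return best

# Synthetic data: treatment alternates for x < 0.80 (overlap holds),
# and every sample with x >= 0.80 is treated (planted violation).
x = [i / 100 for i in range(100)]
t = [(i + 1) % 2 if i < 80 else 1 for i in range(100)]

threshold, impurity = best_split(x, t)
right_leaf = [ti for xi, ti in zip(x, t) if xi > threshold]
```

On this toy data the chosen threshold falls between 0.79 and 0.80, and the right-hand leaf contains only treated samples. The failure case described above would be a dataset where the planted region is not recovered this way.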

Figures

Figures reproduced from arXiv: 1907.08127 by Ehud Karavani, Peter Bak, Yishai Shimoni.

Figure 1: A synthetic example of applying positivity-detection tree. […]
Figure 2: Similar to Figure 1 but using real data from the NHEFS studying the effect of smoking […]
Original abstract

The assumption of positivity in causal inference (also known as common support and covariate overlap) is necessary to obtain valid causal estimates. Therefore, confirming it holds in a given dataset is an important first step of any causal analysis. Most common methods to date are insufficient for discovering non-positivity, as they do not scale for modern high-dimensional covariate spaces, or they cannot pinpoint the subpopulation violating positivity. To overcome these issues, we suggest to harness decision trees for detecting violations. By dividing the covariate space into mutually exclusive regions, each with maximized homogeneity of treatment groups, decision trees can be used to automatically detect subspaces violating positivity. By augmenting the method with an additional random forest model, we can quantify the robustness of the violation within each subspace. This solution is scalable and provides an interpretable characterization of the subspaces in which violations occur. We provide a visualization of the stratification rules that define each subpopulation, combined with the severity of positivity violation within it. We also provide an interactive version of the visualization that allows a deeper dive into the properties of each subspace.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that decision trees can automatically detect positivity violations in causal inference by partitioning the covariate space into mutually exclusive regions that maximize homogeneity of treatment groups within leaves; augmenting this with a random forest then quantifies the robustness of each detected violation. The approach is presented as scalable to high-dimensional data and interpretable via stratification rules and visualizations (including an interactive version) that characterize the violating subpopulations.

Significance. If the central claim holds, the method would address a practical gap in causal inference by offering a scalable, interpretable alternative to existing positivity checks that struggle with high-dimensional covariates or fail to localize violations. The emphasis on visualization and interactivity is a practical strength for applied work.

major comments (2)
  1. [Abstract / method description] The premise that growing trees to maximize treatment-group homogeneity within leaves will isolate regions where positivity is violated (P(T=1|X)=0 or 1) rather than finite-sample imbalances, spurious correlations, or splitting artifacts is stated without derivation, proof, or simulation evidence distinguishing these cases. This assumption is load-bearing because every subsequent step (characterization, robustness scoring via random forest, visualization) inherits the quality of the initial partitions.
  2. [Abstract] No empirical results, error analysis, or validation on datasets with known positivity violations are supplied, so it is impossible to assess whether the detected subspaces reflect true support violations or other data patterns. This is load-bearing for the claim of reliable detection.
minor comments (2)
  1. [Title / Abstract] The phrase 'discriminative approach' is used in the title but is not defined or contrasted with generative alternatives in the provided text.
  2. [Abstract] References to standard positivity literature (e.g., common support diagnostics) are absent, making it harder to situate the contribution.
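
The finite-sample concern in major comment 1 can be illustrated with a back-of-the-envelope calculation. Assuming the n samples in a leaf are treated independently with the same probability p (a simplifying assumption that also ignores the adaptive choice of splits), the chance that the leaf looks pure even though 0 < p < 1 is p^n + (1-p)^n. The function name below is hypothetical.

```python
def chance_purity_probability(p, n):
    """P(all n samples share one treatment) under i.i.d. Bernoulli(p) assignment."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be a positive integer")
    return p ** n + (1.0 - p) ** n

small_leaf = chance_purity_probability(0.5, 5)    # 2 * 0.5**5 = 0.0625
large_leaf = chance_purity_probability(0.5, 50)   # vanishingly small
skewed_leaf = chance_purity_probability(0.9, 5)   # 0.9**5 + 0.1**5
```

Small leaves, and leaves in regions where treatment is common but not universal, can therefore appear pure with non-negligible probability. Because a tree searches over many candidate splits, this figure likely understates the chance of a spuriously pure leaf.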

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback highlighting the need for stronger justification of the core assumption and empirical validation. We address each major comment below and commit to revisions that directly respond to these concerns.

  1. Referee: [Abstract / method description] The premise that growing trees to maximize treatment-group homogeneity within leaves will isolate regions where positivity is violated (P(T=1|X)=0 or 1) rather than finite-sample imbalances, spurious correlations, or splitting artifacts is stated without derivation, proof, or simulation evidence distinguishing these cases.

    Authors: We agree the manuscript would be strengthened by explicit justification. The motivation is that, in sufficiently large samples, leaves achieving perfect treatment homogeneity must correspond to regions with P(T=1|X) exactly 0 or 1, whereas finite-sample imbalances or spurious splits would not systematically maximize the homogeneity criterion across the tree. However, we acknowledge the absence of a formal derivation or targeted simulations. In revision we will add a dedicated subsection deriving the connection under the positivity violation definition and include simulation experiments that inject known violations versus pure finite-sample imbalance to illustrate the distinction.
    revision: yes

  2. Referee: [Abstract] No empirical results, error analysis, or validation on datasets with known positivity violations are supplied, so it is impossible to assess whether the detected subspaces reflect true support violations or other data patterns.

    Authors: The current version emphasizes the methodological contribution, interpretability, and visualization aspects without including empirical validation. We accept that this limits assessment of reliability. The revision will incorporate a new results section containing (i) simulation studies on data with controlled positivity violations, (ii) error analysis measuring detection accuracy, and (iii) application to at least one real dataset where violations are known or can be induced, with quantitative metrics of performance.
    revision: yes
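
The rebuttal does not say which accuracy metrics would be used. As one possibility, detection on simulated data with a known violating region could be scored with sample-level precision and recall, sketched here with a hypothetical `detection_metrics` helper and toy masks.

```python
def detection_metrics(true_violation, flagged):
    """Precision and recall of flagged samples against a known violation mask."""
    pairs = list(zip(true_violation, flagged))
    tp = sum(1 for truth, flag in pairs if truth and flag)
    fp = sum(1 for truth, flag in pairs if not truth and flag)
    fn = sum(1 for truth, flag in pairs if truth and not flag)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example: 4 samples truly lie in the violating region; the detector
# flags 3 of them plus 1 sample outside it.
true_violation = [1, 1, 1, 1, 0, 0, 0, 0]
flagged        = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall = detection_metrics(true_violation, flagged)
```

In this toy case both precision and recall are 0.75. Region-level measures, such as overlap between the planted and recovered rules, would be an alternative design.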

Circularity Check

0 steps flagged

No significant circularity; method applies standard tree partitioning to observed treatment labels


The paper proposes using decision trees to partition covariate space by maximizing treatment-group homogeneity, with pure leaves interpreted as positivity violations. This is a direct application of standard classification-tree splitting (e.g., Gini or entropy on the binary treatment indicator) rather than any derivation that reduces a claimed result to its own fitted inputs or self-citations. No equations, parameters, or uniqueness theorems are presented that would make the output equivalent to the input by construction; the partitions are produced by an external algorithm whose correctness does not depend on redefining positivity in terms of the tree output. The subsequent random-forest robustness step is likewise an independent augmentation. The approach is therefore self-contained against external benchmarks and receives the default non-circularity finding.
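
For reference, the two standard impurity measures named above can be written for a node in which a fraction p of samples is treated. The symbols p, n_L, and n_R are notation introduced here for illustration, and the page does not state which criterion the paper actually uses.

```latex
G(p) = 2\,p\,(1 - p), \qquad
H(p) = -p \log p - (1 - p) \log (1 - p)
% Both vanish exactly when p \in \{0, 1\}, i.e. when the node is pure.
% A split into children L and R (sizes n_L, n_R) is chosen to minimize
\frac{n_L}{n_L + n_R}\, G(p_L) + \frac{n_R}{n_L + n_R}\, G(p_R)
```

Since both measures are zero only for pure nodes, minimizing them pushes leaves toward containing a single treatment group, which is the sense in which pure leaves are read as candidate positivity violations.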

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the domain assumption that positivity violations manifest as regions of extreme treatment homogeneity and that decision-tree splitting will isolate them; no free parameters or invented entities are named in the abstract.

axioms (2)
  • domain assumption: Positivity (common support) is required for valid causal estimates and can be checked by examining treatment assignment overlap in covariate space.
    Stated in the first sentence of the abstract as the motivation for the method.
  • domain assumption: Decision trees that maximize treatment homogeneity within leaves will surface subspaces where positivity fails.
    Central modeling choice described in the abstract's method paragraph.


