A discriminative approach for finding and characterizing positivity violations using decision trees
Pith reviewed 2026-05-24 19:32 UTC · model grok-4.3
The pith
Decision trees detect positivity violations by partitioning covariates into regions of maximized treatment homogeneity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By dividing the covariate space into mutually exclusive regions, each with maximized homogeneity of treatment groups, decision trees can be used to automatically detect subspaces violating positivity. By augmenting the method with an additional random forest model, we can quantify the robustness of the violation within each subspace. This solution is scalable and provides an interpretable characterization of the subspaces in which violations occur.
What carries the argument
A decision tree grown to maximize treatment homogeneity within its leaves, augmented by a random forest that scores the robustness of each detected violation.
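A minimal sketch of the partitioning idea, written in plain Python rather than with any particular library (an assumption made here to keep the example dependency-free). The covariate names `age_bin` and `sex`, the toy data, the `max_depth` and `min_leaf` settings, and the cutoff `EPS` are all hypothetical; this summary does not state the paper's actual splitting criterion or thresholds, and the random forest step is not shown.

```python
def gini(ts):
    """Gini impurity of a list of binary treatment labels."""
    if not ts:
        return 0.0
    p = sum(ts) / len(ts)
    return 2.0 * p * (1.0 - p)


def grow(rows, ts, names, rule=(), depth=0, max_depth=3, min_leaf=5):
    """Greedy tree fitted to the treatment label.

    Returns the leaves as (rule, n_units, treated_fraction) tuples.
    """
    parent, best = gini(ts), None
    if depth < max_depth and parent > 0.0:
        for j in range(len(names)):
            # Candidate thresholds: every observed value except the largest.
            for thr in sorted({r[j] for r in rows})[:-1]:
                left = [i for i, r in enumerate(rows) if r[j] <= thr]
                right = [i for i, r in enumerate(rows) if r[j] > thr]
                if min(len(left), len(right)) < min_leaf:
                    continue
                imp = (len(left) * gini([ts[i] for i in left])
                       + len(right) * gini([ts[i] for i in right])) / len(rows)
                # Keep only splits that strictly reduce impurity.
                if imp < parent and (best is None or imp < best[0]):
                    best = (imp, j, thr, left, right)
    if best is None:
        return [(" and ".join(rule) or "all", len(ts), sum(ts) / len(ts))]
    _, j, thr, left, right = best
    leaves = []
    for idx, op in ((left, "<="), (right, ">")):
        leaves += grow([rows[i] for i in idx], [ts[i] for i in idx], names,
                       rule + (f"{names[j]} {op} {thr}",),
                       depth + 1, max_depth, min_leaf)
    return leaves


# Hypothetical toy data: treatment is balanced everywhere except for
# units with age_bin >= 7 and sex == 1, who are never treated.
names = ["age_bin", "sex"]
rows, ts = [], []
for age in range(10):
    for sex in (0, 1):
        for rep in range(4):
            rows.append((age, sex))
            ts.append(0 if (age >= 7 and sex == 1) else rep % 2)

leaves = grow(rows, ts, names)
EPS = 0.05  # hypothetical cutoff for "near-pure" leaves
flagged = [leaf for leaf in leaves if leaf[2] <= EPS or leaf[2] >= 1 - EPS]
for rule, n, frac in leaves:
    print(f"{rule}: n={n}, treated fraction={frac:.2f}")
```

On this toy data the tree yields three leaves, and the only flagged one is the planted subgroup, described by a readable rule over the two covariates.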
If this is right
- Scalable detection of positivity violations in high-dimensional covariate spaces.
- Automatic identification of violating subpopulations with tree-based rules.
- Quantification of violation robustness using random forest.
- Visualization of stratification rules and violation severity for each subspace.
Where Pith is reading between the lines
- The approach could be integrated into standard causal analysis pipelines to flag data issues before estimation.
- Similar tree-based partitioning might help check other causal assumptions like exchangeability in specific strata.
- Interactive visualizations could support domain experts in deciding whether to exclude or reweight violating regions.
Load-bearing premise
Maximizing treatment homogeneity within decision-tree leaves reliably isolates positivity violations instead of other data structures or artifacts.
What would settle it
A synthetic dataset with a positivity violation planted in a known covariate subspace, on which the decision tree fails to produce a leaf with near-perfect treatment separation for that subspace.
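A rough version of such a check can be sketched as follows. This is a simplification under stated assumptions: it uses a single covariate and a single Gini split (a stump), not the full tree described in the paper, and the sample size, the seed, and the planted rule "always treated when x > 0.8" are all hypothetical.

```python
import random


def gini(treated, total):
    """Gini impurity of a node with `treated` treated units out of `total`."""
    if total == 0:
        return 0.0
    p = treated / total
    return 2.0 * p * (1.0 - p)


def best_split(xs, ts):
    """Return (threshold, weighted_impurity) of the best single split."""
    pairs = sorted(zip(xs, ts))
    n, total_treated = len(pairs), sum(ts)
    best_thr, best_imp = None, float("inf")
    left_treated = 0
    for i in range(1, n):
        left_treated += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical covariate values
        imp = (i * gini(left_treated, i)
               + (n - i) * gini(total_treated - left_treated, n - i)) / n
        if imp < best_imp:
            best_thr = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best_imp = imp
    return best_thr, best_imp


rng = random.Random(0)
xs = [rng.random() for _ in range(2000)]
# Planted violation (hypothetical): x > 0.8 is always treated;
# everywhere else treatment is a fair coin flip.
ts = [1 if x > 0.8 else int(rng.random() < 0.5) for x in xs]

thr, imp = best_split(xs, ts)
right = [t for x, t in zip(xs, ts) if x > thr]
right_treated_frac = sum(right) / len(right)
print(f"threshold={thr:.3f}, treated fraction above it={right_treated_frac:.3f}")
```

If the recovered threshold sat far from 0.8, or the region above it were far from pure, that would count against the premise in this simple setting. Agreement here says nothing about harder cases with many covariates or weaker violations.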
Original abstract
The assumption of positivity in causal inference (also known as common support and covariate overlap) is necessary to obtain valid causal estimates. Therefore, confirming it holds in a given dataset is an important first step of any causal analysis. Most common methods to date are insufficient for discovering non-positivity, as they do not scale for modern high-dimensional covariate spaces, or they cannot pinpoint the subpopulation violating positivity. To overcome these issues, we suggest harnessing decision trees for detecting violations. By dividing the covariate space into mutually exclusive regions, each with maximized homogeneity of treatment groups, decision trees can be used to automatically detect subspaces violating positivity. By augmenting the method with an additional random forest model, we can quantify the robustness of the violation within each subspace. This solution is scalable and provides an interpretable characterization of the subspaces in which violations occur. We provide a visualization of the stratification rules that define each subpopulation, combined with the severity of positivity violation within it. We also provide an interactive version of the visualization that allows a deeper dive into the properties of each subspace.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that decision trees can automatically detect positivity violations in causal inference by partitioning the covariate space into mutually exclusive regions that maximize homogeneity of treatment groups within leaves; augmenting this with a random forest then quantifies the robustness of each detected violation. The approach is presented as scalable to high-dimensional data and interpretable via stratification rules and visualizations (including an interactive version) that characterize the violating subpopulations.
Significance. If the central claim holds, the method would address a practical gap in causal inference by offering a scalable, interpretable alternative to existing positivity checks that struggle with high-dimensional covariates or fail to localize violations. The emphasis on visualization and interactivity is a practical strength for applied work.
major comments (2)
- [Abstract / method description] The premise that growing trees to maximize treatment-group homogeneity within leaves will isolate regions where positivity is violated (P(T=1|X)=0 or 1) rather than finite-sample imbalances, spurious correlations, or splitting artifacts is stated without derivation, proof, or simulation evidence distinguishing these cases. This assumption is load-bearing because every subsequent step (characterization, robustness scoring via random forest, visualization) inherits the quality of the initial partitions.
- [Abstract] No empirical results, error analysis, or validation on datasets with known positivity violations are supplied, so it is impossible to assess whether the detected subspaces reflect true support violations or other data patterns. This is load-bearing for the claim of reliable detection.
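The finite-sample concern in the first major comment can be made concrete with a small calculation. Assuming, purely for illustration, that the n units in a leaf receive treatment independently with a common probability p strictly between 0 and 1 (so positivity holds), the leaf is still perfectly homogeneous with probability p^n + (1-p)^n. The function name below is hypothetical.

```python
def prob_pure_by_chance(n, p):
    """Probability that n independent Bernoulli(p) treatment draws are all equal.

    Assumes positivity holds in the leaf (0 < p < 1), so any purity is chance.
    """
    return p ** n + (1.0 - p) ** n


for n, p in [(5, 0.5), (20, 0.5), (20, 0.1)]:
    print(f"n={n}, p={p}: P(pure leaf) = {prob_pure_by_chance(n, p):.6f}")
```

With p = 0.1, a leaf of 20 units is all-control roughly 12% of the time even though treatment is possible there. The calculation is also optimistic, because a tree chooses its leaves to maximize purity, which makes chance-pure leaves more likely than this fixed-leaf figure suggests.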
minor comments (2)
- [Title / Abstract] The phrase 'discriminative approach' is used in the title but not defined or contrasted with generative alternatives in the provided text.
- [Abstract] References to standard positivity literature (e.g., common support diagnostics) are absent, making it harder to situate the contribution.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback highlighting the need for stronger justification of the core assumption and empirical validation. We address each major comment below and commit to revisions that directly respond to these concerns.
- Referee: [Abstract / method description] The premise that growing trees to maximize treatment-group homogeneity within leaves will isolate regions where positivity is violated (P(T=1|X)=0 or 1) rather than finite-sample imbalances, spurious correlations, or splitting artifacts is stated without derivation, proof, or simulation evidence distinguishing these cases.
  Authors: We agree the manuscript would be strengthened by explicit justification. The motivation is that, in sufficiently large samples, leaves achieving perfect treatment homogeneity must correspond to regions with P(T=1|X) exactly 0 or 1, whereas finite-sample imbalances or spurious splits would not systematically maximize the homogeneity criterion across the tree. However, we acknowledge the absence of a formal derivation or targeted simulations. In revision we will add a dedicated subsection deriving the connection under the positivity violation definition and include simulation experiments that inject known violations versus pure finite-sample imbalance to illustrate the distinction.
  Revision: yes
- Referee: [Abstract] No empirical results, error analysis, or validation on datasets with known positivity violations are supplied, so it is impossible to assess whether the detected subspaces reflect true support violations or other data patterns.
  Authors: The current version emphasizes the methodological contribution, interpretability, and visualization aspects without including empirical validation. We accept that this limits assessment of reliability. The revision will incorporate a new results section containing (i) simulation studies on data with controlled positivity violations, (ii) error analysis measuring detection accuracy, and (iii) application to at least one real dataset where violations are known or can be induced, with quantitative metrics of performance.
  Revision: yes
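One way the promised error analysis could be scored is sketched below, assuming a simulation in which the truly violating units are known. The choice of precision and recall, the function name, and the eight-unit example are assumptions of this sketch; the rebuttal does not say which metrics the authors intend to use.

```python
def detection_scores(true_violation, flagged):
    """Precision and recall of flagged units against known violating units.

    Both arguments are equal-length sequences of 0/1 indicators, one per unit.
    """
    tp = sum(1 for t, f in zip(true_violation, flagged) if t and f)
    fp = sum(1 for t, f in zip(true_violation, flagged) if not t and f)
    fn = sum(1 for t, f in zip(true_violation, flagged) if t and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: 3 truly violating units, 3 flagged units, 2 in common.
true_violation = [1, 1, 1, 0, 0, 0, 0, 0]
flagged = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall = detection_scores(true_violation, flagged)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

Scoring at the unit level, as here, rewards a method for covering the violating subpopulation even when its leaf boundaries differ from the planted rule.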
Circularity Check
No significant circularity; method applies standard tree partitioning to observed treatment labels
The paper proposes using decision trees to partition covariate space by maximizing treatment-group homogeneity, with pure leaves interpreted as positivity violations. This is a direct application of standard classification-tree splitting (e.g., Gini or entropy on the binary treatment indicator) rather than any derivation that reduces a claimed result to its own fitted inputs or self-citations. No equations, parameters, or uniqueness theorems are presented that would make the output equivalent to the input by construction; the partitions are produced by an external algorithm whose correctness does not depend on redefining positivity in terms of the tree output. The subsequent random-forest robustness step is likewise an independent augmentation. The approach is therefore self-contained against external benchmarks and receives the default non-circularity finding.
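For reference, the two standard impurity criteria named above can be written for a binary treatment indicator as functions of the treated fraction p in a node. This is a generic sketch of the textbook definitions; this summary does not say which criterion the paper uses, and the function names are hypothetical.

```python
import math


def gini_impurity(p):
    """Gini impurity of a node whose treated fraction is p."""
    return 2.0 * p * (1.0 - p)


def entropy_bits(p):
    """Shannon entropy, in bits, of a node whose treated fraction is p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


for p in (0.0, 0.1, 0.5, 1.0):
    print(f"p={p}: gini={gini_impurity(p):.3f}, entropy={entropy_bits(p):.3f}")
```

Both criteria are zero exactly when a node contains only treated or only untreated units, which is why a pure leaf is the natural candidate for a positivity violation under either one.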
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Positivity (common support) is required for valid causal estimates and can be checked by examining treatment assignment overlap in covariate space.
- domain assumption: Decision trees that maximize treatment homogeneity within leaves will surface subspaces where positivity fails.
Supplementary figures
Figure S1: A snapshot of the exact same plot as Figure 1B, only its interactive version, where a box with additional information is shown upon hovering over...