
arXiv: 2510.16975 · v3 · submitted 2025-10-19 · 📊 stat.ME

Causal Variance Decompositions for Measuring Health Inequalities

Pith reviewed 2026-05-18 05:58 UTC · model grok-4.3

classification 📊 stat.ME
keywords: causal inference · variance decomposition · health inequalities · healthcare disparities · hospital effects · effect modification · selection · SEER data

The pith

A new causal framework decomposes observed variance in healthcare outcomes into eight components to quantify sources of inequalities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Pairwise comparisons fall short when hospitals and sociodemographic groups each span more than two categories. The paper therefore takes the total observed variance in care delivery outcomes as the target quantity. It introduces a decomposition that splits this variance into eight parts, with new terms that isolate how hospital effects differ by group, how patients access or select hospitals, and the correlation between those two sources of heterogeneity. Estimators are developed for both parametric and nonparametric settings, tested in simulations, and applied to SEER data on cervical cancer care.

Core claim

The observed variance in outcomes is attributed to eight components that include the marginal effects of hospitals and groups plus novel terms for effect modification by sociodemographic membership, hospital access or selection, and the correlation between these heterogeneity sources, each carrying a causal interpretation under standard identification assumptions.

What carries the argument

A causal variance decomposition framework partitions total outcome variance into eight additive components capturing direct effects, modification, selection, and correlations.
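
As a minimal sketch of the kind of identity an additive variance partition rests on, the Python snippet below applies the law of total variance to split an outcome's variance into between-hospital and within-hospital parts. It is not the paper's eight-component decomposition; the function name `between_within`, the simulated data, and the effect size 0.5 are assumptions made purely for illustration.

```python
import numpy as np

def between_within(y, z):
    """Split the variance of y (ddof=0) into between- and within-label parts.

    Hypothetical helper for illustration; `z` plays the role of a hospital label.
    """
    y = np.asarray(y, dtype=float)
    _, inverse = np.unique(np.asarray(z), return_inverse=True)
    counts = np.bincount(inverse)
    means = np.bincount(inverse, weights=y) / counts   # per-hospital mean outcome
    weights = counts / y.size                           # empirical P(Z = z)
    between = float(np.sum(weights * (means - y.mean()) ** 2))  # V[E(Y | Z)]
    within = float(np.mean((y - means[inverse]) ** 2))          # E[V(Y | Z)]
    return between, within

# Simulated data (assumed): five hospitals with a linear shift in mean outcome.
rng = np.random.default_rng(0)
hospital = rng.integers(0, 5, size=2000)
outcome = 0.5 * hospital + rng.normal(size=2000)

between, within = between_within(outcome, hospital)
total = float(np.var(outcome))
print(f"between={between:.3f} within={within:.3f} total={total:.3f}")
```

The two parts sum to the total variance up to floating-point error; a multi-component decomposition would refine such terms further rather than replace the identity.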

If this is right

  • Quantifies the share of outcome variation due to hospital effects that differ across sociodemographic groups.
  • Separates the contribution of differential hospital access or selection from treatment differences.
  • Supports both model-based and nonparametric estimation of the eight terms.
  • Applies directly to polytomous hospital and group settings common in health data.
  • Enables decomposition of disparities in real datasets such as SEER cancer records.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same variance decomposition could be applied to other multi-category settings such as schools and student subgroups.
  • Policy work could use the components to prioritize interventions on access versus quality differences.
  • Sensitivity analyses for unmeasured confounding would be needed before treating the components as fully causal.

Load-bearing premise

Standard identification assumptions such as conditional ignorability and positivity allow the eight variance components to be interpreted as reflecting causal modification, selection, and correlation.

What would settle it

In simulated data with known zero modification and selection, the corresponding variance components should estimate near zero; large nonzero estimates would indicate the decomposition fails to isolate those sources.
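
The check described above could be prototyped along the following lines. This is a rough sketch under stated assumptions, not the paper's estimators: the data-generating process, the helper names `simulate` and `crude_shares`, and the two summary measures (a double-centred interaction share and a maximum gap in hospital-choice probabilities) are crude stand-ins chosen for illustration, not the paper's eight components.

```python
import numpy as np

def simulate(n, modification, selection, seed=0):
    """Hypothetical generator: 3 groups, 4 hospitals, unit-variance noise."""
    rng = np.random.default_rng(seed)
    n_groups, n_hosp = 3, 4
    g = rng.integers(0, n_groups, size=n)
    # Hospital choice is uniform when selection == 0, tilted by group otherwise.
    logits = selection * np.outer(np.arange(n_groups), np.arange(n_hosp))
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    cum = probs.cumsum(axis=1)[g]
    z = np.minimum((rng.random(n)[:, None] > cum).sum(axis=1), n_hosp - 1)
    hosp_effect = np.array([0.0, 0.3, 0.6, 0.9])
    group_effect = np.array([0.0, -0.4, 0.4])
    # Group-by-hospital interaction, switched off when modification == 0.
    interaction = modification * np.outer([-1.0, 0.0, 1.0], [-1.0, 0.0, 0.0, 1.0])
    y = hosp_effect[z] + group_effect[g] + interaction[g, z] + rng.normal(size=n)
    return y, g, z

def crude_shares(y, g, z):
    """Crude proxies for effect modification and selection (assumed measures)."""
    shape = (g.max() + 1, z.max() + 1)
    counts, sums = np.zeros(shape), np.zeros(shape)
    np.add.at(counts, (g, z), 1.0)
    np.add.at(sums, (g, z), y)
    means = sums / counts
    p = counts / counts.sum()
    # Double-centre the cell means; what remains is the non-additive part.
    resid = (means - means.mean(axis=1, keepdims=True)
             - means.mean(axis=0, keepdims=True) + means.mean())
    modification_share = float((p * resid ** 2).sum() / np.var(y))
    # Largest gap between P(Z = z | G = g) and the marginal P(Z = z).
    cond = counts / counts.sum(axis=1, keepdims=True)
    marg = counts.sum(axis=0) / counts.sum()
    selection_gap = float(np.abs(cond - marg).max())
    return modification_share, selection_gap

null_mod, null_gap = crude_shares(*simulate(20000, 0.0, 0.0))
alt_mod, alt_gap = crude_shares(*simulate(20000, 0.5, 0.5, seed=1))
print(f"null: share={null_mod:.4f} gap={null_gap:.3f}")
print(f"alt:  share={alt_mod:.4f} gap={alt_gap:.3f}")
```

Under the null scenario both proxies should sit near zero up to sampling noise, while the second scenario should push both clearly away from zero. A real test of the framework would replace these proxies with the paper's own component estimators.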

Figures

Figures reproduced from arXiv: 2510.16975 by Kathy Han, Lin Yu, Olli Saarela, Zhihui Liu.

Figure 1: Causal directed acyclic graph representing relationships between case-mix (…
Figure 2: Sampling distributions for the estimated percentage of total variance explained by each source …
Figure 3: Sampling distributions for the estimated percentage of total variance explained by each source …
Figure 4: Difference between Monte Carlo standard deviation and the averaged standard error (Monte Carlo …
Original abstract

Recent causal inference literature has introduced causal effect decompositions to quantify sources of observed inequalities or disparities in outcomes, but these approaches are typically limited to pairwise comparisons. In healthcare delivery settings, both the exposure of interest (hospital or healthcare unit) and sociodemographic group membership may be polytomous, making pairwise contrasts inadequate. We therefore take the observed variance in care delivery outcomes as the quantity of interest and develop a new causal variance decomposition framework for this setting. The proposed framework attributes the observed variation to eight components, including novel terms characterizing modification of hospital effects by sociodemographic group membership, hospital access or selection, and the correlation between these two sources of heterogeneity. We discuss the causal interpretation of these components, propose both parametric and nonparametric model-based estimators, and study their performance through simulation. Finally, we illustrate the method using data from the SEER program in an application to cervical cancer care delivery.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript develops a causal variance decomposition framework to attribute observed variation in care delivery outcomes to eight components in settings where both hospital units and sociodemographic groups are polytomous. It introduces novel terms for effect modification of hospital effects by group membership, hospital access/selection, and the correlation between these sources of heterogeneity. The authors discuss causal interpretations under standard assumptions, propose parametric and nonparametric estimators, evaluate performance in simulations, and illustrate the method with SEER data on cervical cancer care delivery.

Significance. If the identification strategy and estimators are shown to be valid, the framework extends pairwise causal decompositions to a variance-based attribution that can handle multi-category exposures and groups, potentially aiding nuanced analysis of health inequalities. The simulation study and empirical application are positive features that ground the method, though the distinct value of the new correlation term relative to existing variance decompositions remains to be fully demonstrated.

major comments (3)
  1. [§3] §3 (Causal Interpretation): The formal statement and extension of conditional ignorability (no unmeasured confounding) to the joint polytomous distribution of hospital assignment and sociodemographic group is not provided; without explicit counterfactual definitions isolating the novel correlation term, it is unclear whether this component carries independent causal meaning or reduces to associational quantities.
  2. [§4] §4 (Estimation): The nonparametric estimator invokes positivity over the joint support of hospital and group indicators, but no discussion addresses how this is maintained or diagnosed when the number of hospitals is large and sociodemographic strata are sparse; violation for even one stratum would undermine causal attribution to the modification-selection-correlation terms.
  3. [§5, Table 1] Simulation study (§5, Table 1): Scenarios do not include violations of the identification assumptions (e.g., unmeasured confounding between outcome, hospital, and group); this limits the ability to assess whether the eight-component attribution remains reliable when the causal claims are stressed.
minor comments (2)
  1. [Abstract] Abstract: The eight components are referenced but not enumerated; adding a short list would clarify the scope for readers.
  2. [§2] Notation: The distinction between observed variance and counterfactual variance components could be made more explicit in the main equations to avoid potential confusion with standard ANOVA decompositions.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify important aspects of the causal variance decomposition. We respond to each major comment below and indicate planned revisions.

Point-by-point responses
  1. Referee: [§3] §3 (Causal Interpretation): The formal statement and extension of conditional ignorability (no unmeasured confounding) to the joint polytomous distribution of hospital assignment and sociodemographic group is not provided; without explicit counterfactual definitions isolating the novel correlation term, it is unclear whether this component carries independent causal meaning or reduces to associational quantities.

    Authors: We agree that greater formality would strengthen the presentation. In the revision we will add an explicit statement of the conditional ignorability assumption extended to the joint distribution of hospital assignment and sociodemographic group. We will also supply counterfactual definitions for all eight components that isolate the correlation term, showing that it represents the counterfactual covariance between group-specific hospital effects and group-specific selection probabilities and therefore retains independent causal content under the maintained assumptions. revision: yes

  2. Referee: [§4] §4 (Estimation): The nonparametric estimator invokes positivity over the joint support of hospital and group indicators, but no discussion addresses how this is maintained or diagnosed when the number of hospitals is large and sociodemographic strata are sparse; violation for even one stratum would undermine causal attribution to the modification-selection-correlation terms.

    Authors: The referee correctly notes a practical gap. We will revise §4 to discuss the positivity requirement for the joint support, including diagnostics (e.g., empirical support checks and overlap plots) and remedies such as trimming or sensitivity analyses when strata are sparse or the number of hospitals is large. These additions will directly address the risk that positivity violations could affect attribution to the modification, selection, and correlation terms. revision: yes

  3. Referee: [§5, Table 1] Simulation study (§5, Table 1): Scenarios do not include violations of the identification assumptions (e.g., unmeasured confounding between outcome, hospital, and group); this limits the ability to assess whether the eight-component attribution remains reliable when the causal claims are stressed.

    Authors: We accept that the current simulations evaluate performance only under correct specification. In the revised version we will augment the simulation study with scenarios that introduce unmeasured confounding between the outcome, hospital assignment, and sociodemographic group. These new scenarios will quantify the sensitivity of the eight-component decomposition and thereby demonstrate the conditions under which the attribution remains reliable. revision: yes
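
The empirical support checks mentioned in the response to comment 2 could start from something like the sketch below: a group-by-hospital count table with sparse or empty cells flagged. The function name `support_table`, the `min_count` threshold, and the toy labels are assumptions for illustration; the source does not specify which diagnostics the authors will adopt.

```python
import numpy as np

def support_table(group, hospital, min_count=5):
    """Cross-tabulate group by hospital and list cells with fewer than min_count patients.

    Hypothetical helper: an empty or sparse cell signals a possible positivity problem
    over the joint support of hospital and group.
    """
    g_labels, g_idx = np.unique(np.asarray(group), return_inverse=True)
    h_labels, h_idx = np.unique(np.asarray(hospital), return_inverse=True)
    counts = np.zeros((g_labels.size, h_labels.size), dtype=int)
    np.add.at(counts, (g_idx, h_idx), 1)
    sparse = [
        (g_labels[i].item(), h_labels[j].item(), int(counts[i, j]))
        for i, j in zip(*np.nonzero(counts < min_count))
    ]
    return counts, sparse

# Toy data (assumed): group "b" never attends hospital "h2".
group = ["a", "a", "a", "b", "b", "b", "b"]
hospital = ["h1", "h2", "h1", "h1", "h1", "h1", "h1"]

counts, sparse = support_table(group, hospital, min_count=1)
print(counts)
print(sparse)
```

Cells flagged this way are candidates for the trimming or sensitivity analyses the authors mention; the threshold itself is a judgment call that depends on sample size and the estimator used.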

Circularity Check

0 steps flagged

No significant circularity in causal variance decomposition framework


The paper constructs a new causal variance decomposition that attributes observed outcome variation to eight components derived from standard causal models under conditional ignorability and positivity. No derivation step reduces by construction to fitted parameters renamed as predictions, self-citations that bear the central load, or ansatzes imported from prior author work. The novel terms for effect modification, hospital selection, and their correlation follow directly from the polytomous causal structure and counterfactual definitions rather than tautological re-expression of inputs. The framework is self-contained against external benchmarks in causal inference literature, with estimators proposed and evaluated via simulation independent of the target application data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests primarily on established causal inference principles rather than new fitted constants or invented physical entities; the eight components are derived statistical quantities.

axioms (1)
  • domain assumption: Standard causal identification assumptions including conditional ignorability and positivity hold so that variance components can be interpreted as causal quantities.
    Invoked to give the decomposition its claimed causal meaning for modification, selection, and correlation terms.




Appendix A: Proof of Equation (2)

The variance of Y(A) conditional on the vector of case-mix covariates X can be expressed using the law of total variance conditioning on Z as follows:

\[
V[Y(A) \mid X] = V_{Z \mid X}\big[E(Y(A) \mid Z, X)\big] + E_{Z \mid X}\big[V(Y(A) \mid Z, X)\big]. \tag{11}
\]

The inner expectation in the first term of Equation (11) can b...