pith. machine review for the scientific record.

arxiv: 2511.20985 · v4 · submitted 2025-11-26 · 📊 stat.ME

Two-stage Estimation for Causal Inference Involving a Semi-continuous Exposure

Pith reviewed 2026-05-17 05:29 UTC · model grok-4.3

classification 📊 stat.ME
keywords: causal inference · semi-continuous exposure · two-stage estimation · propensity score · marginal structural model · dose-response · exposure status

The pith

A two-stage estimator separates the causal effect of exposure status from the dose-response among the exposed for semi-continuous variables.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework for causal inference when the exposure variable has a point mass at zero together with a continuous positive component. It combines a two-part propensity score with a marginal structural model so that the effect of being exposed at a reference level can be estimated separately from the effect of increasing the exposure level among those who are exposed. A sequential two-stage procedure targets these quantities in turn and the authors derive consistency, asymptotic normality, and the limiting behavior under misspecification of the propensity models. This matters for observational studies in which exposures such as alcohol intake, pollutant levels, or medication doses are zero for many individuals yet vary continuously for others. A reader who accepts the framework gains the ability to ask distinct scientific questions about initiation versus intensity without forcing the exposure into a purely binary or purely continuous mold.

Core claim

For semi-continuous exposures the authors introduce a two-part propensity structure—one binary model for exposure status and one conditional model for the positive exposure level—embedded in a marginal structural model that identifies both the causal effect of exposure status at a reference dose and the causal dose-response function among the exposed. A two-stage estimation procedure sequentially estimates these quantities, permits flexible choice of propensity methods in the second stage, and yields estimators that are consistent and asymptotically normal when the propensity models are correct while converging to well-characterized limits under misspecification.

What carries the argument

Two-part propensity score model (binary status component plus conditional exposure-level component) inside a marginal structural model that disentangles status and dose effects.
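For concreteness, the two-part structure can be written as a density for the semi-continuous exposure A given covariates X, paired with one simple marginal structural model. The notation and the linear dose term below are illustrative assumptions, and the paper may parametrize the dose-response differently. Here a_ref denotes the reference dose.

```latex
\begin{aligned}
  f(a \mid x) &= \{1 - \pi(x)\}^{1 - z}\,\{\pi(x)\, g(a \mid x)\}^{z},
    \qquad z = \mathbb{1}(a > 0),\quad \pi(x) = \Pr(A > 0 \mid X = x),\\
  \mathbb{E}\{Y(0)\} &= \beta_0, \qquad
  \mathbb{E}\{Y(a)\} = \beta_0 + \beta_Z + \beta_D\,(a - a_{\mathrm{ref}}), \quad a > 0.
\end{aligned}
```

Under this form, \(\beta_Z = \mathbb{E}\{Y(a_{\mathrm{ref}})\} - \mathbb{E}\{Y(0)\}\) is the effect of being exposed at the reference dose rather than unexposed, and \(\beta_D\) is the dose-response slope among the exposed. The binary model \(\pi(x)\) and the conditional density \(g(a \mid x)\) are the two propensity components.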

If this is right

  • The estimators remain consistent and asymptotically normal when the two-part propensity models are correctly specified.
  • Under misspecification the estimators converge to explicit limiting values that can be interpreted.
  • The procedure allows different propensity methods to be used in the second stage, with consistency retained provided the chosen method is correctly specified.
  • Applied to prenatal alcohol exposure, the method can answer separate questions about whether any exposure occurred and about the effect of higher amounts among exposed pregnancies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same two-part structure could be applied to other zero-inflated or count-valued exposures once suitable conditional models are chosen.
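  • Replacing parametric propensity models with machine-learning alternatives in the second stage would be a natural extension, although the paper's asymptotic guarantees would not carry over automatically; root-n inference with flexible learners generally requires additional conditions on their convergence rates.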
  • Policy analyses could use the separated parameters to evaluate interventions aimed only at preventing initiation versus those aimed at reducing intensity among users.

Load-bearing premise

The two-part propensity score models are correctly specified; if they are not, the estimators converge to the limiting values the paper characterizes, which need not coincide with the target causal parameters.
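To make the idea of limiting values under misspecification concrete, the toy calculation below computes, with exact fractions, the large-sample limit of a Hájek inverse-probability-weighted contrast for exposure status in a hypothetical setting with a single binary confounder X. Everything in it (the probabilities, the outcome means, and the assumption that exposed outcomes have already been shifted to the reference dose) is an illustrative assumption, not the paper's design or its general misspecification result.

```python
from fractions import Fraction as F

# Hypothetical toy setting (illustrative assumptions, not from the paper).
P_X = {0: F(1, 2), 1: F(1, 2)}        # distribution of a binary confounder X
PI_TRUE = {0: F(3, 10), 1: F(7, 10)}  # true status propensity P(Z = 1 | X = x)


def mean_y(z, x):
    """E[Y | Z = z, X = x], outcomes assumed already shifted to the reference
    dose; the true status effect is -1 and X raises the outcome by 2."""
    return 1 + 2 * x - z


def limit_status_contrast(pi_model):
    """Probability limit of the Hajek IPW contrast between exposed and
    unexposed when weights use pi_model(x) instead of PI_TRUE[x]."""
    w1 = {x: P_X[x] * PI_TRUE[x] / pi_model(x) for x in P_X}
    w0 = {x: P_X[x] * (1 - PI_TRUE[x]) / (1 - pi_model(x)) for x in P_X}
    mean1 = sum(w1[x] * mean_y(1, x) for x in P_X) / sum(w1.values())
    mean0 = sum(w0[x] * mean_y(0, x) for x in P_X) / sum(w0.values())
    return mean1 - mean0


# Correct propensity model vs. an intercept-only model that ignores X.
p_exposed = sum(P_X[x] * PI_TRUE[x] for x in P_X)  # marginal P(Z = 1) = 1/2
correct = limit_status_contrast(lambda x: PI_TRUE[x])
misspecified = limit_status_contrast(lambda x: p_exposed)
print(correct, misspecified)  # -1 -1/5
```

With intercept-only weights the contrast collapses to the confounded unweighted difference, -1/5 instead of -1. The paper characterizes such limits in general; this toy only shows that they need not equal the causal target.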

What would settle it

A simulation in which the propensity models are misspecified in a manner outside the paper's robustness analysis and the resulting estimators fail to recover the known true causal effects.

Figures

Figures reproduced from arXiv: 2511.20985 by Joseph L. Jacobson, Louise M. Ryan, R. Colin Carter, Richard J. Cook, Sandra W. Jacobson, Tugba Akkaya-Hocagil, Xiaoya Wang, Yeying Zhu.

Figure 1: A directed acyclic graph for a two-part exposure model.
Original abstract

Methods for causal inference are well developed for binary and continuous exposures, but in many settings the exposure has a substantial mass at zero; such exposures are called semi-continuous. We propose a general causal framework for such semi-continuous exposures, together with a novel two-stage estimation strategy. A two-part propensity structure is introduced for the semi-continuous exposure, with one component for exposure status (exposed vs unexposed) and another for the exposure level among those exposed, and both components are incorporated into a marginal structural model that disentangles the effects of exposure status and dose. The two-stage procedure sequentially targets the causal dose-response among exposed individuals and the causal effect of exposure status at a reference dose, allowing flexibility in the choice of propensity score methods in the second stage. We establish consistency and asymptotic normality for the resulting estimators, and characterise their limiting values under misspecification of the propensity score models. Simulation studies evaluate finite sample performance and robustness, and an application to a study of prenatal alcohol exposure and child cognition demonstrates how the proposed methods can be used to address a range of scientific questions about both exposure status and exposure intensity.
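The sequential logic in the abstract can be illustrated with a small simulation. The sketch below is a hypothetical two-stage inverse-probability-weighting scheme, not the authors' estimator: the data-generating process, the normal model for log-dose D among the exposed, the linear dose-response, the reference dose D_REF, and the saturated propensity for one binary confounder are all assumptions chosen to keep it dependency-free. Stage 1 estimates the dose-response slope among the exposed with stabilized weights f(D)/f(D | X); stage 2 shifts each exposed outcome to D_REF using that slope and contrasts Hájek-weighted means by exposure status.

```python
import math
import random
from statistics import fmean

# Hypothetical data-generating process (illustrative assumption, not the paper's):
#   X ~ Bernoulli(0.5)                          binary confounder
#   Z = 1{A > 0},  P(Z = 1 | X) = 0.3 + 0.4 X    exposure status
#   D = log-dose among exposed, D | X ~ Normal(1 + 0.5 X, 0.5)
#   Y = 1 + 2 X + Z * (TRUE_STATUS + TRUE_SLOPE * (D - D_REF)) + Normal(0, 1)
D_REF = 1.0          # hypothetical reference log-dose
TRUE_STATUS = -1.0   # effect of being exposed at D_REF vs. unexposed
TRUE_SLOPE = -0.5    # dose-response slope among the exposed


def simulate(n, rng):
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        z = 1 if rng.random() < 0.3 + 0.4 * x else 0
        d = rng.gauss(1 + 0.5 * x, 0.5) if z else None
        effect = TRUE_STATUS + TRUE_SLOPE * (d - D_REF) if z else 0.0
        rows.append((x, z, d, 1 + 2 * x + effect + rng.gauss(0, 1)))
    return rows


def normal_pdf(v, mean, sd):
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def stage1_dose_slope(rows):
    """Weighted least-squares slope of Y on (D - D_REF) among the exposed,
    with stabilized weights f(D) / f(D | X) from normal dose models."""
    exp_rows = [(x, d, y) for x, z, d, y in rows if z]
    # Conditional dose model: mean by X (saturated), pooled residual SD.
    mu = {k: fmean(d for x, d, _ in exp_rows if x == k) for k in (0, 1)}
    sd_c = math.sqrt(fmean((d - mu[x]) ** 2 for x, d, _ in exp_rows))
    # Marginal (numerator) dose model among the exposed.
    mu_m = fmean(d for _, d, _ in exp_rows)
    sd_m = math.sqrt(fmean((d - mu_m) ** 2 for _, d, _ in exp_rows))
    w = [normal_pdf(d, mu_m, sd_m) / normal_pdf(d, mu[x], sd_c) for x, d, _ in exp_rows]
    u = [d - D_REF for _, d, _ in exp_rows]
    y = [yy for _, _, yy in exp_rows]
    sw = sum(w)
    u_bar = sum(wi * ui for wi, ui in zip(w, u)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    s_uy = sum(wi * (ui - u_bar) * (yi - y_bar) for wi, ui, yi in zip(w, u, y))
    s_uu = sum(wi * (ui - u_bar) ** 2 for wi, ui in zip(w, u))
    return s_uy / s_uu


def stage2_status_effect(rows, slope):
    """Shift exposed outcomes to D_REF with the stage-1 slope, then take the
    difference of Hajek IPW means by status using P(Z = 1 | X)."""
    pi = {k: fmean(z for x, z, _, _ in rows if x == k) for k in (0, 1)}
    num1 = den1 = num0 = den0 = 0.0
    for x, z, d, y in rows:
        if z:
            w = 1.0 / pi[x]
            num1 += w * (y - slope * (d - D_REF))
            den1 += w
        else:
            w = 1.0 / (1.0 - pi[x])
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


rng = random.Random(2025)
data = simulate(50_000, rng)
slope_hat = stage1_dose_slope(data)
status_hat = stage2_status_effect(data, slope_hat)
print(f"dose slope among exposed: {slope_hat:.3f} (truth {TRUE_SLOPE})")
print(f"status effect at D_REF:   {status_hat:.3f} (truth {TRUE_STATUS})")
```

Both estimates should land near the true values (-0.5 and -1.0). In this setting an unweighted regression of Y on D among the exposed would instead be biased upward by the confounder X, which is what the stage-1 weights are there to remove.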

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes a general causal framework for semi-continuous exposures that incorporates a two-part propensity score structure (one component for exposure status and one for dose intensity among the exposed) into a marginal structural model separating the effects of status and dose. It introduces a novel two-stage estimation procedure, establishes consistency and asymptotic normality of the resulting estimators, and explicitly characterizes their limiting values under misspecification of the propensity models. The work includes simulation studies assessing finite-sample performance and robustness, plus a real-data application to prenatal alcohol exposure and child cognition.

Significance. If the results hold, the paper offers a practically useful extension of causal methods to a common but under-served exposure type, with theoretical guarantees that include robustness characterizations and a flexible two-stage procedure. The explicit limiting-value analysis under misspecification and the provision of simulation studies are particular strengths that support reliable use in applications where propensity models may be imperfect.

major comments (1)
  1. [§6] §6 (Application): No propensity-score model diagnostics, balance checks, or sensitivity analyses are reported for either component of the two-part propensity model in the prenatal alcohol exposure analysis. Because the central claim includes a demonstration that the method addresses scientific questions about exposure status and intensity in this specific dataset, and because the paper derives limiting values under misspecification, the lack of empirical assessment of model adequacy in the observed data is load-bearing for interpreting the reported causal effects.
minor comments (2)
  1. [§3 and §4] Clarify in the main text how the reference dose is chosen in the two-stage procedure and whether results are sensitive to that choice; a brief sensitivity table would help.
  2. [§5] In the simulation section, label the panels or legends to distinguish clearly between the binary-status effect and the dose-response effect among the exposed.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments and for recognizing the strengths of our proposed two-stage estimator and theoretical results for semi-continuous exposures. We address the single major comment below and will incorporate the requested additions in the revised manuscript.

Point-by-point responses
  1. Referee: [§6] §6 (Application): No propensity-score model diagnostics, balance checks, or sensitivity analyses are reported for either component of the two-part propensity model in the prenatal alcohol exposure analysis. Because the central claim includes a demonstration that the method addresses scientific questions about exposure status and intensity in this specific dataset, and because the paper derives limiting values under misspecification, the lack of empirical assessment of model adequacy in the observed data is load-bearing for interpreting the reported causal effects.

    Authors: We agree that the application would be strengthened by explicit diagnostics. In the revised Section 6 we will add: (i) overlap and positivity diagnostics for both the binary exposure-status model and the continuous dose model among the exposed; (ii) covariate balance tables (standardized mean differences) before and after weighting for each component; (iii) goodness-of-fit summaries (e.g., Hosmer-Lemeshow for the logistic part and residual diagnostics for the linear part); and (iv) sensitivity analyses that vary the specification of each propensity component and report the resulting changes in the estimated status and dose effects. These additions will directly support interpretation of the reported causal contrasts and will be cross-referenced to the limiting-value characterizations already derived in the theoretical sections. revision: yes
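Item (ii) of the response, standardized mean differences before and after weighting, is straightforward to compute for the exposure-status component. The function below is a minimal, dependency-free sketch using the common pooled-variance definition. The tiny data set and the inverse-probability weights 1/π(X) and 1/{1 − π(X)} are hypothetical, and balance for the dose component would typically be checked differently (for example, via weighted covariate-dose correlations).

```python
import math


def weighted_mean_var(values, weights):
    """Weighted mean and (biased) weighted variance."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var


def standardized_mean_difference(x, z, weights=None):
    """SMD of covariate x between exposed (z == 1) and unexposed (z == 0),
    using the pooled standard deviation sqrt((var1 + var0) / 2)."""
    if weights is None:
        weights = [1.0] * len(x)
    g1 = [(xi, wi) for xi, zi, wi in zip(x, z, weights) if zi == 1]
    g0 = [(xi, wi) for xi, zi, wi in zip(x, z, weights) if zi == 0]
    m1, v1 = weighted_mean_var(*zip(*g1))
    m0, v0 = weighted_mean_var(*zip(*g0))
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)


# Hypothetical data: binary confounder X, with P(Z = 1 | X) = 0.3 or 0.7.
x = [0] * 10 + [1] * 10
z = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3
pi = {0: 0.3, 1: 0.7}
w = [1 / pi[xi] if zi else 1 / (1 - pi[xi]) for xi, zi in zip(x, z)]

smd_before = standardized_mean_difference(x, z)
smd_after = standardized_mean_difference(x, z, w)
print(round(smd_before, 3), round(abs(smd_after), 3))  # 0.873 0.0
```

Because the weights here use the true propensity of a binary confounder, weighting removes the imbalance exactly; with an estimated propensity and richer covariates, values below a conventional threshold such as 0.1 are usually read as adequate balance.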

Circularity Check

0 steps flagged

No circularity: estimators derived from standard causal identification plus explicit two-part modeling

Full rationale

The paper introduces a two-part propensity score for semi-continuous exposures and a two-stage procedure that sequentially targets dose-response among the exposed and the effect of exposure status. Consistency, asymptotic normality, and explicit limiting values under misspecification are derived from standard marginal structural model identification results together with regularity conditions on the estimating equations. These steps are self-contained against external benchmarks (causal identification theory and M-estimation), with no reduction of the target parameters to fitted inputs by construction, no load-bearing self-citations, and no ansatz smuggled via prior work. The application and simulations serve as illustration rather than the source of the claimed properties.
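The M-estimation argument referred to above has a standard form, sketched here with generic notation. The stacked parameter θ, collecting the status-propensity parameters α, the dose-model parameters γ and the marginal structural model parameters β, and the estimating function ψ are placeholders, not the paper's specific equations:

```latex
\begin{aligned}
&\sum_{i=1}^{n} \psi(O_i; \hat\theta) = 0,
  \qquad \theta = (\alpha^{\top}, \gamma^{\top}, \beta^{\top})^{\top},\\
&\sqrt{n}\,(\hat\theta - \theta^{*}) \xrightarrow{d} N\!\left(0,\ A^{-1} B A^{-\top}\right),\\
&A = \mathbb{E}\left\{-\frac{\partial \psi(O; \theta^{*})}{\partial \theta^{\top}}\right\},
  \qquad B = \mathbb{E}\left\{\psi(O; \theta^{*})\, \psi(O; \theta^{*})^{\top}\right\},
  \qquad \mathbb{E}\{\psi(O; \theta^{*})\} = 0.
\end{aligned}
```

When the propensity models are correctly specified, the β-component of θ* equals the causal parameters; under misspecification θ* is still well defined as a pseudo-true value (in the spirit of White, 1982), but its β-component may differ from the causal target. Stacking the first-stage equations is what propagates propensity-estimation uncertainty into the sandwich variance.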

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard causal assumptions plus modeling choices for the semi-continuous exposure; no new free parameters or invented entities are introduced beyond conventional propensity and outcome models.

axioms (2)
  • domain assumption No unmeasured confounding between exposure and outcome conditional on observed covariates.
    Invoked to identify causal effects in the marginal structural model.
  • domain assumption Positivity of the two-part propensity scores.
    Required for the two-stage estimators to be well-defined.
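Positivity can be screened empirically before weighting. The sketch below covers only the binary status component: it flags estimated propensities outside a hypothetical screening interval [eps, 1 − eps] and reports the largest inverse-probability weight and Kish's effective sample size. The input values and eps = 0.05 are illustrative assumptions; for the dose component one would analogously inspect how small the estimated conditional density g(a | x) becomes at the observed doses.

```python
def positivity_report(pi_hat, z, eps=0.05):
    """Overlap summary for estimated status propensities pi_hat[i] = P(Z=1 | X_i).

    eps is a hypothetical screening threshold, not a value from the paper.
    """
    flagged = [i for i, p in enumerate(pi_hat) if p < eps or p > 1 - eps]
    # Inverse-probability weight for the status each unit actually received.
    w = [1 / p if zi else 1 / (1 - p) for p, zi in zip(pi_hat, z)]
    kish_ess = sum(w) ** 2 / sum(wi * wi for wi in w)
    return {"flagged": flagged, "max_weight": max(w), "kish_ess": kish_ess}


report = positivity_report([0.5, 0.5, 0.02, 0.9], [1, 0, 0, 1])
print(report["flagged"], report["max_weight"], round(report["kish_ess"], 2))
```

Here the third unit is flagged because its estimated propensity of 0.02 falls below eps; a large maximum weight or an effective sample size far below n would signal practical positivity problems even when no unit is flagged.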

pith-pipeline@v0.9.0 · 5525 in / 1287 out tokens · 51469 ms · 2026-05-17T05:29:05.831639+00:00 · methodology



Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages

  1. [1]

    Donald B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688–701, 1974

  2. [2]

    Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983

  3. [3]

    Paul R Rosenbaum and Donald B Rubin. Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79(387):516–524, 1984

  4. [4]

    Keisuke Hirano and Guido W Imbens. The propensity score with continuous treatments. Applied Bayesian Modeling and Causal Inference from Incomplete-data Perspectives, pages 73–84, 2004

  5. [5]

    Kosuke Imai and David A Van Dyk. Causal inference with general treatment regimes: generalizing the propensity score. Journal of the American Statistical Association, 99(467):854–866, 2004

  6. [6]

    Ashley I Naimi, Erica EM Moodie, Nathalie Auger, and Jay S Kaufman. Constructing inverse probability weights for continuous exposures: a comparison of methods. Epidemiology, 25(2):292–299, 2014

  7. [7]

    Ermira Begu, Yaroslav Shlyapnikov, Andrej Stergarsek, Peter Frkal, Jože Kotnik, and Milena Horvat. A method for semi-continuous measurement of dissolved elemental mercury in industrial and natural waters. International Journal of Environmental Analytical Chemistry, 96(7):609–626, 2016

  8. [8]

    Sandra W Jacobson, Lisa M Chiodo, Robert J Sokol, and Joseph L Jacobson. Validity of maternal report of prenatal alcohol, cocaine, and smoking in relation to neurobehavioral outcome. Pediatrics, 109(5):815–825, 2002

  9. [9]

    Sandra W Jacobson, Joseph L Jacobson, Robert J Sokol, Lisa M Chiodo, and Raluca Corobana. Maternal age, alcohol abuse history, and quality of parenting as moderators of the effects of prenatal alcohol exposure on 7.5-year intellectual function. Alcoholism: Clinical and Experimental Research, 28(11):1732–1745, 2004

  10. [10]

    Joseph L Jacobson, Neil C Dodge, Matthew J Burden, Rafael Klorman, and Sandra W Jacobson. Number processing in adolescents with prenatal alcohol exposure and ADHD: differences in the neurobehavioral phenotype. Alcoholism: Clinical and Experimental Research, 35(3):431–442, 2011

  11. [11]

    Catherine E Lewis, Kevin GF Thomas, Neil C Dodge, Christopher D Molteno, Ernesta M Meintjes, Joseph L Jacobson, and Sandra W Jacobson. Verbal learning and memory impairment in children with fetal alcohol spectrum disorders. Alcoholism: Clinical and Experimental Research, 39(4):724–732, 2015

  12. [12]

    Catherine E Lewis, Kevin GF Thomas, Christopher D Molteno, Matthias Kliegel, Ernesta M Meintjes, Joseph L Jacobson, and Sandra W Jacobson. Prospective memory impairment in children with prenatal alcohol exposure. Alcoholism: Clinical and Experimental Research, 40(5):969–978, 2016

  13. [13]

    Tugba Akkaya Hocagil, Richard J Cook, Sandra W Jacobson, Joseph L Jacobson, and Louise M Ryan. Propensity score analysis for a semi-continuous exposure variable: a study of gestational alcohol exposure and childhood cognition. Journal of the Royal Statistical Society Series A: Statistics in Society, 184(4):1390–1413, 2021

  14. [14]

    Kecheng Li, Tugba Akkaya-Hocagil, Richard J Cook, Louise M Ryan, R Colin Carter, Khue-Dung Dang, Joseph L Jacobson, and Sandra W Jacobson. Use of generalized propensity scores for assessing effects of multiple exposures. Statistics in Biosciences, pages 1–30, 2023

  15. [15]

    Heejung Bang and James M Robins. Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4):962–973, 2005

  16. [16]

    James Robins, Mariela Sued, Quanhong Lei-Gomez, and Andrea Rotnitzky. Comment: performance of double-robust estimators when "inverse probability" weights are highly variable. Statistical Science, 22(4):544–559, 2007

  17. [17]

    Jerzy Splawa-Neyman, Dorota M Dabrowska, and Terrence P Speed. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, pages 465–472, 1990

  18. [18]

    Donald B Rubin. Randomization analysis of experimental data: the Fisher randomization test comment. Journal of the American Statistical Association, 75(371):591–593, 1980

  19. [19]

    Donald B Rubin. Comment: Neyman (1923) and causal inference in experiments and observational studies. Statistical Science, 5(4):472–480, 1990

  20. [20]

    Stephen R Cole and Miguel A Hernán. Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology, 168(6):656–664, 2008

  21. [21]

    Valerie A Smith, Brian Neelon, Matthew L Maciejewski, and John S Preisser. Two parts are better than one: modeling marginal means of semicontinuous data. Health Services and Outcomes Research Methodology, 17:198–218, 2017

  22. [22]

    James M Robins. Marginal structural models versus structural nested models as tools for causal inference. Statistical Models in Epidemiology, the Environment, and Clinical Trials, pages 95–133, 2000

  23. [23]

    Stijn Vansteelandt and Rhian M Daniel. On regression adjustment for the propensity score. Statistics in Medicine, 33(23):4053–4072, 2014

  24. [24]

    Miguel Ángel Hernán, Babette Brumback, and James M Robins. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology, 11(5):561–570, 2000

  25. [25]

    Michele Jonsson Funk, Daniel Westreich, Chris Wiesen, Til Stürmer, M Alan Brookhart, and Marie Davidian. Doubly robust estimation of causal effects. American Journal of Epidemiology, 173(7):761–767, 2011

  26. [26]

    Aad W Van der Vaart. Asymptotic statistics, volume 3. Cambridge University Press, 2000

  27. [27]

    Halbert White. Maximum likelihood estimation of misspecified models. Econometrica: Journal of the Econometric Society, pages 1–25, 1982

  28. [28]

    M Alan Brookhart, Sebastian Schneeweiss, Kenneth J Rothman, Robert J Glynn, Jerry Avorn, and Til Stürmer. Variable selection for propensity score models. American Journal of Epidemiology, 163(12):1149–1156, 2006

  29. [29]

    Donald B Rubin. Inference and missing data. Biometrika, 63(3):581–592, 1976

  30. [30]

    Roderick JA Little and Donald B Rubin. Statistical analysis with missing data, volume 793. John Wiley & Sons, 2019

  31. [31]

    Sander Greenland, Judea Pearl, and James M Robins. Causal diagrams for epidemiologic research. Epidemiology, 10(1):37–48, 1999

  32. [32]

    Yeying Zhu, Donna L Coffman, and Debashis Ghosh. A boosting algorithm for estimating generalized propensity scores with continuous treatments. Journal of Causal Inference, 3(1):25–40, 2015

  33. [33]

    Edward H Kennedy, Zongming Ma, Matthew D McHugh, and Dylan S Small. Non-parametric methods for doubly robust estimation of continuous treatment effects. Journal of the Royal Statistical Society Series B: Statistical Methodology, 79(4):1229–1245, 2017

  34. [34]

    Shandong Zhao, David A van Dyk, and Kosuke Imai. Propensity score-based methods for causal inference in observational studies with non-binary treatments. Statistical Methods in Medical Research, 29(3):709–727, 2020

  35. [35]

    Tugba Akkaya Hocagil, Louise M Ryan, Richard J Cook, Sandra W Jacobson, Gale A Richardson, Nancy L Day, Claire D Coles, Heather Carmichael Olson, and Joseph L Jacobson. A hierarchical meta-analysis for settings involving multiple outcomes across multiple cohorts. Stat, 11(1):e462, 2022

  36. [36]

    Khue-Dung Dang, Louise M Ryan, Tugba Akkaya Hocagil, Richard J Cook, Gale A Richardson, Nancy L Day, Claire D Coles, Heather Carmichael Olson, Sandra W Jacobson, and Joseph L Jacobson. Bayesian modelling of effects of prenatal alcohol exposure on child cognition based on data from multiple cohorts. Australian & New Zealand Journal of Statistics, 65(3):...

  37. [37]

    Stef Van Buuren and Karin Groothuis-Oudshoorn. mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45:1–67, 2011

  38. [38]

    Ian R White, Patrick Royston, and Angela M Wood. Multiple imputation using chained equations: issues and guidance for practice. Statistics in Medicine, 30(4):377–399, 2011

  39. [39]

    Stef Van Buuren. Flexible imputation of missing data, volume 10. CRC Press, Boca Raton, FL, 2012