Everything all at once: On choosing an estimand for multi-component environmental exposures
Pith reviewed 2026-05-18 14:30 UTC · model grok-4.3
The pith
An estimand quantifies how shifting a mix of environmental exposures affects outcomes like hypertension, using data-supported shifts and nonparametric machine learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose an approach to quantify a relationship between a shift in the exposure mixture and the outcome in either single-timepoint or longitudinal settings. The shift can be defined flexibly by shifting one or more components, including interactions between mixture components, and by shifting the same or different amounts across components. The estimand has a similar interpretation to a main-effect regression coefficient. We focus on choosing a shift supported by observed data to assess and minimize extrapolation, and we estimate the relationship completely nonparametrically using machine learning rather than parametric modeling.
What carries the argument
The mixture-shift estimand, which measures the expected change in outcome under a flexibly defined shift to one or more exposure components and is estimated by fitting the required conditional expectations nonparametrically.
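Here is a minimal single-timepoint sketch of this plug-in idea. It assumes an additive shift δ = (δ_1, ..., δ_p) applied to the exposure vector A, with confounders W. The steps are: fit the outcome regression E[Y | A, W] with a flexible learner, predict each observation's outcome at A + δ and at the observed A, and average the difference. The k-nearest-neighbour learner is a hypothetical stand-in for the machine-learning ensembles the method relies on. The function names are illustrative and are not the paper's code.

```python
import numpy as np


def knn_fit(X, y, k=25):
    """Fit a k-nearest-neighbour regression and return its predict function.

    Hypothetical stand-in for a flexible learner such as a Super Learner.
    Features are standardised with the training means and SDs.
    """
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd

    def predict(X_new):
        Z_new = (X_new - mu) / sd
        # Squared Euclidean distances, shape (n_new, n_train).
        d2 = ((Z_new[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d2, axis=1)[:, :k]
        return y[nearest].mean(axis=1)

    return predict


def mixture_shift_effect(A, W, Y, delta, k=25):
    """Plug-in estimate of E[Y^(A + delta)] - E[Y^A] at a single timepoint.

    A: (n, p) exposure mixture; W: (n, q) confounders; delta: length-p
    vector of per-component shifts (0 leaves a component unshifted).
    """
    X = np.hstack([A, W])
    predict = knn_fit(X, Y, k=k)
    X_shifted = np.hstack([A + np.asarray(delta, dtype=float), W])
    # Average predicted outcome under the shifted vs. the observed mixture.
    return predict(X_shifted).mean() - predict(X).mean()
```

For example, mixture_shift_effect(A, W, Y, delta=[0.5, 0.0]) shifts only the first component. A nonzero entry for every column shifts the whole mixture. k-NN was chosen only because it fits in a few lines of NumPy. Near the edges of the data it flattens predictions, which is one concrete way a shift into poorly supported regions biases the estimate toward zero.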
If this is right
- The estimand permits direct examination of interactions between specific mixture components.
- The same framework applies to both cross-sectional and longitudinal exposure data.
- Choosing shifts supported by the data reduces the need for extrapolation beyond observed values.
- Completely nonparametric estimation avoids reliance on parametric modeling assumptions that may be tenuous in nonrandomized settings.
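The interaction bullet can be made concrete in several ways. The definition below is one illustrative choice for exposition and is not taken from the paper. Let ψ(δ) be the mean outcome under a shift vector δ = (δ_1, ..., δ_p) minus the observed mean, and let e_j denote the j-th unit vector. An additive interaction between components 1 and 2, on the difference scale, then compares the joint shift with the sum of the two single-component shifts:

```latex
\begin{aligned}
\psi(\delta) &= \mathbb{E}\big[Y^{A+\delta}\big] - \mathbb{E}[Y], \\
\mathrm{INT}_{12}(\delta_1, \delta_2) &= \psi(\delta_1 e_1 + \delta_2 e_2) - \psi(\delta_1 e_1) - \psi(\delta_2 e_2).
\end{aligned}
```

As a check, assume identification so that ψ(δ) = E[E[Y | A + δ, W]] − E[Y], and suppose E[Y | A, W] = β_0 + β_1 A_1 + β_2 A_2 + β_12 A_1 A_2 + γᵀW. Then INT_12 reduces to β_12 δ_1 δ_2. The contrast is therefore zero exactly when the product term is absent.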
Where Pith is reading between the lines
- The same shift-selection and estimation strategy could be used to study mixtures in other observational domains such as nutrition or air pollution.
- Policy analyses could apply the estimand to compare the health impact of regulating different subsets of a mixture.
- The approach naturally lends itself to sensitivity checks that vary the magnitude or direction of the chosen shift while keeping the same nonparametric machinery.
Load-bearing premise
That a practically relevant shift in the exposure mixture can be chosen so that it is supported by observed data and that machine learning can estimate the required conditional expectations completely nonparametrically in complex nonrandomized longitudinal settings.
What would settle it
Applying the method to the CHAMACOS pesticide data and finding that the selected shift lies substantially outside the observed joint distribution of exposures, or that the machine learning estimators show poor cross-validated performance, would indicate the approach relies on unsupported extrapolation or unstable estimation.
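The extrapolation check described here can be sketched under some clearly hypothetical choices:

- Exposures and confounders are standardised.
- A shifted observation counts as supported if its nearest observed neighbour is no farther away than the 95th percentile of the observed points' own nearest-neighbour distances.
- Unsupported observations are left at their observed exposure, which is a simple form of modifying the shift.

The function names and the threshold rule are illustrative assumptions, not the diagnostic used in the paper.

```python
import numpy as np


def nearest_distance(query, reference, exclude_self=False):
    """Euclidean distance from each query row to its nearest reference row."""
    d2 = ((query[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
    if exclude_self:
        np.fill_diagonal(d2, np.inf)  # query is reference: skip self-matches
    return np.sqrt(d2.min(axis=1))


def supported(A, W, delta, quantile=0.95):
    """Flag observations whose shifted (A + delta, W) lands near observed data.

    Hypothetical heuristic: a shifted point is supported if its nearest
    observed neighbour is no farther than the given quantile of the observed
    points' own nearest-neighbour distances (standardised units).
    """
    X = np.hstack([A, W])
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd
    Z_shift = (np.hstack([A + np.asarray(delta, dtype=float), W]) - mu) / sd
    threshold = np.quantile(nearest_distance(Z, Z, exclude_self=True), quantile)
    return nearest_distance(Z_shift, Z) <= threshold


def modified_shift(A, W, delta, quantile=0.95):
    """Shift only the supported observations; leave the rest at observed A.

    Returns the modified exposures and the share of observations shifted.
    """
    ok = supported(A, W, delta, quantile)
    A_new = np.where(ok[:, None], A + np.asarray(delta, dtype=float), A)
    return A_new, ok.mean()
```

The returned share gives a crude summary of how much of the sample the shift can reach without extrapolation. Shrinking delta until that share is acceptably high is one way to trade shift size against data support.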
Original abstract
Many research questions, particularly those in environmental health, do not involve binary exposures. In environmental epidemiology, this includes multivariate exposure mixtures with nondiscrete components. Causal inference estimands and estimators to quantify the relationship between an exposure mixture and an outcome are relatively few. We propose an approach to quantify a relationship between a shift in the exposure mixture and the outcome, in either the single-timepoint or the longitudinal setting. The shift in the exposure mixture can be defined flexibly in terms of shifting one or more components, including examining interaction between mixture components, and in terms of shifting the same or different amounts across components. The estimand we discuss has an interpretation similar to that of a main-effect regression coefficient. First, we focus on choosing a shift in the exposure mixture supported by observed data. We demonstrate how to assess extrapolation and modify the shift to minimize reliance on extrapolation. Second, we propose estimating the relationship between the exposure mixture shift and outcome completely nonparametrically, using machine learning in model fitting. This contrasts with other current approaches, which employ parametric modeling for at least some relationships; we would like to avoid such modeling because parametric assumptions in complex, nonrandomized settings are tenuous at best. We are motivated by longitudinal data on pesticide exposures among participants in the CHAMACOS Maternal Cognition cohort. We examine the relationship between longitudinal exposure to agricultural pesticides and risk of hypertension. We provide step-by-step code to facilitate easy replication and adaptation of the approaches we use.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a flexible estimand for the effect of a shift in a multi-component environmental exposure mixture on an outcome, applicable in both single-timepoint and longitudinal settings. Shifts can involve one or more components, allow for interactions, and use equal or unequal magnitudes across components; the resulting contrast is interpreted similarly to a main-effect regression coefficient. The approach first selects a data-supported shift that minimizes extrapolation, then estimates the contrast completely nonparametrically via machine learning rather than parametric models. The method is motivated by and applied to longitudinal pesticide exposure data from the CHAMACOS Maternal Cognition cohort in relation to hypertension risk, with replication code provided.
Significance. If the nonparametric estimator can be shown to recover the target shift estimand consistently in high-dimensional longitudinal data with time-varying confounding, the work would supply a practical tool for environmental epidemiology that avoids strong parametric assumptions while retaining an interpretable, regression-like summary. The explicit focus on data-supported shifts and the provision of replication code are strengths that would increase the method's usability if the technical conditions for consistency are clarified.
major comments (2)
- The central claim that the shift contrast can be estimated completely nonparametrically using machine learning in longitudinal settings requires additional justification. In the presence of multi-component exposures and time-varying confounding, the plug-in estimator for the sequence of conditional expectations may not achieve consistency or n^{-1/2} rates without smoothness, sparsity, or other regularity conditions; the manuscript should supply either convergence rates, double-robustness arguments, or finite-sample diagnostics that address this for the CHAMACOS-style data.
- The procedure for choosing a data-supported shift to minimize extrapolation is load-bearing for the claim of reduced reliance on model extrapolation. The manuscript should demonstrate, perhaps via a simulation or sensitivity analysis in the application section, that the selected shift indeed keeps the required conditional expectations within regions of good data support and does not inadvertently reintroduce extrapolation bias when combined with the nonparametric estimator.
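The first major comment turns on the plug-in estimator for the sequence of conditional expectations. Below is a minimal sketch of the sequential-regression (iterated conditional expectation) form of that plug-in. It rests on several assumptions:

- There are two timepoints.
- Each timepoint has one scalar exposure (a single mixture component, for brevity).
- An additive shift δ is applied to the natural exposure value at both times.

The variable names (L1, A1, L2, A2, Y), the function names, and both learners are hypothetical illustrations, not the paper's code. This plain plug-in is exactly the estimator whose rate the referee questions. Doubly robust alternatives add a correction step that is not shown here; the R package lmtp, for example, offers TMLE and sequentially doubly robust estimators.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares learner (parametric; handy for checking the algebra)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta


def knn_fit(X, y, k=25):
    """k-nearest-neighbour learner: a simple nonparametric stand-in."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd

    def predict(X_new):
        Z_new = (X_new - mu) / sd
        d2 = ((Z_new[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
        return y[np.argsort(d2, axis=1)[:, :k]].mean(axis=1)

    return predict


def ice_mean(L1, A1, L2, A2, Y, delta, fit):
    """Estimate E[Y] when delta is added to A1 and to the natural value of A2.

    Works backwards in time: regress on the full history, predict with the
    exposure shifted, then use those predictions as the next outcome.
    """
    # Time 2: regress Y on (L1, A1, L2, A2); predict with A2 shifted.
    q2 = fit(np.column_stack([L1, A1, L2, A2]), Y)
    m2 = q2(np.column_stack([L1, A1, L2, A2 + delta]))
    # Time 1: regress the time-2 predictions on (L1, A1); predict with A1 shifted.
    q1 = fit(np.column_stack([L1, A1]), m2)
    m1 = q1(np.column_stack([L1, A1 + delta]))
    return m1.mean()


def ice_shift_effect(L1, A1, L2, A2, Y, delta, fit=knn_fit):
    """Contrast between the shifted and the unshifted (delta = 0) estimates."""
    shifted = ice_mean(L1, A1, L2, A2, Y, delta, fit)
    unshifted = ice_mean(L1, A1, L2, A2, Y, 0.0, fit)
    return shifted - unshifted
```

Passing fit=ols_fit on data simulated from a linear model lets one check the recursion against a known answer. The default k-NN learner illustrates the nonparametric case, where errors from the first regression feed into the second.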
minor comments (2)
- Notation for the longitudinal g-computation-style functional and the flexible shift operator should be introduced earlier and used consistently to improve readability for readers unfamiliar with mixture-shift estimands.
- The abstract states that the estimand has a similar interpretation to a main-effect regression coefficient; a brief explicit comparison (e.g., to a coefficient in a linear model for a single-component exposure) would help readers understand the precise sense in which this holds.
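The comparison requested in this minor comment can be made explicit under a working assumption that the paper does not necessarily make. Suppose the outcome regression is linear in the mixture components, with no interactions. Under the usual identification assumptions, the shift contrast ψ(δ) then collapses to a weighted sum of coefficients:

```latex
\mathbb{E}[Y \mid A, W] = \beta_0 + \sum_{j=1}^{p} \beta_j A_j + \gamma^\top W
\;\Longrightarrow\;
\psi(\delta) = \mathbb{E}\big[\mathbb{E}[Y \mid A + \delta, W] - \mathbb{E}[Y \mid A, W]\big] = \sum_{j=1}^{p} \beta_j \delta_j .
```

For a unit shift in component j alone (δ = e_j, the j-th unit vector), ψ equals β_j, the usual main-effect coefficient. Without the linearity assumption, ψ(δ) remains well defined. In that case it is the average of the local changes E[Y | A + δ, W] − E[Y | A, W] over the observed joint distribution of exposures and covariates.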
Simulated Author's Rebuttal
We are grateful to the referee for their constructive feedback on our manuscript. We have carefully considered the major comments and provide point-by-point responses below. We believe these revisions strengthen the paper's technical rigor and practical applicability.
-
Referee: The central claim that the shift contrast can be estimated completely nonparametrically using machine learning in longitudinal settings requires additional justification. In the presence of multi-component exposures and time-varying confounding, the plug-in estimator for the sequence of conditional expectations may not achieve consistency or n^{-1/2} rates without smoothness, sparsity, or other regularity conditions; the manuscript should supply either convergence rates, double-robustness arguments, or finite-sample diagnostics that address this for the CHAMACOS-style data.
Authors: We agree that the consistency of the nonparametric estimator merits further discussion, particularly in longitudinal settings with time-varying confounding. While the manuscript emphasizes the use of machine learning to avoid parametric assumptions, we acknowledge that additional regularity conditions are typically required for root-n consistency. In the revision, we have expanded the methods section to include a discussion of double-robustness properties when using cross-validated machine learning estimators, drawing on results from the targeted learning literature. We also provide finite-sample diagnostics in the CHAMACOS application, including checks for positivity and overlap in the estimated conditional expectations. revision: yes
-
Referee: The procedure for choosing a data-supported shift to minimize extrapolation is load-bearing for the claim of reduced reliance on model extrapolation. The manuscript should demonstrate, perhaps via a simulation or sensitivity analysis in the application section, that the selected shift indeed keeps the required conditional expectations within regions of good data support and does not inadvertently reintroduce extrapolation bias when combined with the nonparametric estimator.
Authors: We appreciate this point, as the data-supported shift selection is indeed central to our approach. To address this, we have added a sensitivity analysis in the revised application section. This analysis varies the shift magnitudes and components, compares the resulting data support metrics (such as the proportion of observations with positive density in the relevant regions), and demonstrates that the selected shift maintains good overlap without introducing substantial extrapolation. We also include a brief simulation study in the supplementary materials to illustrate the impact of shift selection on estimator performance. revision: yes
Circularity Check
No circularity: estimand and estimator defined directly from observable shifts and nonparametric targets
The paper introduces a new causal estimand for flexible shifts in multi-component exposures (single-time or longitudinal) and proposes to estimate it via machine-learning plug-ins for the required conditional expectations. No step reduces a claimed prediction or uniqueness result to a prior fit, self-citation, or ansatz imported from the authors' own work; the central objects are defined in terms of observable data-supported contrasts and standard nonparametric identification, without circular re-use of fitted quantities as 'predictions.' The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Standard causal inference assumptions (consistency, positivity, no unmeasured confounding) hold for the observational longitudinal data.
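Under these assumptions, the shifted mean is identified by the sequential-regression recursion below. The notation is assumed here for illustration, following the longitudinal modified-treatment-policy literature. d_t(a_t, h_t) is the shifted exposure at time t, given history h_t. H_{t+1} contains H_t and A_t, and τ is the final timepoint. For a single timepoint (τ = 1), the recursion reduces to E[μ(d(A, W), W)], where μ(a, w) = E[Y | A = a, W = w]. The shift-specific form of positivity requires that whenever (a_t, h_t) lies in the support of the data, so does (d_t(a_t, h_t), h_t). This is the condition a data-supported shift is chosen to satisfy.

```latex
\begin{aligned}
m_\tau(a_\tau, h_\tau) &= \mathbb{E}\big[Y \mid A_\tau = a_\tau, H_\tau = h_\tau\big], \\
m_t(a_t, h_t) &= \mathbb{E}\big[m_{t+1}\big(d_{t+1}(A_{t+1}, H_{t+1}), H_{t+1}\big) \mid A_t = a_t, H_t = h_t\big], \quad t = \tau - 1, \dots, 1, \\
\mathbb{E}\big[Y^{d}\big] &= \mathbb{E}\big[m_1\big(d_1(A_1, H_1), H_1\big)\big].
\end{aligned}
```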