Estimation Strategies for Causal Decomposition Analysis with Allowability Specifications

Aster Meche; John W. Jackson; Ting-Hsuan Chang; Trang Q. Nguyen

arxiv: 2602.07825 · v2 · submitted 2026-02-08 · 📊 stat.ME

Pith reviewed 2026-05-16 06:44 UTC · model grok-4.3

classification 📊 stat.ME

keywords causal decomposition · disparities · bridging estimators · multiply robust · potential outcomes · allowability · health disparities

The pith

Bridging estimators for causal decomposition let analysts quantify disparity reductions from allowable interventions without modeling densities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops practical estimation tools for causal decomposition analysis, which models how interventions on selected covariates could shrink group differences in an outcome under the potential outcomes framework. It introduces bridging estimators that connect distributions across allowability specifications while avoiding explicit density modeling altogether, alongside sequential weighted regression estimators that remain consistent if any of several nuisance models is correct. The methods come with diagnostics for the quality of the weighting functions they use, and their robustness to misspecification is established through formal proofs and a simulation study based on real data. The work also clarifies how this causal approach differs from traditional econometric decompositions while preserving interpretability when allowability choices are made explicit.

Core claim

Bridging estimators connect the covariate distributions under different allowability rules without modeling any density, while sequential weighted regression estimators achieve multiple robustness for the causal parameters that represent disparity reductions achievable by intervening on allowable factors.

What carries the argument

Bridging estimators that link allowable and full covariate distributions via reweighting or regression without density estimation.
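To make the "reweighting without density estimation" idea concrete, here is a minimal sketch of a standard identity, not the paper's bridging construction (which this summary does not spell out). A ratio of covariate distributions between two groups can be written as the odds of group membership given the covariate, rescaled by the marginal odds, so only a membership model is needed. The function name `odds_weights` and the toy data are hypothetical, and the membership probability is estimated by simple cell counts because the covariate is discrete.

```python
from collections import Counter


def odds_weights(covariate, group):
    """Weights that shift the G=1 covariate distribution onto the G=0 one.

    Only the membership probability P(G=1 | x) is estimated (here by cell
    counts, since x is discrete); no density of x is modeled.
    """
    n1 = sum(1 for g in group if g == 1)
    n0 = len(group) - n1
    count1 = Counter(x for x, g in zip(covariate, group) if g == 1)
    count0 = Counter(x for x, g in zip(covariate, group) if g == 0)
    weights = {}
    for x, c1 in count1.items():
        p_g1_given_x = c1 / (c1 + count0[x])
        odds_g0 = (1.0 - p_g1_given_x) / p_g1_given_x
        weights[x] = odds_g0 * (n1 / n0)  # rescale by the marginal odds
    return weights


# Hypothetical toy data: one binary covariate, two groups.
covariate = ["low", "low", "low", "high", "low", "high", "high", "high", "high", "low"]
group = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

w = odds_weights(covariate, group)

# Reweighted share of "low" within G=1, to compare with the G=0 share (2/6).
g1_values = [x for x, g in zip(covariate, group) if g == 1]
reweighted_low_share = sum(w[x] for x in g1_values if x == "low") / sum(w[x] for x in g1_values)
print(w, reweighted_low_share)
```

In this toy example the weights equal the direct ratio of the two empirical covariate distributions, and the reweighted G=1 sample reproduces the G=0 share of "low". With continuous or high-dimensional covariates the cell counts would be replaced by a fitted membership model, which is a modeling choice outside this sketch.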

If this is right

Analysts can estimate causal effects of interventions on allowable covariates even when outcome densities are difficult to model correctly.
Multiple robustness protects against misspecification of any single nuisance function used in the estimation.
Diagnostics allow direct checks on whether the weighting functions or density models are adequate for the observed data.
The methods support evaluation of multilevel or multimodal interventions by varying which factors are treated as allowable.
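The robustness point above can be illustrated with the simplest two-model case, sketched below under stated assumptions. This is a generic augmented (regression plus weighted residual) estimator of a standardized group mean, not the paper's sequential weighted regression estimators, which involve more nuisance models. The function `augmented_estimate`, the toy data, and the deliberately wrong models are all hypothetical.

```python
def augmented_estimate(group_sample, std_covariates, outcome_model, weight_fn):
    """Regression prediction averaged over the standard sample, plus a
    weighted average of residuals from the group sample (normalized weights)."""
    regression_part = sum(outcome_model(a) for a in std_covariates) / len(std_covariates)
    weighted_residuals = sum(weight_fn(a) * (y - outcome_model(a)) for a, y in group_sample)
    total_weight = sum(weight_fn(a) for a, _ in group_sample)
    return regression_part + weighted_residuals / total_weight


# Hypothetical toy data: (covariate, outcome) pairs for the group of interest,
# and covariate values for the standard population.
group_sample = [(0, 1.0), (0, 3.0), (0, 2.0), (1, 6.0)]
std_covariates = [0, 1, 1, 1]

correct_outcome = {0: 2.0, 1: 6.0}.get                  # cell means of Y in the group
correct_weight = {0: 0.25 / 0.75, 1: 0.75 / 0.25}.get   # standard share / group share


def wrong_outcome(a):
    return 0.0  # deliberately misspecified outcome model


def wrong_weight(a):
    return 1.0  # deliberately misspecified weights


# Group outcome means averaged over the standard covariate distribution.
target = 0.25 * 2.0 + 0.75 * 6.0

est_outcome_only = augmented_estimate(group_sample, std_covariates, correct_outcome, wrong_weight)
est_weight_only = augmented_estimate(group_sample, std_covariates, wrong_outcome, correct_weight)
est_neither = augmented_estimate(group_sample, std_covariates, wrong_outcome, wrong_weight)
print(target, est_outcome_only, est_weight_only, est_neither)
```

With either nuisance component correct, the estimate matches the target in this saturated toy example; with both wrong it does not. The exact agreement is an artifact of the tiny discrete setup. In general the guarantee concerns consistency in large samples, not equality in any one sample.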

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same bridging construction could be adapted to other settings that require distribution shifts under constrained covariate sets, such as fairness constraints in predictive modeling.
Because the estimators avoid density modeling, they may pair naturally with flexible machine-learning nuisance estimators while retaining the multiple-robustness guarantee.
Application to electronic health records suggests the approach can handle the high-dimensional covariate structures typical of real disparities data.

Load-bearing premise

The identifying assumptions of the potential outcomes framework hold and the allowability specifications correctly identify which covariates count as fair to use for defining the disparity and the intervention.

What would settle it

In a controlled simulation with a known true disparity-reduction value, the bridging or sequential estimators produce systematic bias even when all nuisance models are correctly specified.
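The shape of such a check can be sketched as follows, assuming a deliberately simple design that is not taken from the paper: one binary covariate, a known outcome model, and a plug-in standardization estimator standing in for the bridging or sequential estimators. The design constants, sample size, seed, and function names are all hypothetical.

```python
import random
import statistics

# Assumed design: in the group, P(A=1) = 0.3 and Y = 1 + 2A + noise;
# in the standard population, P(A=1) = 0.6.
TRUE_VALUE = 1.0 + 2.0 * 0.6


def simulate_once(rng, n=500):
    """Draw one data set and return the plug-in standardized mean."""
    group = []
    for _ in range(n):
        a = 1 if rng.random() < 0.3 else 0
        group.append((a, 1.0 + 2.0 * a + rng.gauss(0.0, 1.0)))
    std = [1 if rng.random() < 0.6 else 0 for _ in range(n)]
    # Correctly specified (saturated) outcome model: cell means of Y by A.
    means = {}
    for level in (0, 1):
        means[level] = statistics.fmean(y for a, y in group if a == level)
    # Average the fitted means over the standard population's covariates.
    return statistics.fmean(means[a] for a in std)


def monte_carlo_bias(replications=200, seed=2026):
    """Return (mean estimate minus truth, standard deviation of estimates)."""
    rng = random.Random(seed)
    estimates = [simulate_once(rng) for _ in range(replications)]
    return statistics.fmean(estimates) - TRUE_VALUE, statistics.stdev(estimates)


bias, spread = monte_carlo_bias()
print(f"bias={bias:.4f}, sd={spread:.4f}")
```

A bias that stays small relative to the Monte Carlo error, here roughly `spread / sqrt(replications)`, is what a correctly specified estimator should show. Systematic bias well beyond that range, persisting as the sample size grows, would be the kind of evidence the premise above describes.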

Original abstract

Causal decomposition analysis (CDA) is an approach for modeling the impact of hypothetical interventions to reduce disparities. It is useful for identifying foci that future interventions, including multilevel and multimodal interventions, could focus on to reduce disparities. Based within the potential outcomes framework, CDA has a causal interpretation when the identifying assumptions are met. CDA also allows an analyst to consider which covariates are allowable (i.e., fair) for defining the disparity in the outcome and in the point of intervention, so that its interpretation is also meaningful. While the incorporation of causal inference and allowability promotes robustness, transparency, and dialogue in disparities research, it can lead to challenges in estimation such as the need to correctly model densities. Also, how CDA differs from commonly used statistical decomposition estimators from the econometrics literature may not be clear, which may limit its uptake. To address these challenges, we provide a tour of estimation strategies for CDA, reviewing existing proposals and introducing novel estimators that overcome key estimation challenges. Among them we introduce what we call "bridging" estimators that avoid modeling any density, and sequential weighted regression estimators that are multiply robust. Additionally, we provide diagnostics to assess the quality of the nuisance density models and weighting functions they rely on. We formally establish the estimators' robustness to model mis-specification, demonstrate their performance through a simulation study based on real data, and apply them to study disparities in uncontrolled hypertension using electronic health records in a large healthcare system.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The bridging estimators and multiply robust sequential regressions are the real addition here for making CDA estimation more practical.

The paper's main move is introducing bridging estimators that skip density modeling entirely and sequential weighted regression estimators that achieve multiple robustness for causal decomposition analysis. These target the estimation headaches that come with allowability specifications in disparities work, and the abstract plus stress-test notes show they follow standard causal inference patterns for consistency under partial model correctness. It reviews existing proposals, adds diagnostics for the nuisance functions, runs a simulation grounded in real data, and applies the methods to hypertension disparities in EHR records. That package gives applied users concrete tools without forcing them into full density estimation. The formal robustness claims line up with the potential outcomes setup and do not appear to collapse into circularity or hidden positivity issues. The weakest part is that practical guidance on choosing allowability specs still rests on domain judgment rather than new diagnostics, which is a minor limitation rather than a flaw in the estimators themselves. This work is aimed at health and social science researchers who already use causal decomposition but need more stable estimation options. It is worth sending to peer review because the methods are grounded, the simulation provides some check, and the contribution addresses a documented estimation gap without overclaiming.

Referee Report

2 major / 3 minor

Summary. The manuscript develops estimation strategies for causal decomposition analysis (CDA) under allowability specifications within the potential outcomes framework. It reviews existing proposals, introduces bridging estimators that avoid any density modeling and sequential weighted regression estimators that are multiply robust, supplies diagnostics for nuisance density and weighting models, formally establishes robustness to misspecification, and illustrates performance via a simulation study based on real data together with an application to uncontrolled hypertension disparities in electronic health records.

Significance. If the formal robustness results hold, the work meaningfully advances CDA by removing the need for density estimation and providing multiply robust alternatives, thereby improving practicality and reliability for disparities research that incorporates fairness considerations via allowability. The combination of theoretical guarantees, diagnostics, and real-data illustration strengthens the contribution.

major comments (2)

[§3.2, Theorem 2] (multiple robustness): the statement that consistency holds when at least one of the sequential nuisance models is correct should be supported by an explicit derivation of the influence function under the allowability partition; without this step the claim that allowability does not alter the robustness conditions remains unverified.
[Simulation section, Table 2] The reported bias and coverage for the bridging estimator under partial allowability (row 3) are close to the fully allowable case, but the data-generating process does not vary the strength of the allowability violation; this limits the ability to confirm that the estimator remains stable when allowability is misspecified.

minor comments (3)

[§2.1] The definition of the allowable set A is introduced without a running numerical example; adding one early would clarify how the partition affects both the disparity and intervention functionals.
[Figure 3] The diagnostic plot for the weighting function lacks a reference line at the target value; this makes visual assessment of calibration harder.
[References] Several key papers on multiply robust estimation in causal inference (e.g., recent work on sequential g-estimation) are cited only in passing; a dedicated comparison paragraph would help readers situate the new estimators.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their positive evaluation and constructive comments, which have helped clarify key aspects of our work. We address each major comment point by point below and will incorporate revisions to strengthen the manuscript.

Referee: [§3.2, Theorem 2] (multiple robustness): the statement that consistency holds when at least one of the sequential nuisance models is correct should be supported by an explicit derivation of the influence function under the allowability partition; without this step the claim that allowability does not alter the robustness conditions remains unverified.

Authors: We agree that an explicit derivation of the influence function under the allowability partition is needed to fully verify that allowability specifications do not alter the multiple-robustness conditions. In the revised manuscript, we will expand the proof of Theorem 2 to include this derivation, demonstrating that the partition enters only through the interpretation of the target parameter while leaving the algebraic form of the estimating equations and the robustness properties unchanged.

revision: yes

Referee: [Simulation section, Table 2] The reported bias and coverage for the bridging estimator under partial allowability (row 3) are close to the fully allowable case, but the data-generating process does not vary the strength of the allowability violation; this limits the ability to confirm that the estimator remains stable when allowability is misspecified.

Authors: We acknowledge that the current data-generating process does not vary the strength of the allowability violation, which limits insight into estimator stability under misspecification. We will revise the simulation study to include additional scenarios that systematically vary the magnitude of the allowability violation (e.g., by modulating the disparity parameters between allowable and non-allowable components) and will update Table 2 or add a supplementary table with the corresponding bias and coverage results.

revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper introduces bridging estimators (density-free) and multiply robust sequential weighted regression estimators for causal decomposition analysis under allowability specifications. These follow standard patterns from the potential outcomes framework with external identification assumptions; the formal robustness results to model misspecification derive from multiply robust properties (consistency if at least one nuisance model is correct) without reducing by construction to fitted quantities defined in terms of the target disparity. No self-definitional loops, fitted-input predictions, or load-bearing self-citation chains appear in the abstract or described methods. The simulation study on real data and nuisance diagnostics supply independent validation, rendering the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on standard causal identification assumptions plus newly proposed estimation procedures; no free parameters are described in the abstract, and the only newly introduced construct is the class of bridging estimators.

axioms (1)

domain assumption: The identifying assumptions of the potential outcomes framework hold, which is required for a causal interpretation of CDA.
Invoked to give CDA a causal interpretation when met.

invented entities (1)

Bridging estimators (no independent evidence)
purpose: Avoid modeling any density in CDA estimation
Newly introduced class of estimators described in the abstract.

pith-pipeline@v0.9.0 · 5565 in / 1112 out tokens · 45225 ms · 2026-05-16T06:44:36.832609+00:00 · methodology
