arxiv: 2512.00175 · v3 · submitted 2025-11-28 · stat.ME · cs.LG · stat.ML

Comparing Two Proxy Methods for Causal Identification

Helen Guo, Elizabeth L. Ogburn, Ilya Shpitser

Pith reviewed 2026-05-17 03:18 UTC · model grok-4.3

classification: stat.ME · cs.LG · stat.ML

keywords: proxy variables · causal identification · unmeasured confounding · bridge equations · array decomposition · latent factors · counterfactual inference · causal inference methods

The pith

Proxy variable methods for causal identification split into bridge equation and array decomposition approaches with distinct model restrictions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper contrasts bridge equation methods, which recover causal targets by solving integral equations, against array decomposition methods, which identify counterfactual quantities by recovering latent factors through eigendecomposition. It examines the specific model restrictions each approach requires and explains what those restrictions imply for when each method can succeed. Readers would care because proxy methods offer a route to causal effects when direct measurement of confounders is impossible, yet the choice between approaches hinges on which set of assumptions fits the data-generating process at hand.

Core claim

Bridge equation methods leverage solutions to integral equations to recover causal targets, while array decomposition methods recover latent factors used to identify counterfactual quantities via eigendecomposition tasks. Comparing the model restrictions underlying these two approaches provides insight into the implications of the underlying assumptions and clarifies the scope of applicability for each method.
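For a discrete latent confounder the integral equation reduces to a linear system, which makes the bridge idea easy to check numerically. The sketch below is a minimal illustration in the spirit of the matrix form associated with Miao et al. (2018); the toy distributions, the function names `build_joint` and `bridge_estimate`, and the binary treatment and outcome are all assumptions made for this example, not material from the paper. It assumes W depends on U only and Z affects Y only through (U, A).

```python
import numpy as np

def build_joint(k=3, seed=0):
    """Return the observed law P(Z, A, W, Y) and the true P(Y=1 | do(a)) for a toy model."""
    rng = np.random.default_rng(seed)
    p_u = rng.dirichlet(np.ones(k))                # P(U=u)
    p_z_u = rng.dirichlet(np.ones(k), size=k)      # [u, z] = P(Z=z | U=u)
    p_w_u = rng.dirichlet(np.ones(k), size=k)      # [u, w] = P(W=w | U=u)
    p_a1_uz = rng.uniform(0.2, 0.8, size=(k, k))   # [u, z] = P(A=1 | U=u, Z=z)
    p_y1_ua = rng.uniform(0.1, 0.9, size=(k, 2))   # [u, a] = P(Y=1 | U=u, A=a)
    p_a_uz = np.stack([1 - p_a1_uz, p_a1_uz], axis=-1)   # [u, z, a]
    p_y_ua = np.stack([1 - p_y1_ua, p_y1_ua], axis=-1)   # [u, a, y]
    # Marginalise the latent U; only the resulting array is "observed".
    joint = np.einsum("u,uz,uza,uw,uay->zawy", p_u, p_z_u, p_a_uz, p_w_u, p_y_ua)
    truth = p_u @ p_y1_ua                          # [a] = sum_u P(u) P(Y=1 | u, a)
    return joint, truth

def bridge_estimate(joint, a):
    """Solve the discrete bridge equation at treatment level a, then average h over P(W)."""
    p_za = joint.sum(axis=(2, 3))[:, a]                     # [z] = P(Z=z, A=a)
    p_w_given_z = joint[:, a].sum(axis=2) / p_za[:, None]   # [z, w] = P(w | z, a)
    p_y1_given_z = joint[:, a, :, 1].sum(axis=1) / p_za     # [z] = P(Y=1 | z, a)
    # Bridge function h: sum_w h(w) P(w | z, a) = P(Y=1 | z, a) for every z.
    h = np.linalg.solve(p_w_given_z, p_y1_given_z)
    p_w = joint.sum(axis=(0, 1, 3))                         # [w] = P(W=w)
    return h @ p_w

joint, truth = build_joint()
estimates = np.array([bridge_estimate(joint, a) for a in (0, 1)])
print("bridge:", estimates, "truth:", truth)
```

The solve step needs the matrix of P(w | z, a) to be invertible, which is the discrete counterpart of the completeness-type restrictions the bridge approach relies on. Note that the latent distribution itself is never recovered here; only the bridge function h is.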

What carries the argument

Bridge equation methods that solve integral equations versus array decomposition methods that perform eigendecomposition on latent factors, each used to identify causal effects from proxy variables.
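The eigendecomposition route can be sketched in the same discrete setting. The example below assumes a latent variable L with three measurements Y, W, Z that are mutually independent given L, loosely following the latent-variable-with-proxies setup attributed to Hu and Schennach (2008) in Figure 4. The toy distributions, the function names, and the choice to label latent states by sorting eigenvalues are assumptions made for illustration.

```python
import numpy as np

def latent_model(k=3, seed=1):
    """Toy latent model; column l of each matrix is a conditional law given L=l."""
    rng = np.random.default_rng(seed)
    p_l = rng.dirichlet(np.ones(k))               # P(L=l)
    p_w_l = rng.dirichlet(np.ones(k), size=k).T   # [w, l] = P(W=w | L=l)
    p_z_l = rng.dirichlet(np.ones(k), size=k).T   # [z, l] = P(Z=z | L=l)
    p_y_l = np.linspace(0.2, 0.8, k)              # P(Y=1 | L=l), distinct by design
    return p_l, p_w_l, p_z_l, p_y_l

def recover_latent_factors(p_ywz, p_wz):
    """Recover P(Y=1 | L) and P(W | L) from observed arrays, up to labeling.

    p_ywz[w, z] = P(Y=1, W=w, Z=z) and p_wz[w, z] = P(W=w, Z=z).
    Under conditional independence, p_ywz @ inv(p_wz) equals
    P(W|L) diag(P(Y=1|L)) P(W|L)^{-1}, so its eigenvalues are P(Y=1 | l)
    and its eigenvectors are the columns of P(W | L) up to scale.
    """
    m = p_ywz @ np.linalg.inv(p_wz)
    eigvals, eigvecs = np.linalg.eig(m)
    eigvals, eigvecs = np.real(eigvals), np.real(eigvecs)
    order = np.argsort(eigvals)                   # assumed labeling: increasing P(Y=1 | l)
    vecs = eigvecs[:, order]
    return eigvals[order], vecs / vecs.sum(axis=0)  # rescale columns to sum to one

p_l, p_w_l, p_z_l, p_y_l = latent_model()
p_wz = p_w_l @ np.diag(p_l) @ p_z_l.T             # observed P(W, Z)
p_ywz = p_w_l @ np.diag(p_l * p_y_l) @ p_z_l.T    # observed P(Y=1, W, Z)
est_p_y_l, est_p_w_l = recover_latent_factors(p_ywz, p_wz)
print(est_p_y_l)
```

Unlike the bridge solve, this route needs the eigenvalues to be distinct and a rule for labeling the latent states; in exchange it returns the latent factors themselves.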

If this is right

Choice of method depends on whether the setting satisfies the integral equation conditions or the eigendecomposition conditions.
Each approach applies only within the range of its particular restrictions on the joint distribution of observed and latent variables.
Researchers gain guidance on which proxy-based identification strategy is feasible for a given unmeasured confounding problem.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The comparison may help identify settings where one method's assumptions are easier to justify than the other's.
Extensions could explore whether combining elements of both approaches relaxes restrictions in some cases.
The insights apply directly to any causal query that can be expressed through the same latent structure.

Load-bearing premise

The model restrictions and assumptions of the two methods are distinct enough to allow a meaningful comparison that clarifies applicability.

What would settle it

A concrete data-generating process in which one method identifies the target causal effect while the other fails under identical proxy observations would test whether their scopes truly differ.
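As a rough numerical probe of how the two sets of conditions can come apart, the sketch below computes one diagnostic per approach on a hand-picked discrete example with the treatment suppressed. It is not the counter-example the review asks for and establishes nothing about identification; the matrices, the tie in P(Y=1 | L), and both function names are assumptions chosen for illustration.

```python
import numpy as np

# Hand-picked toy inputs (assumptions for illustration only).
B = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])       # column l = P(W | L=l); reused for P(Z | L=l)
p_l = np.array([0.3, 0.4, 0.3])       # P(L)
p_y_l = np.array([0.3, 0.3, 0.7])     # P(Y=1 | L), with a deliberate tie

p_wz = B @ np.diag(p_l) @ B.T                 # [w, z] = P(W=w, Z=z)
p_ywz = B @ np.diag(p_l * p_y_l) @ B.T        # [w, z] = P(Y=1, W=w, Z=z)

def min_singular_value_of_bridge_system(p_wz):
    """Smallest singular value of P(W | Z); near zero means the linear solve is ill-posed."""
    p_w_given_z = p_wz / p_wz.sum(axis=0, keepdims=True)   # normalise each z column
    return np.linalg.svd(p_w_given_z, compute_uv=False).min()

def min_eigenvalue_gap(p_ywz, p_wz):
    """Smallest gap between eigenvalues of P(Y=1, W, Z) P(W, Z)^{-1}; zero means a tie."""
    eigvals = np.real(np.linalg.eigvals(p_ywz @ np.linalg.inv(p_wz)))
    return np.diff(np.sort(eigvals)).min()

sigma_min = min_singular_value_of_bridge_system(p_wz)
gap = min_eigenvalue_gap(p_ywz, p_wz)
print(f"bridge system sigma_min = {sigma_min:.3f}, eigenvalue gap = {gap:.2e}")
```

In this toy input the linear system stays well conditioned while two eigenvalues coincide, so the eigenvectors for the tied states are not unique. Whether this translates into a genuine difference in identifiability depends on the full assumption sets compared in the paper.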

Figures

Figures reproduced from arXiv: 2512.00175 by Elizabeth L. Ogburn, Helen Guo, Ilya Shpitser.

**Figure 1.** Observed confounding; Supports A, Y, C. (2) $E[Y(1)] - E[Y(0)] = \int \big( E[Y \mid A = 1, C = c] - E[Y \mid A = 0, C = c] \big) f_C(c)\, dc$. In general, Equation 1 holds if C satisfies the conditional ignorability assumption $Y(a) \perp\!\!\!\perp A \mid C$. This assumption is justified when, by the rules of d-separation, conditioning on C blocks all backdoor paths (which start with an arrowhead into A) from A to Y (Pearl, 2009; Richardson and R…

**Figure 2.** Miao et al. (2018); Supports A, Y, U, W, Z. ASSUMPTION 4. $W \perp\!\!\!\perp Z, A \mid U$. ASSUMPTION 5. $Z \perp\!\!\!\perp Y \mid U, A$. In addition to Assumptions 4 and 5, Miao, Geng and Tchetgen Tchetgen (2018) presume Assumptions 6 and 7, where $E[g(u) \mid z, a]$ is a vector indexed by values $z \in Z$. A detailed proof of identification is given in their paper. We will elaborate upon these additional two assumptions and their meanings in Sections 2.0…

**Figure 3.** Kuroki and Pearl (2014); Supports A, Y, U, W, Z.

**Figure 4.** Latent Variable with Proxies; Supports Y, W, Z, L. ASSUMPTION 15. For all $i \neq j$, the conditional distributions $f_{Y \mid L}(y \mid l_i)$ and $f_{Y \mid L}(y \mid l_j)$ differ with positive probability under the marginal distribution of Y. Hu and Schennach (2008) prove that the full law $f_{Y,W,Z,U}(y, w, z, l)$ is uniquely determined up to labeling of latent states [Assumptions 19 or 20 give the labelings]. For ease of comparison, we r…

**Figure 5.** Hidden confounding with observed mediation.

**Figure 6.** Analogue of Kuroki and Pearl (2014); Supports A, M, Y, U, W, Z.

Original abstract

Identifying causal effects in the presence of unmeasured variables is a fundamental challenge in causal inference, for which proxy variable methods have emerged as a powerful solution. We contrast two major approaches in this framework: (1) bridge equation methods, which leverage solutions to integral equations to recover causal targets, and (2) array decomposition methods, which recover latent factors used to identify counterfactual quantities via eigendecomposition tasks. We compare the model restrictions underlying these two approaches and provide insight into implications of the underlying assumptions, clarifying the scope of applicability for each method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper compares the model restrictions of bridge-equation and array-decomposition proxy methods for causal effects with unmeasured variables, but adds no new identification results or formal unification.

The main thing here is a side-by-side look at two existing proxy approaches for causal identification when there are hidden variables. Bridge equation methods solve integral equations to recover targets, while array decomposition methods pull out latent factors via eigendecompositions. The authors lay out the differing model restrictions and give qualitative comments on what those restrictions imply for when each method can be applied in practice.

Referee Report

1 major / 2 minor

Summary. The paper contrasts two proxy-based approaches to causal identification with unmeasured variables: (1) bridge equation methods that recover targets by solving integral equations and (2) array decomposition methods that recover latent factors via eigendecomposition to identify counterfactual quantities. It compares the model restrictions of each approach and discusses the implications of those restrictions for the scope of applicability of each method.

Significance. A clear, high-level comparison of assumption sets could help applied researchers choose between existing proxy techniques. The manuscript's modest scope—descriptive contrast without new identification theorems, unification, or simulation evidence—means its value lies mainly in organizing existing ideas rather than advancing the technical frontier. No machine-checked proofs or reproducible code are supplied.

major comments (1)

[§4] The central claim that the comparison 'clarifies the scope of applicability' rests on a qualitative discussion of model restrictions; without an explicit statement of the identification regions or a concrete counter-example showing non-overlapping applicability (e.g., a data-generating process identifiable by one method but not the other), the insight remains too general to be load-bearing for the stated contribution.

minor comments (2)

Notation for the latent factors and the integral operators is introduced without a consolidated table of symbols, making it difficult to track assumptions across the two methods.
[References] The manuscript would benefit from citing recent work on proxy variable identification (e.g., extensions of Miao et al. or related tensor decomposition results) to situate the comparison within the current literature.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their helpful comments on our manuscript comparing bridge equation and array decomposition methods for proxy-based causal identification. We address the major comment below and will incorporate revisions to strengthen the paper's contribution.

Referee: [§4] The central claim that the comparison 'clarifies the scope of applicability' rests on a qualitative discussion of model restrictions; without an explicit statement of the identification regions or a concrete counter-example showing non-overlapping applicability (e.g., a data-generating process identifiable by one method but not the other), the insight remains too general to be load-bearing for the stated contribution.

Authors: We agree that a concrete counter-example would make the differences in scope more tangible and strengthen the central claim. In the revised version, we will add to Section 4 an explicit discussion of the identification regions implied by each method's assumptions, followed by a simple data-generating process that satisfies the conditions for array decomposition but violates those for bridge equations (and vice versa). This will illustrate non-overlapping applicability without altering the manuscript's modest descriptive scope.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in descriptive comparison of proxy methods

The paper is a high-level contrast of two existing proxy-variable approaches (bridge equations vs. array decomposition) for causal identification. It compares model restrictions and discusses applicability implications without introducing new derivations, theorems, or predictions that reduce to fitted parameters, self-definitions, or self-citation chains. The central claim rests on qualitative description of prior methods rather than any load-bearing step that collapses to its own inputs by construction. This is the expected honest non-finding for a comparison paper whose scope is limited to clarifying distinctions already present in the literature.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no information on free parameters, axioms, or invented entities.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction

Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: "We contrast two major approaches... bridge equation methods... array decomposition methods... eigendecomposition tasks."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Identifying Causal Effects Using a Single Proxy Variable
stat.ML · 2026-04 · unverdicted · novelty 6.0

Causal effects are identifiable from a single proxy of the unobserved confounder under the SPICE completeness assumption, supported by a neural estimation framework.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · cited by 1 Pith paper

[1] Allman, E., Rhodes, J., Stanghellini, E. and Valtorta, M. (2015). Parameter Identifiability of Discrete Bayesian Networks with Hidden Variables. Journal of Causal Inference 3 189–205. doi:10.1515/jci-2014-0021

[2] Brown, L. D. (1986). Fundamentals of Statistical Exponential Families. Lecture Notes–Monograph Series. Institute of Mathematical Statistics, Hayward, CA.

[3] Canay, I. A., Santos, A. and Shaikh, A. M. (2013). On the Testability of Identification in Some Nonparametric Models With Endogeneity. Econometrica 81 2535–2559. doi:10.3982/ECTA10851

[4] Cui, Y., Pu, H., Shi, X., Miao, W. and Tchetgen Tchetgen, E. (2023). Semiparametric proximal causal inference. Journal of the American Statistical Association. doi:10.1080/01621459.2023.2191817

[5] Deaner, B. (2023). Controlling for Latent Confounding with Triple Proxies.

[6] Ghassami, A., Yang, A., Shpitser, I. and Tchetgen Tchetgen, E. (2024). Causal Inference with Hidden Mediators. Biometrika.

[7] Hu, Y. and Schennach, S. (2008). Instrumental Variable Treatment of Nonclassical Measurement Error Models. Econometrica 76 195–216. doi:10.1111/j.0012-9682.2008.00823.x

[8] Hu, Y. and Shiu, J.-L. (2022). A simple test of completeness in a class of nonparametric specification. Econometric Reviews 41 373–399. doi:10.1080/07474938.2021.1957285

[9] Kolda, T. G. and Hong, D. (2020). Stochastic Gradients for Large-Scale Tensor Decomposition. SIAM Journal on Mathematics of Data Science 2 1066–1095. doi:10.1137/19M1266265

[10] Kress, R. (1989). Linear Integral Equations. Springer, Berlin.

[11] Kruskal, J. (1977). Three-way arrays: rank and uniqueness of trilinear decompositions, with applications to arithmetic complexity and statistics. Linear Algebra and its Applications 18 95–138. doi:10.1016/0024-3795(77)90069-6

[12] Kuroki, M. and Pearl, J. (2014). Measurement bias and effect restoration in causal inference. Biometrika 101 423–437.

[13] Miao, W., Geng, Z. and Tchetgen Tchetgen, E. J. (2018). Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika 105 987–993.

[14] Newey, W. K. and Powell, J. L. (2003). Instrumental variable estimation of nonparametric models. Econometrica 71 1565–1578.

[15] Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan and Kaufmann, San Mateo.

[16] Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2 ed. Cambridge University Press.

[17] Rhodes, J. A. (2010). A concise proof of Kruskal's theorem on tensor decomposition. Linear Algebra and its Applications 432 1818–1824. doi:10.1016/j.laa.2009.11.033

[18] Richardson, T. S. and Robins, J. M. (2013). Single World Intervention Graphs (SWIGs): A Unification of the Counterfactual and Graphical Approaches to Causality. Preprint: http://www.csss.washington.edu/Papers/wp128.pdf

[19] Spirtes, P., Glymour, C. and Scheines, R. (2001). Causation, Prediction, and Search, 2 ed. Springer Verlag, New York.

[20] Stegeman, A. and Sidiropoulos, N. (2007). On Kruskal's uniqueness condition for the Candecomp/Parafac decomposition. Linear Algebra and its Applications 420 540–552. doi:10.1016/j.laa.2006.08.010

[21] Uschmajew, A. (2012). Local Convergence of the Alternating Least Squares Algorithm for Canonical Tensor Approximation. SIAM Journal on Matrix Analysis and Applications 33 639–652. doi:10.1137/110843587

[22] Zhou, Y. and Tchetgen Tchetgen, E. (2024). Causal Inference for a Hidden Treatment.