
arxiv: 2509.18484 · v2 · submitted 2025-09-23 · 📊 stat.ML · cs.LG

Estimating Heterogeneous Causal Effect on Networks via Orthogonal Learning

Pith reviewed 2026-05-18 15:17 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords causal inference · heterogeneous effects · network interference · orthogonal learning · graph neural networks · spillover effects · attention model

The pith

A two-stage orthogonal learning method estimates heterogeneous direct and spillover causal effects on networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a procedure that first trains graph neural networks to capture how covariates and network links create confounding and dependence. It then residualizes those estimates and fits an attention-based model for interference in a second stage. Neyman orthogonalization combined with cross-fitting ensures that mistakes in the first stage affect the final causal estimates only at higher order. The result is edge-level spillover estimates plus node and population summaries, together with a bootstrap procedure for uncertainty quantification. A reader would care because the approach makes it feasible to recover varying treatment effects that spread unevenly across connected units without the usual first-stage bias dominating the answer.

Core claim

The central claim is that a two-stage procedure—graph neural networks for nuisance functions in stage one, followed by residualization and an attention-based interference model in stage two—delivers consistent estimates of heterogeneous direct and spillover effects on networks once Neyman orthogonal scores and cross-fitting are applied, so that first-stage estimation errors enter the second-stage expansion only at higher order.
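A minimal sketch of the residualize-then-regress structure behind this claim, stripped down to a scalar effect and i.i.d. data. Everything here is an assumption made for illustration: ordinary least squares stands in for the first-stage graph neural networks, a single residual-on-residual regression stands in for the attention-based second stage, and the function names (`ols_fit_predict`, `cross_fit_residuals`, `residual_on_residual`) are hypothetical.

```python
import numpy as np

def ols_fit_predict(X_train, y_train, X_test):
    """Least-squares stand-in for a first-stage nuisance learner."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def cross_fit_residuals(X, target, n_folds=2, seed=0):
    """Residualize `target` on X using predictions fitted on the other folds."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(X)) % n_folds  # same seed -> same folds for Y and T
    resid = np.empty(len(X))
    for k in range(n_folds):
        test = folds == k
        pred = ols_fit_predict(X[~test], target[~test], X[test])
        resid[test] = target[test] - pred
    return resid

def residual_on_residual(y_res, t_res):
    """Second stage: slope of outcome residuals on treatment residuals."""
    return float(t_res @ y_res / (t_res @ t_res))

# Toy data (assumed): treatment and outcome share the confounders X; true effect is 2.0.
rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 3))
T = X @ np.array([0.8, -0.5, 0.3]) + rng.normal(size=n)
Y = 2.0 * T + X @ np.array([1.0, 1.0, -1.0]) + rng.normal(size=n)

theta_hat = residual_on_residual(cross_fit_residuals(X, Y), cross_fit_residuals(X, T))
print(f"estimated effect: {theta_hat:.3f}")
```

The sketch uses random folds, which is only justified here because the toy observations are independent; on a network that choice is exactly what the referee report below questions.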

What carries the argument

The Neyman-orthogonal score inside a cross-fitted two-stage estimator, where graph neural networks model the nuisance functions that capture covariate and network dependence, and an attention-based interference model extracts the heterogeneous effects in the second stage.
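For orientation, the orthogonality condition can be written in the generic notation of the double/debiased machine learning literature [8]. The symbols below (W for data, θ for the target, η for the nuisance, and the partially linear example) are textbook assumptions, not the paper's own score, which is not reproduced on this page.

```latex
% Moment condition and Neyman orthogonality (generic notation, not from the paper).
\mathbb{E}\,\psi(W;\theta_0,\eta_0) = 0,
\qquad
\left.\frac{\partial}{\partial r}\,
\mathbb{E}\,\psi\bigl(W;\theta_0,\eta_0 + r(\eta-\eta_0)\bigr)\right|_{r=0} = 0 .

% Partially linear example: Y = \theta_0 D + g_0(X) + \varepsilon,
% E[\varepsilon \mid D, X] = 0, \ell_0(X) = E[Y \mid X], m_0(X) = E[D \mid X].
\psi(W;\theta,\ell,m) = \bigl(Y - \ell(X) - \theta\,(D - m(X))\bigr)\bigl(D - m(X)\bigr)

% With \delta_\ell = \ell - \ell_0 and \delta_m = m - m_0 treated as fixed functions
% (as when they are estimated on an independent fold):
\mathbb{E}\,\psi(W;\theta_0,\ell,m)
  = \mathbb{E}\bigl[\delta_m(X)\,\bigl(\delta_\ell(X) - \theta_0\,\delta_m(X)\bigr)\bigr]
```

The last line contains only products of nuisance errors, which is the sense in which first-stage mistakes enter at higher order. The step that treats the errors as fixed functions is where independence between folds is used.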

Load-bearing premise

The graph neural networks in the first stage must capture the dependence on covariates and network structure well enough that residualizing them removes all leading bias from the second-stage attention model.

What would settle it

Run the procedure on simulated networks where the first-stage graph neural networks are deliberately misspecified so they leave a non-negligible linear term in the residuals, then check whether the estimated heterogeneous spillover effects remain consistent with the known ground truth.
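A toy version of such a check, as a sketch under stated assumptions rather than the paper's experiment: the network is a ring, the effects are homogeneous (direct 1.0, spillover 0.5), linear regression replaces the graph neural networks, and the misspecified first stage leaves a quadratic rather than a linear term in the residuals. The helper names `cross_fit_residual`, `two_stage`, and `neighborhood` are hypothetical.

```python
import numpy as np

def cross_fit_residual(F, target, folds):
    """Out-of-fold residuals from a linear nuisance model on features F."""
    A = np.column_stack([np.ones(len(F)), F])
    resid = np.empty(len(target))
    for k in np.unique(folds):
        test = folds == k
        coef, *_ = np.linalg.lstsq(A[~test], target[~test], rcond=None)
        resid[test] = target[test] - A[test] @ coef
    return resid

def two_stage(F, Y, T, E, folds):
    """Residualize Y, T, E on F, then regress Y-residuals on (T, E)-residuals."""
    y_r, t_r, e_r = (cross_fit_residual(F, v, folds) for v in (Y, T, E))
    coef, *_ = np.linalg.lstsq(np.column_stack([t_r, e_r]), y_r, rcond=None)
    return coef  # (direct effect, spillover effect)

def neighborhood(v):
    """Own value plus the two ring neighbors' values as features."""
    return np.column_stack([v, np.roll(v, 1), np.roll(v, -1)])

rng = np.random.default_rng(0)
n = 6000
z = rng.normal(size=n)
# Smoothing makes neighboring covariates similar (a crude form of homophily).
X = (np.roll(z, 1) + z + np.roll(z, -1)) / np.sqrt(3)
T = X**2 + rng.normal(size=n)                      # treatment confounded by X^2
E = 0.5 * (np.roll(T, 1) + np.roll(T, -1))         # exposure: mean neighbor treatment
Y = 1.0 * T + 0.5 * E + 2.0 * X**2 + rng.normal(size=n)
folds = rng.permutation(n) % 2

good = two_stage(neighborhood(X**2), Y, T, E, folds)  # first stage can represent X^2
bad = two_stage(neighborhood(X), Y, T, E, folds)      # first stage misses X^2
print("well specified (direct, spillover):", np.round(good, 2))
print("misspecified   (direct, spillover):", np.round(bad, 2))
```

In this toy design the misspecified first stage leaves the quadratic confounder in every residual, so both estimates drift away from the truth. Orthogonality protects against small first-stage errors, not against a first stage that never captures the confounding.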

Figures

Figures reproduced from arXiv: 2509.18484 by Yuanchen Wu, Yubai Yuan.

Figure 1: Causal diagram for an ego unit i on a network where units j and k are two neighbors of i. The magnitude of these spillover effects varies among voters depending on their ideological alignment and socioeconomic status. Critically, the sign of the spillover effects also differs based on voters’ ideological positions, which is the well-documented phenomenon of political polarization and echo chambers [3]. Moreove…
Figure 2: Two-stage orthogonal learning framework for estimating direct and spillover effects under an additive
Figure 3: Spillover effects in political polarization
Figure 4: Edge-level interference estimation. (a): pairwise influence recovery. (b,c): influential neighbors
Original abstract

Estimating causal effects on networks is challenging because treatments may affect both treated units and their neighbors, while network homophily induces dependence and confounding. These challenges are amplified when causal effects are heterogeneous across units and edges. We propose a two-stage orthogonal learning framework for estimating heterogeneous direct and spillover effects on networks. The first stage uses graph neural networks to estimate nuisance components that capture complex dependence on covariates and network structure. The second stage residualizes these nuisance components and estimates causal effects through an interpretable attention-based interference model, yielding edge-level spillover estimates as well as node- and population-level summaries. Neyman orthogonalization and cross-fitting reduce sensitivity to first-stage estimation error, so nuisance errors enter only at higher order. We further develop a bootstrap-based uncertainty quantification procedure for the estimated spillover matrix, enabling pointwise and simultaneous inference for heterogeneous edge- and node-level effects. Experiments show that our method improves heterogeneous effect estimation while supporting interpretable downstream analyses such as influential-neighbor detection and spillover-sign recovery.
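The abstract does not say how the bootstrap is constructed, so the following is only a sketch of the distinction between pointwise and simultaneous inference for a matrix of estimates. It assumes bootstrap replicates are already available, and the demo produces them with a plain i.i.d. resampling of toy observations, which would not be appropriate for dependent network data. The function name `bootstrap_bands` is hypothetical, and the simultaneous band uses the standard max-statistic (sup-t) construction.

```python
import numpy as np

def bootstrap_bands(estimate, replicates, alpha=0.05):
    """Pointwise and simultaneous (sup-t) bands from bootstrap replicates.

    estimate:   array of shape (n, n)
    replicates: array of shape (B, n, n), one re-estimate per bootstrap draw
    """
    se = replicates.std(axis=0, ddof=1)
    t_abs = np.abs(replicates - estimate) / se            # studentized deviations
    crit_point = np.quantile(t_abs, 1 - alpha, axis=0)    # one critical value per entry
    max_t = t_abs.reshape(len(t_abs), -1).max(axis=1)     # worst entry in each draw
    crit_sim = np.quantile(max_t, 1 - alpha)              # one value for all entries
    pointwise = (estimate - crit_point * se, estimate + crit_point * se)
    simultaneous = (estimate - crit_sim * se, estimate + crit_sim * se)
    return pointwise, simultaneous

# Toy demo (assumed): each "observation" is a noisy copy of a 2x2 effect matrix.
rng = np.random.default_rng(0)
truth = np.array([[0.0, 0.5], [-0.5, 0.0]])
samples = truth + rng.normal(size=(500, 2, 2))
estimate = samples.mean(axis=0)
idx = rng.integers(0, len(samples), size=(1000, len(samples)))
replicates = samples[idx].mean(axis=1)                    # shape (1000, 2, 2)

pointwise, simultaneous = bootstrap_bands(estimate, replicates)
print("pointwise width:\n", np.round(pointwise[1] - pointwise[0], 3))
print("simultaneous width:\n", np.round(simultaneous[1] - simultaneous[0], 3))
```

The simultaneous band is never narrower than the pointwise one, because it has to cover all entries at once. That matters when scanning many edges for a sign or for influential neighbors.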

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes a two-stage orthogonal learning framework for estimating heterogeneous direct and spillover causal effects on networks. Graph neural networks estimate nuisance components in the first stage to capture covariate and network dependencies. The second stage residualizes these and fits an attention-based interference model to obtain edge-level spillover estimates along with node- and population-level summaries. Neyman orthogonality combined with cross-fitting is invoked to ensure first-stage estimation errors affect the target estimator only at higher order. A bootstrap procedure is developed for uncertainty quantification of the spillover matrix, and experiments are reported to show gains in heterogeneous effect estimation and support for downstream tasks such as influential-neighbor detection.

Significance. If the higher-order bias property holds under network dependence, the framework would offer a practical advance for causal inference with interference by delivering interpretable heterogeneous spillover estimates via attention weights. The bootstrap for pointwise and simultaneous inference on edge- and node-level effects is a concrete strength. The work adapts standard Neyman orthogonalization to GNN nuisance estimation and attention-based modeling, which could be useful when network homophily and complex dependence are present.

major comments (1)
  1. [Cross-fitting and Neyman orthogonality (methods / theoretical analysis)] The central claim that Neyman orthogonalization and cross-fitting reduce first-stage errors to higher order (stated in the abstract and elaborated in the two-stage framework) assumes that cross-fit folds produce nuisance estimates that are asymptotically independent of the second-stage observations. On networks, however, units remain dependent through edges and homophily; standard random or k-fold splits do not necessarily break this dependence when the network is connected or contains dense clusters. Consequently, the remainder term in the orthogonal expansion may retain a first-order component proportional to network-induced covariance between folds. This directly undermines the higher-order bias guarantee and requires either additional theoretical conditions (e.g., network mixing or sparsity assumptions) or a modified cross-fitting scheme that respects network structure.
minor comments (2)
  1. [Abstract] The abstract states that experiments demonstrate improvement, yet specific quantitative comparisons (e.g., MSE or coverage rates against baselines) are not summarized; adding one or two key metrics would strengthen the claim.
  2. [Model description] Notation for the attention weights and the spillover matrix should be introduced with a clear mapping to the estimands (direct vs. spillover) to improve readability for readers unfamiliar with the attention-based interference model.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their insightful comments, which help clarify the scope of our theoretical guarantees. We address the major comment point by point below.

Point-by-point responses
  1. Referee: [Cross-fitting and Neyman orthogonality (methods / theoretical analysis)] The central claim that Neyman orthogonalization and cross-fitting reduce first-stage errors to higher order (stated in the abstract and elaborated in the two-stage framework) assumes that cross-fit folds produce nuisance estimates that are asymptotically independent of the second-stage observations. On networks, however, units remain dependent through edges and homophily; standard random or k-fold splits do not necessarily break this dependence when the network is connected or contains dense clusters. Consequently, the remainder term in the orthogonal expansion may retain a first-order component proportional to network-induced covariance between folds. This directly undermines the higher-order bias guarantee and requires either additional theoretical conditions (e.g., network mixing or sparsity assumptions) or a modified cross-fitting scheme that respects network structure.

    Authors: We agree that the validity of the higher-order bias property under network dependence merits explicit discussion. Our analysis relies on the network satisfying standard weak-dependence conditions (bounded maximum degree and network mixing) that make the covariance between cross-fit folds vanish at a sufficient rate; these conditions are implicit in the GNN nuisance estimation step but were not stated as formal assumptions. We will revise the theoretical section to add these conditions explicitly and to note that the result may not hold for fully dense or non-mixing networks. We will also add a brief discussion of network-aware splitting (e.g., via graph partitioning) as a practical safeguard, together with a small simulation check. These changes strengthen the manuscript without altering the core method or empirical results.
    Revision: partial
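To make the idea of network-aware splitting concrete, here is a rough sketch that assigns folds as contiguous blocks of a breadth-first ordering and compares the number of edges crossing folds against a random split. It is not the authors' scheme: BFS blocks are a crude stand-in for a dedicated graph partitioner, the ring graph is an assumed toy example, and `bfs_block_folds` and `cut_edges` are hypothetical names.

```python
from collections import deque
import numpy as np

def bfs_block_folds(n_nodes, edges, n_folds):
    """Assign folds as contiguous blocks of a breadth-first node ordering."""
    adj = [[] for _ in range(n_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    order, seen = [], [False] * n_nodes
    for start in range(n_nodes):            # restart BFS in every component
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
    folds = np.empty(n_nodes, dtype=int)
    folds[order] = np.arange(n_nodes) * n_folds // n_nodes
    return folds

def cut_edges(folds, edges):
    """Number of edges whose endpoints fall in different folds."""
    return int(sum(folds[u] != folds[v] for u, v in edges))

# Toy graph (assumed): a ring of 100 nodes.
n = 100
edges = [(i, (i + 1) % n) for i in range(n)]

aware_folds = bfs_block_folds(n, edges, n_folds=2)
random_folds = np.random.default_rng(0).permutation(n) % 2

aware_cut = cut_edges(aware_folds, edges)
random_cut = cut_edges(random_folds, edges)
print("cross-fold edges, BFS blocks:", aware_cut)
print("cross-fold edges, random split:", random_cut)
```

Fewer cross-fold edges mean fewer direct dependencies between the data used to fit the nuisances and the data used in the second stage. Dependence through longer paths and homophily remains, which is why the weak-dependence conditions in the response are still needed.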

Circularity Check

0 steps flagged

No significant circularity; standard Neyman orthogonalization applied to distinct stages

full rationale

The paper's derivation chain applies established Neyman orthogonalization and cross-fitting to a two-stage procedure (GNN nuisance estimation followed by attention-based interference modeling). These techniques are invoked as external properties that ensure higher-order remainder terms, without the central result reducing to a self-definition, a fitted parameter renamed as prediction, or a load-bearing self-citation chain. The first- and second-stage models are explicitly separated, and network dependence is treated as an assumption rather than derived from the estimator itself. The framework remains self-contained against external benchmarks for orthogonal learning.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

Only the abstract is available, so the ledger is necessarily incomplete; the framework depends on modeling choices for nuisance estimation and interference that are not fully specified.

free parameters (2)
  • GNN architecture and hyperparameters
    Chosen to estimate nuisance components that capture covariate and network dependence; values are fitted during the first stage.
  • Attention weights in the interference model
    Learned in the second stage to produce edge-level spillover estimates.
axioms (2)
  • [domain assumption] Neyman orthogonality holds for the chosen first-stage estimators
    Invoked so that first-stage errors affect the target parameters only at higher order.
  • [ad hoc to paper] The attention-based interference model correctly represents the spillover mechanism
    Assumed when moving from residualized data to edge-level estimates.
invented entities (1)
  • attention-based interference model [no independent evidence]
    purpose: To produce interpretable edge-level spillover estimates and node-level summaries
    Introduced as the second-stage estimator; no independent evidence outside the paper is provided in the abstract.




Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 1 internal anchor

  [1] Peter M Aronow and Cyrus Samii. Estimating average causal effects under general interference, with application to a social network experiment. 2017
  [2] Peter M. Aronow and Cyrus Samii. Estimating average causal effects under general interference, with application to a social network experiment. The Annals of Applied Statistics, 11(4):1912–1947, 2017
  [3] Christopher A Bail, Laura P Argyle, Taylor W Brown, John P Bumpus, Haohan Chen, M Brooke Hunzaker, Jaemin Lee, Marcus Mann, Friedolin Merhout, and Alexander Volfovsky. Exposure to opposing views on social media can increase political polarization. Proceedings of the National Academy of Sciences, 115(37):9216–9221, 2018
  [4] Falco J Bargagli-Stoffi, Costanza Tortù, and Laura Forastiere. Heterogeneous treatment and spillover effects under clustered network interference. The Annals of Applied Statistics, 19(1):28–55, 2025
  [5] Béla Bollobás. Modern graph theory, volume 184. Springer Science & Business Media, 1998
  [6] Robert M Bond, Christopher J Fariss, Jason J Jones, Adam DI Kramer, Cameron Marlow, Jaime E Settle, and James H Fowler. A 61-million-person experiment in social influence and political mobilization. Nature, 489(7415):295–298, 2012
  [7] Weilin Chen, Ruichu Cai, Zeqin Yang, Jie Qiao, Yuguang Yan, Zijian Li, and Zhifeng Hao. Doubly robust causal effect estimation under networked interference via targeted learning. In Proceedings of the 41st International Conference on Machine Learning, pages 6457–6485. PMLR, 2024
  [8] Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters, 2018
  [9] Laura Forastiere, Edoardo M Airoldi, and Fabrizia Mealli. Identification and estimation of treatment and interference effects in observational studies on networks. Journal of the American Statistical Association, 116(534):901–918, 2021
  [10] Dylan J Foster and Vasilis Syrgkanis. Orthogonal statistical learning. The Annals of Statistics, 51(3):879–908, 2023
  [11] Vikas Garg, Stefanie Jegelka, and Tommi Jaakkola. Generalization and representational limits of graph neural networks. In International conference on machine learning, pages 3419–3430. PMLR, 2020
  [12] Paul Goldsmith-Pinkham and Guido W Imbens. Social networks and the identification of peer effects. Journal of Business & Economic Statistics, 31(3):253–264, 2013
  [13] Ruocheng Guo, Jundong Li, and Huan Liu. Learning individual causal effects from networked observational data. In Proceedings of the 13th International Conference on Web Search and Data Mining (WSDM), pages 232–240. ACM, 2020
  [14] Kevin Han and Johan Ugander. Model-based regression adjustment with model-free covariates for network interference. Journal of Causal Inference, 11(1):20230005, 2023
  [15] Qiang Huang, Jing Ma, Jundong Li, Ruocheng Guo, Huiyan Sun, and Yi Chang. Modeling interference for individual treatment effect estimation from networked observational data. ACM Transactions on Knowledge Discovery from Data, 18(3):1–21, 2023
  [16] Michael G Hudgens and M Elizabeth Halloran. Toward causal inference with interference. Journal of the American Statistical Association, 103(482):832–842, 2008
  [17] Song Jiang, Yaliang Li, Jing Gao, and Aidong Zhang. Estimating causal effects on networked observational data via representation learning. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pages 6457–6466. ACM, 2022
  [18] Fredrik D. Johansson, Uri Shalit, Nathan Kallus, and David Sontag. Generalization bounds and representation learning for estimation of potential outcomes and causal effects. Journal of Machine Learning Research, 23(166):1–48, 2022
  [19] George Karypis and Vipin Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359–392, 1998
  [20] Edward H Kennedy. Towards optimal doubly robust estimation of heterogeneous causal effects. Electronic Journal of Statistics, 17(2):3008–3049, 2023
  [21] Edward H Kennedy. Semiparametric doubly robust targeted double machine learning: a review. Handbook of Statistical Methods for Precision Medicine, pages 207–236, 2024
  [22] Edward H Kennedy, Zongming Ma, Matthew D McHugh, and Dylan S Small. Non-parametric methods for doubly robust estimation of continuous treatment effects. Journal of the Royal Statistical Society Series B: Statistical Methodology, 79(4):1229–1245, 2017
  [23] Seyedeh Baharan Khatami, Harsh Parikh, Haowei Chen, Sudeepa Roy, and Babak Salimi. Graph machine learning based doubly robust estimator for network causal effects. arXiv preprint arXiv:2403.11332, 2024
  [24] Sören R Künzel, Jasjeet S Sekhon, Peter J Bickel, and Bin Yu. Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the national academy of sciences, 116(10):4156–4165, 2019
  [25] Michael P Leung. Treatment and spillover effects under network interference. Review of Economics and Statistics, 102(2):368–380, 2020
  [26] Michael P Leung. Causal inference under approximate neighborhood interference. Econometrica, 90(1):267–293, 2022
  [27] Shuangning Li and Stefan Wager. Random graph asymptotics for treatment effect estimation under network interference. The Annals of Statistics, 50(4):2334–2358, 2022
  [28] Jing Ma, Mengting Wan, Longqi Yang, Jundong Li, Brent Hecht, and Jaime Teevan. Learning causal effects on hypergraphs. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1202–1212, 2022
  [29] Yunpu Ma and Volker Tresp. Causal inference under networked interference and intervention policy enhancement. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, pages 3700–3708. PMLR, 2021
  [30] Charles F Manski. Identification of endogenous social effects: The reflection problem. The review of economic studies, 60(3):531–542, 1993
  [31] Charles F Manski. Identification of treatment response with social interactions. The Econometrics Journal, 16(1):S1–S23, 2013
  [32] Xinkun Nie and Stefan Wager. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2):299–319, 2021
  [33] Elizabeth L Ogburn, Oleg Sofrygin, Ivan Diaz, and Mark J Van der Laan. Causal inference for social network data. Journal of the American Statistical Association, 119(545):597–611, 2024
  [34] Harsh Parikh, Carlos Varjao, Louise Xu, and Eric Tchetgen Tchetgen. Validating causal inference methods. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato, editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 17346–17358. PMLR...
  [35] Donald B Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of educational Psychology, 66(5):688, 1974
  [36] Vira Semenova and Victor Chernozhukov. Debiased machine learning of conditional average treatment effects and other causal functions. The Econometrics Journal, 24(2):264–289, 2021
  [37] Huayi Tang and Yong Liu. Towards understanding generalization of graph neural networks. In International Conference on Machine Learning, pages 33674–33719. PMLR, 2023
  [38] Panos Toulis and Edward Kao. Estimation of causal peer influence effects. In International conference on machine learning, pages 1489–1497. PMLR, 2013
  [39] Antonis Vasileiou, Stefanie Jegelka, Ron Levie, and Christopher Morris. Survey on generalization theory for graph neural networks. arXiv preprint arXiv:2503.15650, 2025
  [40] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017
  [41] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5):146, 2019
  [42] Anpeng Wu, Haiyi Qiu, Zhengming Chen, Zijian Li, Ruoxuan Xiong, Fei Wu, and Kun Zhang. Causal graph transformer for treatment effect estimation under unknown interference. In Proceedings of the 13th International Conference on Learning Representations (ICLR), 2025