pith. sign in

arxiv: 2605.21928 · v1 · pith:AWF6J47Knew · submitted 2026-05-21 · 💻 cs.LG · cs.AI· stat.ME

CausalGuard: Conformal Inference under Graph Uncertainty

Pith reviewed 2026-05-22 07:48 UTC · model grok-4.3

classification 💻 cs.LG cs.AIstat.ME
keywords causal inferenceconformal predictiontreatment effect estimationgraph uncertaintydoubly robust methodsBayesian model averaging
0
0 comments X

The pith

CausalGuard aggregates pseudo-outcomes from candidate causal graphs to provide finite-sample coverage guarantees for treatment effect estimates despite graph uncertainty.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to estimate causal treatment effects from observational data when the causal graph is uncertain. It proposes multiple candidate graphs using an LLM edge prior, prunes them via conditional independence tests, and reweights by BIC. Doubly robust pseudo-outcomes are computed for each graph and aggregated with weights. A conformal procedure with a composite score then ensures marginal coverage in finite samples for the aggregated outcome. This matters because it allows reliable inference without assuming a single correct graph, and experiments show it meets coverage targets while keeping intervals narrower than alternatives.

Core claim

CausalGuard provides distribution-free finite-sample marginal coverage for this aggregated pseudo-outcome; under causal identification, overlap, conditional-mean nuisance stability, and concentration on target-aligned valid adjustment strategies, its conditional mean converges to the true Conditional Average Treatment Effect.

What carries the argument

A structure-weighted conformal framework that calibrates after aggregating graph-conditional doubly robust pseudo-outcomes using a composite nonconformity score on the posterior-weighted pseudo-outcome.

If this is right

  • Attains mean coverage above the nominal 90% level across five benchmarks for the target.
  • Reduces prediction interval width compared to graph-agnostic conformal methods that require large padding.
  • Suppresses invalid collider adjustment in stress tests.
  • Remains stable under misspecified priors provided the retained candidates are data-supported.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the approach scales, it could enable safer use of observational data in policy decisions where causal graphs are debated.
  • Combining this with expert knowledge to refine LLM-proposed graphs might further tighten the intervals.
  • Testing on time-series or dynamic graphs could reveal whether the method extends beyond static DAGs.

Load-bearing premise

The set of candidate DAGs must contain target-aligned valid adjustment strategies and be supported by the observed data after pruning and reweighting.

What would settle it

Observe whether coverage falls below 90% on a benchmark dataset where the true causal graph is known and the method's assumptions hold but graph uncertainty is high.

Figures

Figures reproduced from arXiv: 2605.21928 by Alexander Nemecek, Biyao Zhang, Chaoda Song, Debargha Ganguly, Erman Ayday, Jing Ma, Mohsen Hariri, Nengbo Wang, Shouren Wang, Sreehari Sankar, Van Yang, Vikash Singh, Vipin Chaudhary, Weicong Chen, Yanyan Zhang.

Figure 1
Figure 1. Figure 1: The CAUSALGUARD pipeline. Candidate DAGs are proposed from an LLM-derived edge prior, pruned by CI tests on Dtrain, scored by BIC plus a bounded structural prior, and collapsed to unique adjustment strategies before weight normalization. Graph-conditional doubly robust pseudo-outcomes are then aggregated into γ¯, and a composite split-conformal score calibrated on Dcal produces Cb(x). The distribution-free… view at source ↗
read the original abstract

Estimating treatment effects from observational data requires choosing an adjustment set, but valid adjustment depends on an unknown causal graph. Graph misspecification can cause under-coverage, while graph-agnostic conformal wrappers may regain nominal coverage only through large padding. We introduce CausalGuard, a structure-weighted conformal framework that calibrates after aggregating graph-conditional doubly robust pseudo-outcomes. Candidate DAGs are proposed from an LLM-derived edge prior, pruned by conditional-independence tests, and reweighted by Bayesian Information Criterion. A composite nonconformity score then calibrates the posterior-weighted pseudo-outcome. CausalGuard provides distribution-free finite-sample marginal coverage for this aggregated pseudo-outcome; under causal identification, overlap, conditional-mean nuisance stability, and concentration on target-aligned valid adjustment strategies, its conditional mean converges to the true Conditional Average Treatment Effect. Across five benchmarks, CausalGuard attains mean coverage above the nominal 90% level for the directly evaluable target and reduces width when graph-agnostic conformal baselines require large padding. Stress tests show that CausalGuard suppresses invalid collider adjustment and remains stable under misspecified priors when the retained candidate set is data-supported.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces CausalGuard, a conformal framework for treatment effect estimation under causal graph uncertainty. Candidate DAGs are proposed via an LLM-derived edge prior, pruned by conditional-independence tests, and reweighted by BIC. Graph-conditional doubly robust pseudo-outcomes are aggregated using posterior weights, and a composite nonconformity score is applied to the posterior-weighted pseudo-outcome. The paper claims distribution-free finite-sample marginal coverage for this aggregated quantity; under causal identification, overlap, nuisance stability, and concentration on target-aligned valid adjustment strategies, the conditional mean is asserted to converge to the true CATE. Experiments across five benchmarks report coverage above the nominal 90% level and narrower intervals than graph-agnostic baselines, with stress tests indicating suppression of invalid collider adjustment.

Significance. If the central claims hold, the work provides a principled way to obtain valid conformal intervals for causal effects when the adjustment set is uncertain, potentially reducing the conservatism of purely graph-agnostic wrappers. The distribution-free finite-sample coverage for the aggregated pseudo-outcome is a clear technical strength, and the empirical demonstration that the method remains stable under misspecified priors when the retained set is data-supported is useful. The combination of LLM-assisted discovery, BIC reweighting, and conformal calibration is novel and addresses a practical gap in observational causal inference.

major comments (2)
  1. [Abstract] Abstract and theoretical claims: the finite-sample marginal coverage guarantee applies to the aggregated pseudo-outcome via the composite nonconformity score, but the claim that the conditional mean converges to the CATE rests on the unverified assumption that BIC-reweighted posteriors concentrate on graphs containing target-aligned valid adjustment strategies. No formal argument is supplied showing that BIC down-weights invalid yet data-compatible candidates (e.g., those inducing collider bias after CI pruning); the cited stress tests are empirical only and do not bound posterior mass on such graphs.
  2. [Theoretical results] Theoretical development: the manuscript should explicitly derive or cite the conditions under which posterior weighting of doubly robust pseudo-outcomes preserves the convergence property when the support includes both valid and invalid adjustment sets; the listed assumptions (identification, overlap, nuisance stability, concentration) are necessary but the interaction with the composite score and BIC reweighting requires a precise statement to support the CATE claim.
minor comments (2)
  1. The abstract would benefit from a clearer separation between the distribution-free coverage result (which holds regardless of graph correctness) and the convergence result (which requires the concentration assumption).
  2. Notation for the posterior-weighted pseudo-outcome and the composite nonconformity score should be introduced with explicit definitions early in the methods section to aid readability.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments on our work. We address the major comments below and outline the revisions we plan to make to strengthen the theoretical presentation.

read point-by-point responses
  1. Referee: [Abstract] Abstract and theoretical claims: the finite-sample marginal coverage guarantee applies to the aggregated pseudo-outcome via the composite nonconformity score, but the claim that the conditional mean converges to the CATE rests on the unverified assumption that BIC-reweighted posteriors concentrate on graphs containing target-aligned valid adjustment strategies. No formal argument is supplied showing that BIC down-weights invalid yet data-compatible candidates (e.g., those inducing collider bias after CI pruning); the cited stress tests are empirical only and do not bound posterior mass on such graphs.

    Authors: The referee is correct that the CATE convergence relies on the concentration assumption stated in the paper. We do not supply a formal argument that BIC universally down-weights invalid graphs, since this depends on the data and may not hold in all cases. The stress tests are indeed empirical. We will revise the abstract to better distinguish the coverage guarantee from the convergence claim and add a discussion of the assumption's role and limitations. revision: partial

  2. Referee: [Theoretical results] Theoretical development: the manuscript should explicitly derive or cite the conditions under which posterior weighting of doubly robust pseudo-outcomes preserves the convergence property when the support includes both valid and invalid adjustment sets; the listed assumptions (identification, overlap, nuisance stability, concentration) are necessary but the interaction with the composite score and BIC reweighting requires a precise statement to support the CATE claim.

    Authors: We will revise the theoretical section to provide a more precise statement of the assumptions and their interaction with the posterior weighting and composite nonconformity score. The coverage guarantee is distribution-free for the aggregated quantity. Convergence to CATE requires the concentration assumption to ensure the weighted pseudo-outcome targets the correct quantity. We will cite relevant literature on causal model averaging to support this. revision: yes

standing simulated objections not resolved
  • A general formal proof or bound that BIC reweighting concentrates posterior mass exclusively on valid adjustment sets for arbitrary invalid but data-compatible graphs.

Circularity Check

0 steps flagged

No significant circularity; coverage and convergence claims are self-contained under explicit assumptions

full rationale

The paper defines an aggregated posterior-weighted pseudo-outcome and applies a composite nonconformity score to obtain distribution-free finite-sample marginal coverage for that quantity; this is a direct application of standard conformal prediction to a constructed score and does not reduce by construction to a fitted parameter or self-referential definition. Convergence of the conditional mean to the true CATE is stated only under a list of assumptions that explicitly includes 'concentration on target-aligned valid adjustment strategies,' which is not derived or proven inside the paper but treated as a prerequisite. No load-bearing step relies on self-citation chains, ansatz smuggling, or renaming of known results; the BIC reweighting and LLM prior are procedural choices whose validity is assumed rather than tautologically enforced by the coverage guarantee. The derivation therefore remains independent of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 4 axioms · 0 invented entities

The central claims rest on standard causal identification assumptions plus the practical effectiveness of the LLM prior and the data-supported nature of the retained graph set; no new entities are postulated.

axioms (4)
  • domain assumption causal identification
    Invoked for convergence of the conditional mean to the true CATE.
  • domain assumption overlap
    Standard positivity assumption required for identification.
  • domain assumption conditional-mean nuisance stability
    Required for the convergence statement in the abstract.
  • domain assumption concentration on target-aligned valid adjustment strategies
    Assumed so that the weighted set includes valid adjustment sets.

pith-pipeline@v0.9.0 · 5786 in / 1271 out tokens · 43772 ms · 2026-05-22T07:48:23.406552+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 5 internal anchors

  1. [1]

    Cambridge University Press, USA, 2nd edition, 2009

    Judea Pearl.Causality: Models, Reasoning and Inference. Cambridge University Press, USA, 2nd edition, 2009. ISBN 052189560X

  2. [2]

    The Econometrics Journal , volume =

    Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters.The Econometrics Journal, 21(1):C1–C68, 02 2018. ISSN 1368-4221. doi: 10.1111/ectj.12097. URLhttps://doi.org/10.1111/ectj.12097

  3. [3]

    Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

    Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests, 2017. URLhttps://arxiv.org/abs/1510.04342

  4. [4]

    MIT Press, 2nd edition, 2000

    Peter Spirtes, Clark Glymour, and Richard Scheines.Causation, Prediction, and Search. MIT Press, 2nd edition, 2000

  5. [5]

    Lihua Lei and Emmanuel J. Candès. Conformal inference of counterfactuals and individual treatment effects.Journal of the Royal Statistical Society: Series B, 83(5):911–938, 2021

  6. [6]

    Yaniv Romano, Evan Patterson, and Emmanuel J. Candès. Conformalized quantile regression. InAdvances in Neural Information Processing Systems (NeurIPS), 2019

  7. [7]

    Estimating the dimension of a model.The Annals of Statistics, 6(2):461–464, 1978

    Gideon Schwarz. Estimating the dimension of a model.The Annals of Statistics, 6(2):461–464, 1978

  8. [8]

    Springer, 2005

    Vladimir V ovk, Alexander Gammerman, and Glenn Shafer.Algorithmic Learning in a Random World. Springer, 2005

  9. [9]

    Distribution-free predictive inference for regression.Journal of the American Statistical Associ- ation, 113(523):1094–1111, 2018

    Jing Lei, Max G’Sell, Alessandro Rinaldo, Ryan J Tibshirani, and Larry Wasserman. Distribution-free predictive inference for regression.Journal of the American Statistical Associ- ation, 113(523):1094–1111, 2018

  10. [10]

    Alaa, Zaid Ahmad, and Mark van der Laan

    Ahmed M. Alaa, Zaid Ahmad, and Mark van der Laan. Conformal meta-learners for predictive inference of individual treatment effects. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

  11. [11]

    Tibshirani, Rina Foygel Barber, Emmanuel J

    Ryan J. Tibshirani, Rina Foygel Barber, Emmanuel J. Candès, and Aaditya Ramdas. Conformal prediction under covariate shift. InAdvances in Neural Information Processing Systems (NeurIPS), 2019

  12. [12]

    Ying Jin, Zhimei Ren, and Emmanuel J. Candès. Sensitivity analysis of individual treatment effects: A robust conformal inference approach.Proceedings of the National Academy of Sciences, 120(6):e2214889120, 2023

  13. [13]

    Data-driven covariate selection for nonparametric estimation of causal effects

    Doris Entner, Patrik Hoyer, and Peter Spirtes. Data-driven covariate selection for nonparametric estimation of causal effects. InAISTATS, 2013

  14. [14]

    Maathuis

    Leonard Henckel, Emilija Perkovi´c, and Marloes H. Maathuis. Graphical criteria for efficient total effect estimation via adjustment in causal linear models.Journal of the Royal Statistical Society: Series B, 84(2):579–599, 2022

  15. [15]

    Optimal structure identification with greedy search.Journal of Machine Learning Research, 3:507–554, 2002

    David Maxwell Chickering. Optimal structure identification with greedy search.Journal of Machine Learning Research, 3:507–554, 2002

  16. [16]

    Xun Zheng, Bryon Aragam, Pradeep Ravikumar, and Eric P. Xing. DAGs with NO TEARS: Continuous optimization for structure learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2018

  17. [17]

    Causal reasoning and large language models: Opening a new frontier for causality

    Emre Kiciman, Robert Ness, Amit Sharma, and Chenhao Tan. Causal reasoning and large language models: Opening a new frontier for causality.arXiv preprint arXiv:2305.00050, 2023

  18. [18]

    Causal discovery with language models as imperfect experts

    Stephanie Long, Alexandre Piché, Valentina Zantedeschi, Tibor Schuster, and Alexandre Drouin. Causal discovery with language models as imperfect experts. InICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling, 2023. URLhttps://openreview. net/forum?id=RXlvYZAE49. 10

  19. [19]

    From query tools to causal architects: Harnessing large language models for advanced causal discovery from data.arXiv preprint arXiv:2306.16902, 2023

    Taiyu Ban, Lyvzhou Chen, Xiangyu Wang, and Huanhuan Chen. From query tools to causal architects: Harnessing large language models for advanced causal discovery from data.arXiv preprint arXiv:2306.16902, 2023

  20. [20]

    Efficient Causal Graph Discovery Using Large Language Models

    Thomas Jiralerspong, Xiaoyin Chen, Yash More, Vedant Shah, and Yoshua Bengio. Efficient causal graph discovery using large language models.arXiv preprint arXiv:2402.01207, 2024

  21. [21]

    Prompting strategies for enabling large language models to infer causation from correlation.arXiv preprint arXiv:2412.13952, 2024

    Eleni Sgouritsa, Virginia Aglietti, Yee Whye Teh, Arnaud Doucet, Arthur Gretton, and Silvia Chiappa. Prompting strategies for enabling large language models to infer causation from correlation.arXiv preprint arXiv:2412.13952, 2024

  22. [22]

    Sankar, B

    Debargha Ganguly, Vikash Singh, S. Sankar, B. Zhang, X. Zhang, S. Iyengar, X. Han, et al. Grammars of formal uncertainty: When to trust LLMs in automated reasoning tasks.arXiv preprint arXiv:2505.20047, 2025. NeurIPS 2025

  23. [23]

    VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning

    Vikash Singh, D. Cassel, N. Weir, N. Feng, and S. Bayless. VERGE: Formal refinement and guidance engine for verifiable LLM reasoning.arXiv preprint arXiv:2601.20055, 2026

  24. [24]

    David Madigan and Adrian E. Raftery. Model selection and accounting for model uncertainty in graphical models using Occam’s window.Journal of the American Statistical Association, 89(428):1535–1546, 1994

  25. [25]

    Hoeting, David Madigan, Adrian E

    Jennifer A. Hoeting, David Madigan, Adrian E. Raftery, and Chris T. V olinsky. Bayesian model averaging: A tutorial.Statistical Science, 14(4):382–401, 1999

  26. [26]

    Bayesian nonparametric modeling for causal inference.Journal of Computational and Graphical Statistics, 20:217–240, 03 2011

    Jennifer Hill. Bayesian nonparametric modeling for causal inference.Journal of Computational and Graphical Statistics, 20:217–240, 03 2011. doi: 10.1198/jcgs.2010.08162

  27. [27]

    Maathuis, Markus Kalisch, and Peter Bühlmann

    Marloes H. Maathuis, Markus Kalisch, and Peter Bühlmann. Estimating high-dimensional intervention effects from observational data.The Annals of Statistics, 37(6A):3133–3164, 2009

  28. [28]

    On the application of probability theory to agricultural experiments

    Jerzy Neyman. On the application of probability theory to agricultural experiments. essay on principles. section 9.Statistical Science, 5(4):465–472, 1990

  29. [29]

    Donald B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies.Journal of Educational Psychology, 66(5):688, 1974

  30. [30]

    Randomization analysis of experimental data: The fisher randomization test comment.Journal of the American Statistical Association, 75(371):591–593, 1980

    Donald B Rubin. Randomization analysis of experimental data: The fisher randomization test comment.Journal of the American Statistical Association, 75(371):591–593, 1980

  31. [31]

    Being bayesian about network structure

    Nir Friedman and Daphne Koller. Being bayesian about network structure. a bayesian approach to structure discovery in bayesian networks.Machine learning, 50:95–125, 2003

  32. [32]

    probable error

    Ronald A Fisher. On the" probable error" of a coefficient of correlation deduced from a small sample.Metron, 1:3–32, 1921

  33. [33]

    Estimation of regression coefficients when some regressors are not always observed.Journal of the American statistical Association, 89(427):846–866, 1994

    James M Robins, Andrea Rotnitzky, and Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed.Journal of the American statistical Association, 89(427):846–866, 1994

  34. [34]

    Lauffenburger, and Garry P

    Karen Sachs, Omar Perez, Dana Pe’er, Douglas A. Lauffenburger, and Garry P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data.Science, 308(5721): 523–529, 2005. doi: 10.1126/science.1105809. URL https://www.science.org/doi/ abs/10.1126/science.1105809

  35. [35]

    Atlantic Causal Inference Conference (ACIC) Data Analysis Challenge 2017

    P. Richard Hahn, Vincent Dorie, and Jared S. Murray. Atlantic causal inference conference (acic) data analysis challenge 2017, 2019. URLhttps://arxiv.org/abs/1905.09515

  36. [36]

    Robert J. LaLonde. Evaluating the econometric evaluations of training programs with experi- mental data.The American Economic Review, 76(4):604–620, 1986. ISSN 00028282. URL http://www.jstor.org/stable/1806062

  37. [37]

    Causal effect inference with deep latent-variable models.Advances in neural information processing systems, 30, 2017

    Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. Causal effect inference with deep latent-variable models.Advances in neural information processing systems, 30, 2017. 11

  38. [38]

    Dominique M. A. Haughton. On the choice of a model to fit data from an exponential family. The Annals of Statistics, 16(1):342–355, 1988. ISSN 00905364, 21688966. URL http: //www.jstor.org/stable/2241441

  39. [39]

    Edward H. Kennedy. Towards optimal doubly robust estimation of heterogeneous causal effects,

  40. [40]

    URLhttps://arxiv.org/abs/2004.14497

  41. [41]

    Vikash Singh, Debargha Ganguly, H. Yu, C. Zhou, P. Singh, B. Lee, Vipin Chaudhary, and G. Datta. Toward guarantees for clinical reasoning in vision language models via formal verification.arXiv preprint arXiv:2602.24111, 2026

  42. [42]

    Forte: Finding outliers with representation typicality estimation

    Debargha Ganguly, Warren Morningstar, Andrew Yu, and Vipin Chaudhary. Forte: Finding outliers with representation typicality estimation. In13th International Conference on Learning Representations (ICLR), Singapore, April 2025

  43. [43]

    Sankar, B

    Debargha Ganguly, S. Sankar, B. Zhang, Vikash Singh, K. Gupta, H. Kavuru, A. Luo, et al. Trust the typical: An out-of-distribution safety detection framework.arXiv preprint arXiv:2602.04581,

  44. [44]

    $K^4$: Online Log Anomaly Detection Via Unsupervised Typicality Learning

    Weicong Chen, Vikash Singh, Z. Rahmani, Debargha Ganguly, M. Hariri, and Vipin Chaud- hary. K4: Online log anomaly detection via unsupervised typicality learning.arXiv preprint arXiv:2507.20051, 2025

  45. [45]

    N. Wang, T. Liang, Vikash Singh, C. Song, V . Yang, Y . Yin, Jing Ma, J. Singh, et al. HugRAG: Hierarchical causal knowledge graph design for RAG.arXiv preprint arXiv:2602.05143, 2026

  46. [46]

    Yang, Debargha Ganguly, X

    W. Yang, Debargha Ganguly, X. Li, C. Song, S. Wang, Vikash Singh, Vipin Chaudhary, and X. Han. Mid-Think: Training-free intermediate-budget reasoning via token-level triggers.arXiv preprint arXiv:2601.07036, 2026

  47. [47]

    Labeling copilot: A deep research agent for automated data curation in computer vision,

    Debargha Ganguly, Sumit Kumar, Ishwar Balappanawar, Weicong Chen, Shashank Kambhatla, Srinivasan Iyengar, Shivkumar Kalyanaraman, Ponnurangam Kumaraguru, and Vipin Chaud- hary. Labeling copilot: A deep research agent for automated data curation in computer vision,

  48. [48]

    URLhttps://arxiv.org/abs/2509.22631

  49. [49]

    Zhang, M

    B. Zhang, M. Zheng, Debargha Ganguly, X. Zhang, Vikash Singh, Vipin Chaudhary, and Z. Zhang. Efficient fine-grained GPU performance modeling for distributed deep learning of LLM.arXiv preprint arXiv:2509.22832, 2025. A Assumptions We collect the regularity conditions required for the theoretical guarantees. Assumption 1 is the only condition needed for th...

  50. [50]

    BIC scores and BMA weights(Equations 3–4): Gaussian likelihoods evaluated on Dtrain, followed by duplicate adjustment-strategy collapse before normalization

  51. [51]

    LLM is pattern-matching variable names

    Nuisance models: propensity score ˆe(k), outcome models ˆµ(k) t , and heuristic bounds [ˆq(k) low,ˆq(k) high]all fitted onD train. Stage 4 (conformal calibration) uses Dcal only for evaluating out-of-sample pseudo-outcomes, computing nonconformity scores, and determining ˆQ. This separation ensures that, conditioned on Dtrain, the composite scores are sym...

  52. [52]

    Initialise a NumPyRandomStatewith the run seed

  53. [53]

    Draw a uniform random permutationπof all variables (covariates,T,Y)

  54. [54]

    This guarantees T precedes Y in topological order so that the protected edgeT→Yis order-consistent

    If π(T)> π(Y) , swap the positions of T and Y in π. This guarantees T precedes Y in topological order so that the protected edgeT→Yis order-consistent

  55. [55]

    For each ordered pair (u, v) with π(u)< π(v) , include the edge u→v with probability pu→v via an independent Bernoulli draw

  56. [56]

    Force-add the edgeT→Yto the resulting graph

  57. [57]

    Return ONLY valid JSON

    Reject duplicates by hashing frozenset(G.edges()); up to 100K resamples are at- tempted before stopping. The firstKunique non-empty DAGs are kept. This procedure cannot produce cycles because every sampled edge respects π, and T→Y is consistent with the swap in step 3. CI-pruning rule (Fisher partial correlation).Given a sampled DAG G and Xtr (the train-o...