CausalGuard: Conformal Inference under Graph Uncertainty

Alexander Nemecek; Biyao Zhang; Chaoda Song; Debargha Ganguly; Erman Ayday; Jing Ma; Mohsen Hariri; Nengbo Wang; Shouren Wang; Sreehari Sankar

arxiv: 2605.21928 · v1 · pith:AWF6J47Knew · submitted 2026-05-21 · 💻 cs.LG · cs.AI· stat.ME

CausalGuard: Conformal Inference under Graph Uncertainty

Vikash Singh , Weicong Chen , Debargha Ganguly , Yanyan Zhang , Nengbo Wang , Sreehari Sankar , Mohsen Hariri , Alexander Nemecek

show 7 more authors

Chaoda Song Shouren Wang Biyao Zhang Van Yang Erman Ayday Jing Ma Vipin Chaudhary

This is my paper

Pith reviewed 2026-05-22 07:48 UTC · model grok-4.3

classification 💻 cs.LG cs.AIstat.ME

keywords causal inferenceconformal predictiontreatment effect estimationgraph uncertaintydoubly robust methodsBayesian model averaging

0 comments

The pith

CausalGuard aggregates pseudo-outcomes from candidate causal graphs to provide finite-sample coverage guarantees for treatment effect estimates despite graph uncertainty.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to estimate causal treatment effects from observational data when the causal graph is uncertain. It proposes multiple candidate graphs using an LLM edge prior, prunes them via conditional independence tests, and reweights by BIC. Doubly robust pseudo-outcomes are computed for each graph and aggregated with weights. A conformal procedure with a composite score then ensures marginal coverage in finite samples for the aggregated outcome. This matters because it allows reliable inference without assuming a single correct graph, and experiments show it meets coverage targets while keeping intervals narrower than alternatives.

Core claim

CausalGuard provides distribution-free finite-sample marginal coverage for this aggregated pseudo-outcome; under causal identification, overlap, conditional-mean nuisance stability, and concentration on target-aligned valid adjustment strategies, its conditional mean converges to the true Conditional Average Treatment Effect.

What carries the argument

A structure-weighted conformal framework that calibrates after aggregating graph-conditional doubly robust pseudo-outcomes using a composite nonconformity score on the posterior-weighted pseudo-outcome.

If this is right

Attains mean coverage above the nominal 90% level across five benchmarks for the target.
Reduces prediction interval width compared to graph-agnostic conformal methods that require large padding.
Suppresses invalid collider adjustment in stress tests.
Remains stable under misspecified priors provided the retained candidates are data-supported.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the approach scales, it could enable safer use of observational data in policy decisions where causal graphs are debated.
Combining this with expert knowledge to refine LLM-proposed graphs might further tighten the intervals.
Testing on time-series or dynamic graphs could reveal whether the method extends beyond static DAGs.

Load-bearing premise

The set of candidate DAGs must contain target-aligned valid adjustment strategies and be supported by the observed data after pruning and reweighting.

What would settle it

Observe whether coverage falls below 90% on a benchmark dataset where the true causal graph is known and the method's assumptions hold but graph uncertainty is high.

Figures

Figures reproduced from arXiv: 2605.21928 by Alexander Nemecek, Biyao Zhang, Chaoda Song, Debargha Ganguly, Erman Ayday, Jing Ma, Mohsen Hariri, Nengbo Wang, Shouren Wang, Sreehari Sankar, Van Yang, Vikash Singh, Vipin Chaudhary, Weicong Chen, Yanyan Zhang.

**Figure 1.** Figure 1: The CAUSALGUARD pipeline. Candidate DAGs are proposed from an LLM-derived edge prior, pruned by CI tests on Dtrain, scored by BIC plus a bounded structural prior, and collapsed to unique adjustment strategies before weight normalization. Graph-conditional doubly robust pseudo-outcomes are then aggregated into γ¯, and a composite split-conformal score calibrated on Dcal produces Cb(x). The distribution-free… view at source ↗

read the original abstract

Estimating treatment effects from observational data requires choosing an adjustment set, but valid adjustment depends on an unknown causal graph. Graph misspecification can cause under-coverage, while graph-agnostic conformal wrappers may regain nominal coverage only through large padding. We introduce CausalGuard, a structure-weighted conformal framework that calibrates after aggregating graph-conditional doubly robust pseudo-outcomes. Candidate DAGs are proposed from an LLM-derived edge prior, pruned by conditional-independence tests, and reweighted by Bayesian Information Criterion. A composite nonconformity score then calibrates the posterior-weighted pseudo-outcome. CausalGuard provides distribution-free finite-sample marginal coverage for this aggregated pseudo-outcome; under causal identification, overlap, conditional-mean nuisance stability, and concentration on target-aligned valid adjustment strategies, its conditional mean converges to the true Conditional Average Treatment Effect. Across five benchmarks, CausalGuard attains mean coverage above the nominal 90% level for the directly evaluable target and reduces width when graph-agnostic conformal baselines require large padding. Stress tests show that CausalGuard suppresses invalid collider adjustment and remains stable under misspecified priors when the retained candidate set is data-supported.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CausalGuard gives finite-sample coverage for a BIC-weighted average of DR pseudo-outcomes from LLM-proposed graphs, but the step from that coverage to actual CATE convergence still rests on an unproven concentration claim.

read the letter

The core contribution is a conformal wrapper that first builds a posterior over candidate DAGs using an LLM edge prior, conditional-independence pruning, and BIC reweighting, then calibrates a composite nonconformity score on the resulting aggregated doubly-robust pseudo-outcome. This delivers distribution-free marginal coverage for the weighted quantity regardless of whether any single graph is correct. That part is clean and directly useful when graph uncertainty is the main practical headache.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces CausalGuard, a conformal framework for treatment effect estimation under causal graph uncertainty. Candidate DAGs are proposed via an LLM-derived edge prior, pruned by conditional-independence tests, and reweighted by BIC. Graph-conditional doubly robust pseudo-outcomes are aggregated using posterior weights, and a composite nonconformity score is applied to the posterior-weighted pseudo-outcome. The paper claims distribution-free finite-sample marginal coverage for this aggregated quantity; under causal identification, overlap, nuisance stability, and concentration on target-aligned valid adjustment strategies, the conditional mean is asserted to converge to the true CATE. Experiments across five benchmarks report coverage above the nominal 90% level and narrower intervals than graph-agnostic baselines, with stress tests indicating suppression of invalid collider adjustment.

Significance. If the central claims hold, the work provides a principled way to obtain valid conformal intervals for causal effects when the adjustment set is uncertain, potentially reducing the conservatism of purely graph-agnostic wrappers. The distribution-free finite-sample coverage for the aggregated pseudo-outcome is a clear technical strength, and the empirical demonstration that the method remains stable under misspecified priors when the retained set is data-supported is useful. The combination of LLM-assisted discovery, BIC reweighting, and conformal calibration is novel and addresses a practical gap in observational causal inference.

major comments (2)

[Abstract] Abstract and theoretical claims: the finite-sample marginal coverage guarantee applies to the aggregated pseudo-outcome via the composite nonconformity score, but the claim that the conditional mean converges to the CATE rests on the unverified assumption that BIC-reweighted posteriors concentrate on graphs containing target-aligned valid adjustment strategies. No formal argument is supplied showing that BIC down-weights invalid yet data-compatible candidates (e.g., those inducing collider bias after CI pruning); the cited stress tests are empirical only and do not bound posterior mass on such graphs.
[Theoretical results] Theoretical development: the manuscript should explicitly derive or cite the conditions under which posterior weighting of doubly robust pseudo-outcomes preserves the convergence property when the support includes both valid and invalid adjustment sets; the listed assumptions (identification, overlap, nuisance stability, concentration) are necessary but the interaction with the composite score and BIC reweighting requires a precise statement to support the CATE claim.

minor comments (2)

The abstract would benefit from a clearer separation between the distribution-free coverage result (which holds regardless of graph correctness) and the convergence result (which requires the concentration assumption).
Notation for the posterior-weighted pseudo-outcome and the composite nonconformity score should be introduced with explicit definitions early in the methods section to aid readability.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments on our work. We address the major comments below and outline the revisions we plan to make to strengthen the theoretical presentation.

read point-by-point responses

Referee: [Abstract] Abstract and theoretical claims: the finite-sample marginal coverage guarantee applies to the aggregated pseudo-outcome via the composite nonconformity score, but the claim that the conditional mean converges to the CATE rests on the unverified assumption that BIC-reweighted posteriors concentrate on graphs containing target-aligned valid adjustment strategies. No formal argument is supplied showing that BIC down-weights invalid yet data-compatible candidates (e.g., those inducing collider bias after CI pruning); the cited stress tests are empirical only and do not bound posterior mass on such graphs.

Authors: The referee is correct that the CATE convergence relies on the concentration assumption stated in the paper. We do not supply a formal argument that BIC universally down-weights invalid graphs, since this depends on the data and may not hold in all cases. The stress tests are indeed empirical. We will revise the abstract to better distinguish the coverage guarantee from the convergence claim and add a discussion of the assumption's role and limitations. revision: partial
Referee: [Theoretical results] Theoretical development: the manuscript should explicitly derive or cite the conditions under which posterior weighting of doubly robust pseudo-outcomes preserves the convergence property when the support includes both valid and invalid adjustment sets; the listed assumptions (identification, overlap, nuisance stability, concentration) are necessary but the interaction with the composite score and BIC reweighting requires a precise statement to support the CATE claim.

Authors: We will revise the theoretical section to provide a more precise statement of the assumptions and their interaction with the posterior weighting and composite nonconformity score. The coverage guarantee is distribution-free for the aggregated quantity. Convergence to CATE requires the concentration assumption to ensure the weighted pseudo-outcome targets the correct quantity. We will cite relevant literature on causal model averaging to support this. revision: yes

standing simulated objections not resolved

A general formal proof or bound that BIC reweighting concentrates posterior mass exclusively on valid adjustment sets for arbitrary invalid but data-compatible graphs.

Circularity Check

0 steps flagged

No significant circularity; coverage and convergence claims are self-contained under explicit assumptions

full rationale

The paper defines an aggregated posterior-weighted pseudo-outcome and applies a composite nonconformity score to obtain distribution-free finite-sample marginal coverage for that quantity; this is a direct application of standard conformal prediction to a constructed score and does not reduce by construction to a fitted parameter or self-referential definition. Convergence of the conditional mean to the true CATE is stated only under a list of assumptions that explicitly includes 'concentration on target-aligned valid adjustment strategies,' which is not derived or proven inside the paper but treated as a prerequisite. No load-bearing step relies on self-citation chains, ansatz smuggling, or renaming of known results; the BIC reweighting and LLM prior are procedural choices whose validity is assumed rather than tautologically enforced by the coverage guarantee. The derivation therefore remains independent of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 4 axioms · 0 invented entities

The central claims rest on standard causal identification assumptions plus the practical effectiveness of the LLM prior and the data-supported nature of the retained graph set; no new entities are postulated.

axioms (4)

domain assumption causal identification
Invoked for convergence of the conditional mean to the true CATE.
domain assumption overlap
Standard positivity assumption required for identification.
domain assumption conditional-mean nuisance stability
Required for the convergence statement in the abstract.
domain assumption concentration on target-aligned valid adjustment strategies
Assumed so that the weighted set includes valid adjustment sets.

pith-pipeline@v0.9.0 · 5786 in / 1271 out tokens · 43772 ms · 2026-05-22T07:48:23.406552+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

CAUSALGUARD provides distribution-free finite-sample marginal coverage for this aggregated pseudo-outcome; under causal identification, overlap, conditional-mean nuisance stability, and concentration on target-aligned valid adjustment strategies, its conditional mean converges to the true Conditional Average Treatment Effect.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

A composite nonconformity score then calibrates the posterior-weighted pseudo-outcome.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 5 internal anchors

[1]

Cambridge University Press, USA, 2nd edition, 2009

Judea Pearl.Causality: Models, Reasoning and Inference. Cambridge University Press, USA, 2nd edition, 2009. ISBN 052189560X

work page 2009
[2]

Double/debiased machine learning for treatment and structural parameters

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters.The Econometrics Journal, 21(1):C1–C68, 02 2018. ISSN 1368-4221. doi: 10.1111/ectj.12097. URLhttps://doi.org/10.1111/ectj.12097

work page doi:10.1111/ectj.12097 2018
[3]

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests, 2017. URLhttps://arxiv.org/abs/1510.04342

work page internal anchor Pith review Pith/arXiv arXiv 2017
[4]

MIT Press, 2nd edition, 2000

Peter Spirtes, Clark Glymour, and Richard Scheines.Causation, Prediction, and Search. MIT Press, 2nd edition, 2000

work page 2000
[5]

Lihua Lei and Emmanuel J. Candès. Conformal inference of counterfactuals and individual treatment effects.Journal of the Royal Statistical Society: Series B, 83(5):911–938, 2021

work page 2021
[6]

Yaniv Romano, Evan Patterson, and Emmanuel J. Candès. Conformalized quantile regression. InAdvances in Neural Information Processing Systems (NeurIPS), 2019

work page 2019
[7]

Estimating the dimension of a model.The Annals of Statistics, 6(2):461–464, 1978

Gideon Schwarz. Estimating the dimension of a model.The Annals of Statistics, 6(2):461–464, 1978

work page 1978
[8]

Springer, 2005

Vladimir V ovk, Alexander Gammerman, and Glenn Shafer.Algorithmic Learning in a Random World. Springer, 2005

work page 2005
[9]

Distribution-free predictive inference for regression.Journal of the American Statistical Associ- ation, 113(523):1094–1111, 2018

Jing Lei, Max G’Sell, Alessandro Rinaldo, Ryan J Tibshirani, and Larry Wasserman. Distribution-free predictive inference for regression.Journal of the American Statistical Associ- ation, 113(523):1094–1111, 2018

work page 2018
[10]

Alaa, Zaid Ahmad, and Mark van der Laan

Ahmed M. Alaa, Zaid Ahmad, and Mark van der Laan. Conformal meta-learners for predictive inference of individual treatment effects. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

work page 2023
[11]

Tibshirani, Rina Foygel Barber, Emmanuel J

Ryan J. Tibshirani, Rina Foygel Barber, Emmanuel J. Candès, and Aaditya Ramdas. Conformal prediction under covariate shift. InAdvances in Neural Information Processing Systems (NeurIPS), 2019

work page 2019
[12]

Ying Jin, Zhimei Ren, and Emmanuel J. Candès. Sensitivity analysis of individual treatment effects: A robust conformal inference approach.Proceedings of the National Academy of Sciences, 120(6):e2214889120, 2023

work page 2023
[13]

Data-driven covariate selection for nonparametric estimation of causal effects

Doris Entner, Patrik Hoyer, and Peter Spirtes. Data-driven covariate selection for nonparametric estimation of causal effects. InAISTATS, 2013

work page 2013
[14]

Maathuis

Leonard Henckel, Emilija Perkovi´c, and Marloes H. Maathuis. Graphical criteria for efficient total effect estimation via adjustment in causal linear models.Journal of the Royal Statistical Society: Series B, 84(2):579–599, 2022

work page 2022
[15]

Optimal structure identification with greedy search.Journal of Machine Learning Research, 3:507–554, 2002

David Maxwell Chickering. Optimal structure identification with greedy search.Journal of Machine Learning Research, 3:507–554, 2002

work page 2002
[16]

Xun Zheng, Bryon Aragam, Pradeep Ravikumar, and Eric P. Xing. DAGs with NO TEARS: Continuous optimization for structure learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2018

work page 2018
[17]

Causal reasoning and large language models: Opening a new frontier for causality

Emre Kiciman, Robert Ness, Amit Sharma, and Chenhao Tan. Causal reasoning and large language models: Opening a new frontier for causality.arXiv preprint arXiv:2305.00050, 2023

work page arXiv 2023
[18]

Causal discovery with language models as imperfect experts

Stephanie Long, Alexandre Piché, Valentina Zantedeschi, Tibor Schuster, and Alexandre Drouin. Causal discovery with language models as imperfect experts. InICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling, 2023. URLhttps://openreview. net/forum?id=RXlvYZAE49. 10

work page 2023
[19]

From query tools to causal architects: Harnessing large language models for advanced causal discovery from data.arXiv preprint arXiv:2306.16902, 2023

Taiyu Ban, Lyvzhou Chen, Xiangyu Wang, and Huanhuan Chen. From query tools to causal architects: Harnessing large language models for advanced causal discovery from data.arXiv preprint arXiv:2306.16902, 2023

work page arXiv 2023
[20]

Efficient Causal Graph Discovery Using Large Language Models

Thomas Jiralerspong, Xiaoyin Chen, Yash More, Vedant Shah, and Yoshua Bengio. Efficient causal graph discovery using large language models.arXiv preprint arXiv:2402.01207, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[21]

Prompting strategies for enabling large language models to infer causation from correlation.arXiv preprint arXiv:2412.13952, 2024

Eleni Sgouritsa, Virginia Aglietti, Yee Whye Teh, Arnaud Doucet, Arthur Gretton, and Silvia Chiappa. Prompting strategies for enabling large language models to infer causation from correlation.arXiv preprint arXiv:2412.13952, 2024

work page arXiv 2024
[22]

Sankar, B

Debargha Ganguly, Vikash Singh, S. Sankar, B. Zhang, X. Zhang, S. Iyengar, X. Han, et al. Grammars of formal uncertainty: When to trust LLMs in automated reasoning tasks.arXiv preprint arXiv:2505.20047, 2025. NeurIPS 2025

work page arXiv 2025
[23]

VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning

Vikash Singh, D. Cassel, N. Weir, N. Feng, and S. Bayless. VERGE: Formal refinement and guidance engine for verifiable LLM reasoning.arXiv preprint arXiv:2601.20055, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[24]

David Madigan and Adrian E. Raftery. Model selection and accounting for model uncertainty in graphical models using Occam’s window.Journal of the American Statistical Association, 89(428):1535–1546, 1994

work page 1994
[25]

Hoeting, David Madigan, Adrian E

Jennifer A. Hoeting, David Madigan, Adrian E. Raftery, and Chris T. V olinsky. Bayesian model averaging: A tutorial.Statistical Science, 14(4):382–401, 1999

work page 1999
[26]

Bayesian nonparametric modeling for causal inference.Journal of Computational and Graphical Statistics, 20:217–240, 03 2011

Jennifer Hill. Bayesian nonparametric modeling for causal inference.Journal of Computational and Graphical Statistics, 20:217–240, 03 2011. doi: 10.1198/jcgs.2010.08162

work page doi:10.1198/jcgs.2010.08162 2011
[27]

Maathuis, Markus Kalisch, and Peter Bühlmann

Marloes H. Maathuis, Markus Kalisch, and Peter Bühlmann. Estimating high-dimensional intervention effects from observational data.The Annals of Statistics, 37(6A):3133–3164, 2009

work page 2009
[28]

On the application of probability theory to agricultural experiments

Jerzy Neyman. On the application of probability theory to agricultural experiments. essay on principles. section 9.Statistical Science, 5(4):465–472, 1990

work page 1990
[29]

Donald B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies.Journal of Educational Psychology, 66(5):688, 1974

work page 1974
[30]

Randomization analysis of experimental data: The fisher randomization test comment.Journal of the American Statistical Association, 75(371):591–593, 1980

Donald B Rubin. Randomization analysis of experimental data: The fisher randomization test comment.Journal of the American Statistical Association, 75(371):591–593, 1980

work page 1980
[31]

Being bayesian about network structure

Nir Friedman and Daphne Koller. Being bayesian about network structure. a bayesian approach to structure discovery in bayesian networks.Machine learning, 50:95–125, 2003

work page 2003
[32]

probable error

Ronald A Fisher. On the" probable error" of a coefficient of correlation deduced from a small sample.Metron, 1:3–32, 1921

work page 1921
[33]

Estimation of regression coefficients when some regressors are not always observed.Journal of the American statistical Association, 89(427):846–866, 1994

James M Robins, Andrea Rotnitzky, and Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed.Journal of the American statistical Association, 89(427):846–866, 1994

work page 1994
[34]

Lauffenburger, and Garry P

Karen Sachs, Omar Perez, Dana Pe’er, Douglas A. Lauffenburger, and Garry P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data.Science, 308(5721): 523–529, 2005. doi: 10.1126/science.1105809. URL https://www.science.org/doi/ abs/10.1126/science.1105809

work page doi:10.1126/science.1105809 2005
[35]

Atlantic Causal Inference Conference (ACIC) Data Analysis Challenge 2017

P. Richard Hahn, Vincent Dorie, and Jared S. Murray. Atlantic causal inference conference (acic) data analysis challenge 2017, 2019. URLhttps://arxiv.org/abs/1905.09515

work page internal anchor Pith review Pith/arXiv arXiv 2017
[36]

Robert J. LaLonde. Evaluating the econometric evaluations of training programs with experi- mental data.The American Economic Review, 76(4):604–620, 1986. ISSN 00028282. URL http://www.jstor.org/stable/1806062

work page arXiv 1986
[37]

Causal effect inference with deep latent-variable models.Advances in neural information processing systems, 30, 2017

Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. Causal effect inference with deep latent-variable models.Advances in neural information processing systems, 30, 2017. 11

work page 2017
[38]

Dominique M. A. Haughton. On the choice of a model to fit data from an exponential family. The Annals of Statistics, 16(1):342–355, 1988. ISSN 00905364, 21688966. URL http: //www.jstor.org/stable/2241441

work page arXiv 1988
[39]

Edward H. Kennedy. Towards optimal doubly robust estimation of heterogeneous causal effects,

work page
[40]

URLhttps://arxiv.org/abs/2004.14497

work page arXiv 2004
[41]

Vikash Singh, Debargha Ganguly, H. Yu, C. Zhou, P. Singh, B. Lee, Vipin Chaudhary, and G. Datta. Toward guarantees for clinical reasoning in vision language models via formal verification.arXiv preprint arXiv:2602.24111, 2026

work page arXiv 2026
[42]

Forte: Finding outliers with representation typicality estimation

Debargha Ganguly, Warren Morningstar, Andrew Yu, and Vipin Chaudhary. Forte: Finding outliers with representation typicality estimation. In13th International Conference on Learning Representations (ICLR), Singapore, April 2025

work page 2025
[43]

Sankar, B

Debargha Ganguly, S. Sankar, B. Zhang, Vikash Singh, K. Gupta, H. Kavuru, A. Luo, et al. Trust the typical: An out-of-distribution safety detection framework.arXiv preprint arXiv:2602.04581,

work page arXiv
[44]

$K^4$: Online Log Anomaly Detection Via Unsupervised Typicality Learning

Weicong Chen, Vikash Singh, Z. Rahmani, Debargha Ganguly, M. Hariri, and Vipin Chaud- hary. K4: Online log anomaly detection via unsupervised typicality learning.arXiv preprint arXiv:2507.20051, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[45]

N. Wang, T. Liang, Vikash Singh, C. Song, V . Yang, Y . Yin, Jing Ma, J. Singh, et al. HugRAG: Hierarchical causal knowledge graph design for RAG.arXiv preprint arXiv:2602.05143, 2026

work page arXiv 2026
[46]

Yang, Debargha Ganguly, X

W. Yang, Debargha Ganguly, X. Li, C. Song, S. Wang, Vikash Singh, Vipin Chaudhary, and X. Han. Mid-Think: Training-free intermediate-budget reasoning via token-level triggers.arXiv preprint arXiv:2601.07036, 2026

work page arXiv 2026
[47]

Labeling copilot: A deep research agent for automated data curation in computer vision,

Debargha Ganguly, Sumit Kumar, Ishwar Balappanawar, Weicong Chen, Shashank Kambhatla, Srinivasan Iyengar, Shivkumar Kalyanaraman, Ponnurangam Kumaraguru, and Vipin Chaud- hary. Labeling copilot: A deep research agent for automated data curation in computer vision,

work page
[48]

URLhttps://arxiv.org/abs/2509.22631

work page arXiv
[49]

Zhang, M

B. Zhang, M. Zheng, Debargha Ganguly, X. Zhang, Vikash Singh, Vipin Chaudhary, and Z. Zhang. Efficient fine-grained GPU performance modeling for distributed deep learning of LLM.arXiv preprint arXiv:2509.22832, 2025. A Assumptions We collect the regularity conditions required for the theoretical guarantees. Assumption 1 is the only condition needed for th...

work page arXiv 2025
[50]

BIC scores and BMA weights(Equations 3–4): Gaussian likelihoods evaluated on Dtrain, followed by duplicate adjustment-strategy collapse before normalization

work page
[51]

LLM is pattern-matching variable names

Nuisance models: propensity score ˆe(k), outcome models ˆµ(k) t , and heuristic bounds [ˆq(k) low,ˆq(k) high]all fitted onD train. Stage 4 (conformal calibration) uses Dcal only for evaluating out-of-sample pseudo-outcomes, computing nonconformity scores, and determining ˆQ. This separation ensures that, conditioned on Dtrain, the composite scores are sym...

work page arXiv 2092
[52]

Initialise a NumPyRandomStatewith the run seed

work page
[53]

Draw a uniform random permutationπof all variables (covariates,T,Y)

work page
[54]

This guarantees T precedes Y in topological order so that the protected edgeT→Yis order-consistent

If π(T)> π(Y) , swap the positions of T and Y in π. This guarantees T precedes Y in topological order so that the protected edgeT→Yis order-consistent

work page
[55]

For each ordered pair (u, v) with π(u)< π(v) , include the edge u→v with probability pu→v via an independent Bernoulli draw

work page
[56]

Force-add the edgeT→Yto the resulting graph

work page
[57]

Return ONLY valid JSON

Reject duplicates by hashing frozenset(G.edges()); up to 100K resamples are at- tempted before stopping. The firstKunique non-empty DAGs are kept. This procedure cannot produce cycles because every sampled edge respects π, and T→Y is consistent with the swap in step 3. CI-pruning rule (Fisher partial correlation).Given a sampled DAG G and Xtr (the train-o...

work page

[1] [1]

Cambridge University Press, USA, 2nd edition, 2009

Judea Pearl.Causality: Models, Reasoning and Inference. Cambridge University Press, USA, 2nd edition, 2009. ISBN 052189560X

work page 2009

[2] [2]

Double/debiased machine learning for treatment and structural parameters

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters.The Econometrics Journal, 21(1):C1–C68, 02 2018. ISSN 1368-4221. doi: 10.1111/ectj.12097. URLhttps://doi.org/10.1111/ectj.12097

work page doi:10.1111/ectj.12097 2018

[3] [3]

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests, 2017. URLhttps://arxiv.org/abs/1510.04342

work page internal anchor Pith review Pith/arXiv arXiv 2017

[4] [4]

MIT Press, 2nd edition, 2000

Peter Spirtes, Clark Glymour, and Richard Scheines.Causation, Prediction, and Search. MIT Press, 2nd edition, 2000

work page 2000

[5] [5]

Lihua Lei and Emmanuel J. Candès. Conformal inference of counterfactuals and individual treatment effects.Journal of the Royal Statistical Society: Series B, 83(5):911–938, 2021

work page 2021

[6] [6]

Yaniv Romano, Evan Patterson, and Emmanuel J. Candès. Conformalized quantile regression. InAdvances in Neural Information Processing Systems (NeurIPS), 2019

work page 2019

[7] [7]

Estimating the dimension of a model.The Annals of Statistics, 6(2):461–464, 1978

Gideon Schwarz. Estimating the dimension of a model.The Annals of Statistics, 6(2):461–464, 1978

work page 1978

[8] [8]

Springer, 2005

Vladimir V ovk, Alexander Gammerman, and Glenn Shafer.Algorithmic Learning in a Random World. Springer, 2005

work page 2005

[9] [9]

Distribution-free predictive inference for regression.Journal of the American Statistical Associ- ation, 113(523):1094–1111, 2018

Jing Lei, Max G’Sell, Alessandro Rinaldo, Ryan J Tibshirani, and Larry Wasserman. Distribution-free predictive inference for regression.Journal of the American Statistical Associ- ation, 113(523):1094–1111, 2018

work page 2018

[10] [10]

Alaa, Zaid Ahmad, and Mark van der Laan

Ahmed M. Alaa, Zaid Ahmad, and Mark van der Laan. Conformal meta-learners for predictive inference of individual treatment effects. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

work page 2023

[11] [11]

Tibshirani, Rina Foygel Barber, Emmanuel J

Ryan J. Tibshirani, Rina Foygel Barber, Emmanuel J. Candès, and Aaditya Ramdas. Conformal prediction under covariate shift. InAdvances in Neural Information Processing Systems (NeurIPS), 2019

work page 2019

[12] [12]

Ying Jin, Zhimei Ren, and Emmanuel J. Candès. Sensitivity analysis of individual treatment effects: A robust conformal inference approach.Proceedings of the National Academy of Sciences, 120(6):e2214889120, 2023

work page 2023

[13] [13]

Data-driven covariate selection for nonparametric estimation of causal effects

Doris Entner, Patrik Hoyer, and Peter Spirtes. Data-driven covariate selection for nonparametric estimation of causal effects. InAISTATS, 2013

work page 2013

[14] [14]

Maathuis

Leonard Henckel, Emilija Perkovi´c, and Marloes H. Maathuis. Graphical criteria for efficient total effect estimation via adjustment in causal linear models.Journal of the Royal Statistical Society: Series B, 84(2):579–599, 2022

work page 2022

[15] [15]

Optimal structure identification with greedy search.Journal of Machine Learning Research, 3:507–554, 2002

David Maxwell Chickering. Optimal structure identification with greedy search.Journal of Machine Learning Research, 3:507–554, 2002

work page 2002

[16] [16]

Xun Zheng, Bryon Aragam, Pradeep Ravikumar, and Eric P. Xing. DAGs with NO TEARS: Continuous optimization for structure learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2018

work page 2018

[17] [17]

Causal reasoning and large language models: Opening a new frontier for causality

Emre Kiciman, Robert Ness, Amit Sharma, and Chenhao Tan. Causal reasoning and large language models: Opening a new frontier for causality.arXiv preprint arXiv:2305.00050, 2023

work page arXiv 2023

[18] [18]

Causal discovery with language models as imperfect experts

Stephanie Long, Alexandre Piché, Valentina Zantedeschi, Tibor Schuster, and Alexandre Drouin. Causal discovery with language models as imperfect experts. InICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling, 2023. URLhttps://openreview. net/forum?id=RXlvYZAE49. 10

work page 2023

[19] [19]

From query tools to causal architects: Harnessing large language models for advanced causal discovery from data.arXiv preprint arXiv:2306.16902, 2023

Taiyu Ban, Lyvzhou Chen, Xiangyu Wang, and Huanhuan Chen. From query tools to causal architects: Harnessing large language models for advanced causal discovery from data.arXiv preprint arXiv:2306.16902, 2023

work page arXiv 2023

[20] [20]

Efficient Causal Graph Discovery Using Large Language Models

Thomas Jiralerspong, Xiaoyin Chen, Yash More, Vedant Shah, and Yoshua Bengio. Efficient causal graph discovery using large language models.arXiv preprint arXiv:2402.01207, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[21] [21]

Prompting strategies for enabling large language models to infer causation from correlation.arXiv preprint arXiv:2412.13952, 2024

Eleni Sgouritsa, Virginia Aglietti, Yee Whye Teh, Arnaud Doucet, Arthur Gretton, and Silvia Chiappa. Prompting strategies for enabling large language models to infer causation from correlation.arXiv preprint arXiv:2412.13952, 2024

work page arXiv 2024

[22] [22]

Sankar, B

Debargha Ganguly, Vikash Singh, S. Sankar, B. Zhang, X. Zhang, S. Iyengar, X. Han, et al. Grammars of formal uncertainty: When to trust LLMs in automated reasoning tasks.arXiv preprint arXiv:2505.20047, 2025. NeurIPS 2025

work page arXiv 2025

[23] [23]

VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning

Vikash Singh, D. Cassel, N. Weir, N. Feng, and S. Bayless. VERGE: Formal refinement and guidance engine for verifiable LLM reasoning.arXiv preprint arXiv:2601.20055, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[24] [24]

David Madigan and Adrian E. Raftery. Model selection and accounting for model uncertainty in graphical models using Occam’s window.Journal of the American Statistical Association, 89(428):1535–1546, 1994

work page 1994

[25] [25]

Hoeting, David Madigan, Adrian E

Jennifer A. Hoeting, David Madigan, Adrian E. Raftery, and Chris T. V olinsky. Bayesian model averaging: A tutorial.Statistical Science, 14(4):382–401, 1999

work page 1999

[26] [26]

Bayesian nonparametric modeling for causal inference.Journal of Computational and Graphical Statistics, 20:217–240, 03 2011

Jennifer Hill. Bayesian nonparametric modeling for causal inference.Journal of Computational and Graphical Statistics, 20:217–240, 03 2011. doi: 10.1198/jcgs.2010.08162

work page doi:10.1198/jcgs.2010.08162 2011

[27] [27]

Maathuis, Markus Kalisch, and Peter Bühlmann

Marloes H. Maathuis, Markus Kalisch, and Peter Bühlmann. Estimating high-dimensional intervention effects from observational data.The Annals of Statistics, 37(6A):3133–3164, 2009

work page 2009

[28] [28]

On the application of probability theory to agricultural experiments

Jerzy Neyman. On the application of probability theory to agricultural experiments. essay on principles. section 9.Statistical Science, 5(4):465–472, 1990

work page 1990

[29] [29]

Donald B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies.Journal of Educational Psychology, 66(5):688, 1974

work page 1974

[30] [30]

Randomization analysis of experimental data: The fisher randomization test comment.Journal of the American Statistical Association, 75(371):591–593, 1980

Donald B Rubin. Randomization analysis of experimental data: The fisher randomization test comment.Journal of the American Statistical Association, 75(371):591–593, 1980

work page 1980

[31] [31]

Being bayesian about network structure

Nir Friedman and Daphne Koller. Being bayesian about network structure. a bayesian approach to structure discovery in bayesian networks.Machine learning, 50:95–125, 2003

work page 2003

[32] [32]

probable error

Ronald A Fisher. On the" probable error" of a coefficient of correlation deduced from a small sample.Metron, 1:3–32, 1921

work page 1921

[33] [33]

Estimation of regression coefficients when some regressors are not always observed.Journal of the American statistical Association, 89(427):846–866, 1994

James M Robins, Andrea Rotnitzky, and Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed.Journal of the American statistical Association, 89(427):846–866, 1994

work page 1994

[34] [34]

Lauffenburger, and Garry P

Karen Sachs, Omar Perez, Dana Pe’er, Douglas A. Lauffenburger, and Garry P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data.Science, 308(5721): 523–529, 2005. doi: 10.1126/science.1105809. URL https://www.science.org/doi/ abs/10.1126/science.1105809

work page doi:10.1126/science.1105809 2005

[35] [35]

Atlantic Causal Inference Conference (ACIC) Data Analysis Challenge 2017

P. Richard Hahn, Vincent Dorie, and Jared S. Murray. Atlantic causal inference conference (acic) data analysis challenge 2017, 2019. URLhttps://arxiv.org/abs/1905.09515

work page internal anchor Pith review Pith/arXiv arXiv 2017

[36] [36]

Robert J. LaLonde. Evaluating the econometric evaluations of training programs with experi- mental data.The American Economic Review, 76(4):604–620, 1986. ISSN 00028282. URL http://www.jstor.org/stable/1806062

work page arXiv 1986

[37] [37]

Causal effect inference with deep latent-variable models.Advances in neural information processing systems, 30, 2017

Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. Causal effect inference with deep latent-variable models.Advances in neural information processing systems, 30, 2017. 11

work page 2017

[38] [38]

Dominique M. A. Haughton. On the choice of a model to fit data from an exponential family. The Annals of Statistics, 16(1):342–355, 1988. ISSN 00905364, 21688966. URL http: //www.jstor.org/stable/2241441

work page arXiv 1988

[39] [39]

Edward H. Kennedy. Towards optimal doubly robust estimation of heterogeneous causal effects,

work page

[40] [40]

URLhttps://arxiv.org/abs/2004.14497

work page arXiv 2004

[41] [41]

Vikash Singh, Debargha Ganguly, H. Yu, C. Zhou, P. Singh, B. Lee, Vipin Chaudhary, and G. Datta. Toward guarantees for clinical reasoning in vision language models via formal verification.arXiv preprint arXiv:2602.24111, 2026

work page arXiv 2026

[42] [42]

Forte: Finding outliers with representation typicality estimation

Debargha Ganguly, Warren Morningstar, Andrew Yu, and Vipin Chaudhary. Forte: Finding outliers with representation typicality estimation. In13th International Conference on Learning Representations (ICLR), Singapore, April 2025

work page 2025

[43] [43]

Sankar, B

Debargha Ganguly, S. Sankar, B. Zhang, Vikash Singh, K. Gupta, H. Kavuru, A. Luo, et al. Trust the typical: An out-of-distribution safety detection framework.arXiv preprint arXiv:2602.04581,

work page arXiv

[44] [44]

$K^4$: Online Log Anomaly Detection Via Unsupervised Typicality Learning

Weicong Chen, Vikash Singh, Z. Rahmani, Debargha Ganguly, M. Hariri, and Vipin Chaud- hary. K4: Online log anomaly detection via unsupervised typicality learning.arXiv preprint arXiv:2507.20051, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[45] [45]

N. Wang, T. Liang, Vikash Singh, C. Song, V . Yang, Y . Yin, Jing Ma, J. Singh, et al. HugRAG: Hierarchical causal knowledge graph design for RAG.arXiv preprint arXiv:2602.05143, 2026

work page arXiv 2026

[46] [46]

Yang, Debargha Ganguly, X

W. Yang, Debargha Ganguly, X. Li, C. Song, S. Wang, Vikash Singh, Vipin Chaudhary, and X. Han. Mid-Think: Training-free intermediate-budget reasoning via token-level triggers.arXiv preprint arXiv:2601.07036, 2026

work page arXiv 2026

[47] [47]

Labeling copilot: A deep research agent for automated data curation in computer vision,

Debargha Ganguly, Sumit Kumar, Ishwar Balappanawar, Weicong Chen, Shashank Kambhatla, Srinivasan Iyengar, Shivkumar Kalyanaraman, Ponnurangam Kumaraguru, and Vipin Chaud- hary. Labeling copilot: A deep research agent for automated data curation in computer vision,

work page

[48] [48]

URLhttps://arxiv.org/abs/2509.22631

work page arXiv

[49] [49]

Zhang, M

B. Zhang, M. Zheng, Debargha Ganguly, X. Zhang, Vikash Singh, Vipin Chaudhary, and Z. Zhang. Efficient fine-grained GPU performance modeling for distributed deep learning of LLM.arXiv preprint arXiv:2509.22832, 2025. A Assumptions We collect the regularity conditions required for the theoretical guarantees. Assumption 1 is the only condition needed for th...

work page arXiv 2025

[50] [50]

BIC scores and BMA weights(Equations 3–4): Gaussian likelihoods evaluated on Dtrain, followed by duplicate adjustment-strategy collapse before normalization

work page

[51] [51]

LLM is pattern-matching variable names

Nuisance models: propensity score ˆe(k), outcome models ˆµ(k) t , and heuristic bounds [ˆq(k) low,ˆq(k) high]all fitted onD train. Stage 4 (conformal calibration) uses Dcal only for evaluating out-of-sample pseudo-outcomes, computing nonconformity scores, and determining ˆQ. This separation ensures that, conditioned on Dtrain, the composite scores are sym...

work page arXiv 2092

[52] [52]

Initialise a NumPyRandomStatewith the run seed

work page

[53] [53]

Draw a uniform random permutationπof all variables (covariates,T,Y)

work page

[54] [54]

This guarantees T precedes Y in topological order so that the protected edgeT→Yis order-consistent

If π(T)> π(Y) , swap the positions of T and Y in π. This guarantees T precedes Y in topological order so that the protected edgeT→Yis order-consistent

work page

[55] [55]

For each ordered pair (u, v) with π(u)< π(v) , include the edge u→v with probability pu→v via an independent Bernoulli draw

work page

[56] [56]

Force-add the edgeT→Yto the resulting graph

work page

[57] [57]

Return ONLY valid JSON

Reject duplicates by hashing frozenset(G.edges()); up to 100K resamples are at- tempted before stopping. The firstKunique non-empty DAGs are kept. This procedure cannot produce cycles because every sampled edge respects π, and T→Y is consistent with the swap in step 3. CI-pruning rule (Fisher partial correlation).Given a sampled DAG G and Xtr (the train-o...

work page