
arXiv: 2605.04838 · v1 · submitted 2026-05-06 · stat.ME · cs.LG · stat.ML

PAIR-CI: Calibrated Conditional Independence Testing for Causal Discovery with Incomplete Data

Pith reviewed 2026-05-08 16:54 UTC · model grok-4.3

keywords conditional independence testing · causal discovery · incomplete data · multiple imputation · permutation test · calibration · PC algorithm

The pith

PAIR-CI restores proper calibration to conditional independence tests for causal discovery when data have missing values by integrating imputation into a paired permutation procedure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the usual approach of imputing missing data first and then testing for conditional independence produces badly miscalibrated tests, with false positive rates as high as 45 percent when data are missing not at random (MNAR). PAIR-CI instead performs multiple imputations and then compares two cross-validated models, one that includes the candidate variable and one that excludes it, while giving both the same imputed conditioning set. This pairing is designed to make the imputation error cancel in the difference of their losses, yielding a test statistic whose null distribution is correctly calibrated. A new variance estimator that accounts for both cross-validation and multiple imputation is proven consistent. In simulations, PAIR-CI's false positive rate averages below the nominal 5 percent level across linear and nonlinear settings, and using it inside the PC algorithm improves graph recovery, with larger gains on larger causal graphs.

Core claim

PAIR-CI is a nonparametric conditional independence test that integrates multiple imputation directly into the inferential procedure via a paired permutation design, forcing imputation error to cancel in the loss difference between cross-validated models that include versus exclude the candidate variable, together with a provably consistent variance estimator that jointly accounts for uncertainty from cross-validation and multiple imputation.

What carries the argument

The paired permutation design, in which the same imputed conditioning sets are used for both the model that includes the candidate variable and the one that excludes it, so that imputation uncertainty subtracts out when comparing their cross-validated losses.
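
The pairing can be illustrated with a minimal sketch. This is not the authors' implementation: the OLS learner, the toy stochastic imputer, the deterministic fold assignment, and the names `cv_sq_losses` and `paired_loss_difference` are all assumptions chosen for brevity, and the permutation step and the variance estimator are omitted. The only point shown is that both models in each pair receive the same imputed conditioning set before their cross-validated losses are differenced.

```python
import numpy as np

def cv_sq_losses(features, y, n_folds=5):
    """Out-of-fold squared errors of an OLS fit (intercept added)."""
    n = len(y)
    design = np.column_stack([np.ones(n), features])
    losses = np.empty(n)
    folds = np.arange(n) % n_folds  # deterministic fold labels (assumption)
    for k in range(n_folds):
        test = folds == k
        coef, *_ = np.linalg.lstsq(design[~test], y[~test], rcond=None)
        losses[test] = (y[test] - design[test] @ coef) ** 2
    return losses

def paired_loss_difference(x, y, z_imputations, n_folds=5):
    """Mean of (loss excluding x) - (loss including x), averaged over imputations."""
    diffs = []
    for z_imp in z_imputations:
        # Both models see the SAME imputed conditioning set z_imp.
        loss_excl = cv_sq_losses(z_imp, y, n_folds)
        loss_incl = cv_sq_losses(np.column_stack([z_imp, x]), y, n_folds)
        diffs.append(np.mean(loss_excl - loss_incl))
    return float(np.mean(diffs))

# Toy data in which y depends on x given z (hypothetical example).
rng = np.random.default_rng(0)
n = 400
z = rng.normal(size=n)
x = rng.normal(size=n)
y = 2.0 * x + z + rng.normal(size=n)

# Toy missingness in z and a toy stochastic imputer (both assumptions).
missing = rng.random(n) < 0.3
obs_mean, obs_sd = z[~missing].mean(), z[~missing].std()
z_imputations = []
for _ in range(5):
    z_imp = z.copy()
    z_imp[missing] = rng.normal(obs_mean, obs_sd, size=missing.sum())
    z_imputations.append(z_imp)

delta = paired_loss_difference(x, y, z_imputations)
print(f"paired loss difference: {delta:.3f}")
```

A clearly positive difference indicates that including the candidate variable reduces out-of-fold loss. How that difference is turned into a calibrated p-value is the part of the method this sketch leaves out.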

If this is right

  • Existing imputation-based CI tests show false positive rates of 28-45% under MNAR, while PAIR-CI averages below the nominal 5% level across data-generating processes and missingness mechanisms.
  • Integrating PAIR-CI into the PC algorithm reduces structural Hamming distance by 8% on 10-variable nonlinear graphs, 15% on 30-variable graphs, and up to 44% on the 56-variable HAILFINDER network.
  • Performance remains stable across data-generating processes and missingness mechanisms, including MNAR.
  • The approach provides the first formal unification of cross-validation and multiple imputation uncertainty in a single consistent variance estimator.
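
For readers unfamiliar with the graph-recovery metric quoted above, here is a small sketch of structural Hamming distance (SHD) between two directed graphs given as adjacency matrices. Conventions differ across papers; this version assumes that a missing, extra, or reversed edge each counts once per node pair, which may not match the convention used in the paper. The function name and the example graphs are illustrative.

```python
def structural_hamming_distance(true_adj, est_adj):
    """Count node pairs whose edge status differs between two directed graphs.

    adj[i][j] == 1 means an edge i -> j. A reversed edge counts once
    (an assumed convention; some authors count it twice).
    """
    n = len(true_adj)
    distance = 0
    for i in range(n):
        for j in range(i + 1, n):
            true_pair = (true_adj[i][j], true_adj[j][i])
            est_pair = (est_adj[i][j], est_adj[j][i])
            if true_pair != est_pair:
                distance += 1
    return distance

# True graph: 0 -> 1 -> 2
true_graph = [[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]]
# Estimate with the first edge reversed: 1 -> 0, 1 -> 2
reversed_graph = [[0, 0, 0],
                  [1, 0, 1],
                  [0, 0, 0]]

shd_same = structural_hamming_distance(true_graph, true_graph)
shd_reversed = structural_hamming_distance(true_graph, reversed_graph)
print(shd_same, shd_reversed)
```

A percentage reduction in SHD, as reported in the bullet above, compares this count for graphs learned with PAIR-CI against the count for graphs learned with a baseline test.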

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar paired designs could be applied to other testing problems where imputation introduces bias, such as regression with missing covariates.
  • If the cancellation of imputation error holds more generally, it could improve calibration in other causal discovery algorithms that rely on CI tests.
  • The variance estimator might extend to settings with different imputation models beyond the ones simulated.

Load-bearing premise

The paired permutation design makes imputation error cancel exactly in the loss difference between the two cross-validated models.

What would settle it

A dataset or simulation under MNAR missingness where the false positive rate of PAIR-CI exceeds the nominal level or where the variance estimator fails to be consistent.
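
One way to operationalize this check is to put an exact binomial interval around an observed rejection rate and ask whether it excludes the nominal level. The sketch below computes a Clopper-Pearson interval, the interval type named in the Figure 4 caption, by bisection on the binomial CDF using only the standard library. The function names, the bisection depth, and the example counts are assumptions for illustration, not values from the paper.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion."""
    def solve(cdf_at, target):
        # cdf_at(p) is decreasing in p, so bisect for cdf_at(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

nominal = 0.05
for rejections in (5, 15):  # hypothetical false rejections out of 100 replicates
    lower, upper = clopper_pearson(rejections, 100)
    verdict = "exceeds nominal" if lower > nominal else "consistent with nominal"
    print(f"{rejections}/100: [{lower:.4f}, {upper:.4f}] {verdict}")
```

A cell would count as evidence against calibration only when the lower bound sits above the nominal level. With many cells examined, some correction for multiple comparisons would also be needed.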

Figures

Figures reproduced from arXiv: 2605.04838 by Ranjit Lall, Thomas S. Robinson.

Figure 2: Power curves in standalone performance experiment. Rejection rate under H1 by signal strength and missingness mechanism for three DGPs (linear Gaussian, post-nonlinear, and latent confounder; described in Section 5.1). PAIR-CI (blue) has lower power than the miscalibrated baselines by design, so comparisons are uninformative (…
Figure 3: Precision–recall profiles by scale and DGP. Each panel pools all four missingness conditions. Per-replicate scatter is shown in low alpha, with per-method medians overlaid as large markers. F1 iso-curves are included for reference.
I HAILFINDER Network: Full Results. Tables 10 and 11 show median SHD and F1 across all missingness rates and mechanisms (20 replicates per cell, nonlinear edges, missingness indu…
Figure 4: Empirical relationship between κY · κZ and false positive rate across adversarial cells. Each point represents a combination of DGP, missingness rate, conditioning strategy, and imputer. The horizontal axis measures the post-cancellation residual κY · κZ (Decomposition 8); the vertical axis measures the PAIR-CI false positive rate over 100 replicates, with Clopper–Pearson 95% confidence intervals. The shad…
Original abstract

The standard constraint-based paradigm for causal discovery with incomplete data (impute first, test second) is frequently miscalibrated: any consistent conditional independence (CI) test rejects a true null with probability approaching 1 when imputation error induces spurious conditional dependence. We introduce PAIR-CI, a nonparametric CI test that restores calibration by integrating multiple imputation directly into the inferential procedure via a paired permutation design. PAIR-CI compares cross-validated models that include and exclude the candidate variable while receiving the same imputed conditioning set, forcing imputation error to cancel in their loss difference rather than contaminate the test statistic. A provably consistent variance estimator jointly accounts for uncertainty arising from cross-validation and multiple imputation; to our knowledge, this is the first formal unification of these two inferential frameworks. In simulations, existing imputation-based CI tests exhibit false positive rates of 28-45% when data are missing not at random (MNAR), whereas PAIR-CI averages below the nominal 5% level across data-generating processes and missingness mechanisms. These gains are largest in nonlinear settings and grow with causal graph size: when integrated into the PC algorithm, PAIR-CI reduces structural Hamming distance by 8% on 10-variable nonlinear graphs, 15% on 30-variable equivalents, and up to 44% on the 56-variable HAILFINDER network, with stable performance in all settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces PAIR-CI, a nonparametric conditional independence test designed for causal discovery with incomplete data. It addresses miscalibration in the standard 'impute first, test second' approach by using a paired permutation design that integrates multiple imputation, ensuring imputation errors cancel in the loss difference between cross-validated models that include and exclude the candidate variable. The method includes a provably consistent variance estimator accounting for both cross-validation and imputation uncertainty, and simulations demonstrate improved false positive control under various missingness mechanisms, including MNAR, as well as better performance in the PC algorithm for graph recovery.

Significance. Should the theoretical guarantees on calibration and consistency hold, this work offers a meaningful contribution to causal inference under missing data by providing a calibrated nonparametric test that avoids the pitfalls of separate imputation and testing steps. The reported simulation improvements, particularly the drop in false positive rates from 28-45% for existing tests to below 5% on average for PAIR-CI, and SHD reductions of up to 44% on larger graphs, indicate potential for enhancing reliability in practical applications of constraint-based causal discovery. The formal unification of cross-validation and multiple imputation inference frameworks is a notable strength.

major comments (2)
  1. [Abstract and §3] The claim that the paired permutation design forces imputation error to cancel exactly in the loss difference (thereby keeping the test statistic asymptotically pivotal under the null even for MNAR) is load-bearing for the calibration result. However, this appears to treat imputed values as fixed; under misspecified imputation models or MNAR depending on unobserved variables, the residual dependence may not cancel symmetrically, potentially invalidating the pivotality. Explicit assumptions or a detailed proof addressing this case are needed.
  2. [§4] The consistency proof for the variance estimator, which jointly accounts for uncertainty from cross-validation and multiple imputation, relies on the same paired structure and cancellation property. If the imputation error is not mean-zero conditional on the observed data under general MNAR, the estimator's consistency may not hold, undermining the formal unification claim.
minor comments (1)
  1. The abstract mentions specific performance gains but lacks details on the number of simulations or exact parameter settings; adding these would improve reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We appreciate the recognition of PAIR-CI's potential contribution. We address each major comment below with clarifications on the paired design and indicate revisions to strengthen the theoretical presentation.

  1. Referee: [Abstract and §3] The claim that the paired permutation design forces imputation error to cancel exactly in the loss difference (thereby keeping the test statistic asymptotically pivotal under the null even for MNAR) is load-bearing for the calibration result. However, this appears to treat imputed values as fixed; under misspecified imputation models or MNAR depending on unobserved variables, the residual dependence may not cancel symmetrically, potentially invalidating the pivotality. Explicit assumptions or a detailed proof addressing this case are needed.

    Authors: The paired permutation design applies the identical imputed values for the conditioning set to both the model including the candidate variable and the model excluding it. Consequently, any imputation error (whether from misspecification or MNAR depending on unobserved variables) enters the two loss functions in exactly the same way and subtracts out in their difference. Under the null, the expected loss difference is therefore driven solely by the candidate variable, preserving asymptotic pivotality. This cancellation does not require the imputation error to be mean-zero or the model to be correctly specified; it requires only that the same imputations are used for the paired losses. We will add an explicit assumption on identical application of the imputation procedure and include a detailed proof in the revised §3 and a new appendix establishing the result for general MNAR.

    revision: yes

  2. Referee: [§4] The consistency proof for the variance estimator, which jointly accounts for uncertainty from cross-validation and multiple imputation, relies on the same paired structure and cancellation property. If the imputation error is not mean-zero conditional on the observed data under general MNAR, the estimator's consistency may not hold, undermining the formal unification claim.

    Authors: The variance estimator targets the variance of the paired loss difference. Because the imputation error is identical across the pair, it cancels in the difference; the estimator therefore consistently recovers the remaining variability from cross-validation and the multiple imputations. Consistency holds conditionally on the observed data even when the imputation error is not mean-zero, provided the paired structure is maintained. We will revise §4 to state this conditioning explicitly, derive the consistency result under general MNAR, and clarify the scope of the unification between cross-validation and multiple-imputation inference.

    revision: yes
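
As background for the exchange above, the classical way to combine within-imputation and between-imputation variability is Rubin's rules. The sketch below shows only that standard pooling step. It is not the paper's estimator, which is described as additionally accounting for cross-validation uncertainty; the function name and the input numbers are made up for illustration.

```python
from statistics import mean, variance

def rubin_pool(estimates, within_variances):
    """Pool M per-imputation estimates with Rubin's rules.

    Returns the pooled estimate and the total variance
    T = W + (1 + 1/M) * B, where W is the mean within-imputation
    variance and B is the sample variance of the estimates.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(within_variances)
    between = variance(estimates)  # divides by M - 1
    total = within + (1 + 1 / m) * between
    return pooled_estimate, total

# Hypothetical per-imputation loss differences and their variances.
estimates = [0.10, 0.12, 0.08]
within_variances = [0.01, 0.01, 0.01]
pooled, total_var = rubin_pool(estimates, within_variances)
print(pooled, total_var)
```

The referee's concern maps onto this structure: if the quantity being pooled is biased by imputation error that does not cancel, a correct variance formula still leaves the test miscalibrated.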

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained


The paper's central construction defines PAIR-CI via a paired permutation design that shares the imputed conditioning set across include/exclude cross-validated models, then asserts that this forces imputation error to cancel in the loss difference (abstract). A separate variance estimator is introduced and labeled 'provably consistent' under the same structure. No quoted step reduces a claimed prediction or consistency result to a fitted parameter by construction, nor does any load-bearing premise rest on a self-citation whose content is unverified. The unification of cross-validation and multiple imputation is presented as a new formal step whose validity is internal to the paper's proofs and simulations rather than imported or renamed from prior author work. This meets the default expectation of a non-circular methodological contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on standard nonparametric consistency assumptions for CI tests and multiple imputation; no new free parameters, ad-hoc constants, or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Standard nonparametric consistency assumptions for conditional independence tests and multiple imputation under the missingness mechanisms considered.
    Invoked to guarantee that the paired design cancels imputation error and that the variance estimator is consistent.


