pith. machine review for the scientific record.

arXiv: 2604.02488 · v1 · submitted 2026-04-02 · 💻 cs.LG

Recognition: no theorem link

Causal-Audit: A Framework for Risk Assessment of Assumption Violations in Time-Series Causal Discovery

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 20:56 UTC · model grok-4.3

classification 💻 cs.LG
keywords time-series causal discovery · assumption violation · risk assessment · calibrated scores · abstention policy · effect-size diagnostics · PCMCI+

The pith

Causal-Audit turns assumption checks into calibrated risk scores for time-series causal discovery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents Causal-Audit, a framework for assessing the risk that key assumptions are violated when discovering causal structure in time-series data. The framework runs diagnostics on stationarity, sampling regularity, dependence persistence, nonlinearity, and possible confounders, combines them into calibrated risk scores with uncertainty intervals, and uses those scores to recommend specific causal methods or abstain. On a synthetic atlas of 500 data-generating processes, the scores identify problematic cases with high accuracy (AUROC > 0.95), cut false-positive recommendations by 62%, and trigger abstention on 78% of severe-violation cases. The same recommend-or-abstain decisions align with documented specifications in external benchmark collections.
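For intuition, a minimal sketch of what one Stage I effect-size diagnostic could look like. The reviewed text does not publish the exact diagnostic formulas (the referee report below flags this), so the function here is an assumed stand-in that only shares the shape of the idea: a continuous severity score for stationarity violations rather than a binary test verdict.

```python
# Hedged illustration of an "effect-size" style stationarity diagnostic:
# instead of a binary ADF/KPSS verdict, return a continuous severity score.
# This is NOT the paper's formula; it is a stand-in with the same shape.
import numpy as np

def stationarity_effect_size(x: np.ndarray) -> float:
    """Continuous drift severity: ~0 means no drift, larger means stronger violation."""
    half = len(x) // 2
    a, b = x[:half], x[half:]
    pooled_sd = np.sqrt(0.5 * (a.var(ddof=1) + b.var(ddof=1))) or 1.0
    mean_shift = abs(a.mean() - b.mean()) / pooled_sd          # Cohen's-d-like level shift [67]
    hi = max(a.var(ddof=1), b.var(ddof=1))
    lo = max(min(a.var(ddof=1), b.var(ddof=1)), 1e-12)
    return float(mean_shift + np.log(hi / lo))                 # combine level and scale drift

rng = np.random.default_rng(0)
stationary = rng.normal(size=500)
drifting   = rng.normal(size=500) + np.linspace(0, 3, 500)    # mean drift
print(stationarity_effect_size(stationary), stationarity_effect_size(drifting))
```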

Core claim

The central claim is that assumption violations in time-series causal discovery can be formalized as a risk assessment problem where effect-size diagnostics from five families are aggregated into four calibrated risk scores with uncertainty intervals, enabling an abstention-aware policy that only recommends methods such as PCMCI+ when the data supports reliable inference.

What carries the argument

Effect-size diagnostics aggregated into calibrated risk scores with uncertainty intervals that drive an abstention-aware decision policy.
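Figure 4 describes Stage II as logistic aggregation, isotonic calibration, and bootstrap uncertainty quantification, and Figure 5 shows a sigmoid map from a linear predictor z to a risk Rk. A hedged sketch of that aggregation-and-calibration step follows; the weights, intercept, and calibration data are illustrative placeholders, not values from the paper.

```python
# Minimal sketch of Stage II-style risk estimation (illustrative only).
# Assumes: diagnostics d are effect sizes, the weights/intercept come from a
# logistic fit on labelled calibration DGPs, and isotonic regression
# recalibrates the raw sigmoid scores. All names are hypothetical.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def raw_risk(d: np.ndarray, w: np.ndarray, b: float) -> float:
    """Logistic aggregation: map a diagnostic vector d to a raw risk in [0, 1]."""
    z = float(np.dot(w, d) + b)          # linear predictor (cf. Fig. 5)
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid squashing

# Calibration step: fit isotonic regression on held-out DGPs where the
# "violation present / method failed" label is known (e.g. the synthetic atlas).
raw_scores = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # raw risks on calibration data
labels     = np.array([0,   0,   1,   1,   1  ])   # 1 = violation severe enough to fail
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(raw_scores, labels)

# At audit time: diagnostics for a new dataset -> calibrated risk.
d_new = np.array([0.8, 0.1, 0.4, 0.2, 0.0])        # five diagnostic families
w, b  = np.array([2.0, 0.5, 1.0, 0.7, 1.5]), -2.0  # hypothetical fitted parameters
R_raw = raw_risk(d_new, w, b)
R_cal = float(calibrator.predict([R_raw])[0])
print(f"raw risk {R_raw:.2f} -> calibrated risk {R_cal:.2f}")
```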

Load-bearing premise

The 500 synthetic data-generating processes spanning 10 violation families sufficiently represent the assumption violations that appear in real time-series data.

What would settle it

A test on real-world time-series datasets with documented assumption violations where the framework's risk scores fail to predict poor performance of causal methods or incorrectly abstain from reliable ones.

Figures

Figures reproduced from arXiv: 2604.02488 by David R. Ardila, Marco Ruiz, Miguel Arana-Catania, Rodrigo Ventura.

Figure 1: Framework overview. Tier 1 (Stage I alone) provides automatic diagnostics d across five assumption families for expert-guided assumption auditing. Tier 2 (Stages I–III) adds calibrated risk estimation with uncertainty intervals and an abstention-aware decision policy that recommends using or abstaining from a method m∗ according to its risk score R.
Figure 2: Assumption violations in causal discovery. Each row shows data violating an assumption (left) and the resulting causal graph with true and erroneous edges (right; see legend). Individual tests exist for stationarity (ADF [42], KPSS [43]), structural breaks [44], missingness patterns [45], and autocorrelation [46], but these diagnostics are typically applied in isolation, yielding binary decisions rather than …
Figure 3: Time series causal graph G = (V, E) for N = 3 variables with τmax = 2. (a) Timeline representation: arrows between variable timelines encode causal effects; horizontal span equals the lag τ. Each edge repeats at every time step (stationarity). (b) Summary causal graph: each directed edge is annotated with its lag, corresponding to the triple (i, j, τ) ∈ E. A catalog of causal discovery methods M = {m1, …
Figure 4: Detailed flowcharts for each pipeline stage. (a) Stage I: Diagnostic Auditing computes five diagnostic families from input X, producing the diagnostic vector d = [d1, …, d5]. (b) Stage II: Risk Estimation transforms diagnostics into calibrated risk scores via logistic aggregation, isotonic calibration, and bootstrap uncertainty quantification. (c) Stage III: Decision Policy evaluates thresholds to out…
Figure 5: Sigmoid mapping from linear predictor z to risk probability Rk ∈ [0, 1] for VAR-Granger and PCMCI+ methods. Shaded regions illustrate decision zones for nonstationarity risk (Rnonstat) using the hard constraints from …
Figure 6: SHAP feature attribution analysis. Bar lengths indicate mean absolute SHAP values quantifying each diagnostic's contribution to risk predictions across the Synthetic DGP Atlas (396 calibration datasets). Individual risks are aggregated using worst-case aggregation: Rcomposite = max(Rnonstat, Rirreg, Rpersist, Rconfound) (Eq. 8). This conservative choice reflects that a severe violation in any single dimension s…
Figure 7: Heatmap of mean risk scores across the 10 DGP families (columns) and four calibrated risk dimensions (rows). Cell values report family-level averages computed using hybrid labelling: primary dimensions retain generator-assigned labels; off-diagonal dimensions are measured empirically from the data (confounding proxy baseline-calibrated to F1 ≈ 0.20). Within the core sub-block F2–F5, diagonal dominance hol…
Figure 8: Distribution of empirically measured primary risk scores for the core families F2–F5 (n = 50 each), corresponding to the four calibrated risk dimensions. Scores are computed from the generated data using Stage I diagnostic statistics (confounding proxy baseline-calibrated). Dashed lines indicate family means; annotations report observed ranges. All four families exhibit continuous severity gradations, conf…
Figure 9: Reliability diagrams for four risk dimensions on the held-out validation set (100 DGPs). Points clustered along the diagonal indicate well-calibrated predictions. Shaded regions denote 95% confidence intervals. A comparison with baselines across violation severity strata …
Figure 10: Cross-validation performance stability analysis comparing 5-fold, 10-fold, and bootstrap resampling schemes across AUROC, R², and MAE for the four risk dimensions. Error bars denote ±1 standard deviation; dashed lines indicate target thresholds. … used only for evaluation, shows modest degradation (AUROC 0.974 to 0.918), indicating that the learned calibration transfers imperfectly to unseen violation combi…
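Figures 5 and 6 describe decision zones over the risk scores and a worst-case composite risk Rcomposite = max(Rnonstat, Rirreg, Rpersist, Rconfound) (Eq. 8). A minimal sketch of what an abstention-aware policy of that shape could look like; the tolerance value and the rule of comparing the interval's upper bound against it are assumptions, not the paper's specification.

```python
# Illustrative Stage III-style decision policy (threshold values are made up).
# Recommends a method only if the worst-case composite risk, judged at the
# upper end of its uncertainty interval, stays below a tolerance; else abstain.
from dataclasses import dataclass

@dataclass
class RiskScore:
    mean: float       # calibrated risk in [0, 1]
    upper: float      # upper bound of the bootstrap uncertainty interval

def composite(risks: dict[str, RiskScore]) -> RiskScore:
    """Worst-case aggregation: carry the dimension with the highest mean risk (cf. Eq. 8)."""
    worst = max(risks.values(), key=lambda r: r.mean)
    return RiskScore(mean=worst.mean, upper=worst.upper)

def decide(risks: dict[str, RiskScore], tolerance: float = 0.5) -> str:
    """Conservative placeholder rule: recommend only if even the upper bound is tolerable."""
    return "recommend" if composite(risks).upper <= tolerance else "abstain"

# Example: four calibrated risk dimensions for a candidate method (e.g. PCMCI+).
risks = {
    "nonstat":  RiskScore(0.12, 0.20),
    "irreg":    RiskScore(0.35, 0.48),
    "persist":  RiskScore(0.22, 0.31),
    "confound": RiskScore(0.18, 0.27),
}
print(decide(risks))   # -> "recommend" under this (hypothetical) tolerance
```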
read the original abstract

Time-series causal discovery methods rely on assumptions such as stationarity, regular sampling, and bounded temporal dependence. When these assumptions are violated, structure learning can produce confident but misleading causal graphs without warning. We introduce Causal-Audit, a framework that formalizes assumption validation as calibrated risk assessment. The framework computes effect-size diagnostics across five assumption families (stationarity, irregularity, persistence, nonlinearity, and confounding proxies), aggregates them into four calibrated risk scores with uncertainty intervals, and applies an abstention-aware decision policy that recommends methods (e.g., PCMCI+, VAR-based Granger causality) only when evidence supports reliable inference. The semi-automatic diagnostic stage can also be used independently for structured assumption auditing in individual studies. Evaluation on a synthetic atlas of 500 data-generating processes (DGPs) spanning 10 violation families demonstrates well-calibrated risk scores (AUROC > 0.95), a 62% false positive reduction among recommended datasets, and 78% abstention on severe-violation cases. On 21 external evaluations from TimeGraph (18 categories) and CausalTime (3 domains), recommend-or-abstain decisions are consistent with benchmark specifications in all cases. An open-source implementation of our framework is available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Causal-Audit, a framework that formalizes assumption validation for time-series causal discovery methods (e.g., PCMCI+, VAR-based Granger causality) as calibrated risk assessment. It computes effect-size diagnostics across five assumption families (stationarity, irregularity, persistence, nonlinearity, confounding proxies), aggregates them into four risk scores with uncertainty intervals, and applies an abstention-aware policy that recommends methods only when evidence supports reliable inference. The semi-automatic diagnostics can be used independently. Evaluation on a synthetic atlas of 500 DGPs spanning 10 violation families reports AUROC > 0.95 for well-calibrated risk scores, 62% false-positive reduction among recommended datasets, and 78% abstention on severe-violation cases; decisions on 21 external TimeGraph/CausalTime cases are consistent with benchmark specifications. An open-source implementation is provided.

Significance. If the risk scores prove well-calibrated and generalizable, the framework would address a critical gap by providing structured diagnostics that prevent overconfident but misleading causal graphs from violated assumptions. The synthetic atlas evaluation is comprehensive in scale, the abstention policy is practically useful, and the open-source release enables reproducibility and extension. This could improve reliability in applied domains relying on time-series causal inference.

major comments (2)
  1. [§3] §3 (Methods): The manuscript does not provide the explicit formulas for the effect-size diagnostics, the precise aggregation rules that produce the four risk scores, or the derivation of the uncertainty intervals. Without these, it is impossible to verify that the reported AUROC > 0.95 reflects genuine calibration rather than construction within the synthetic atlas.
  2. [§5] §5 (Evaluation): All calibration metrics (AUROC > 0.95, 62% false-positive reduction, 78% abstention) are obtained exclusively on the synthetic atlas of 500 DGPs where the 10 violation families are explicitly parameterized. The external check on 21 TimeGraph/CausalTime cases only verifies consistency with benchmark specifications and does not test whether the risk scores remain calibrated on data whose violation structure lies outside the atlas families. This is load-bearing for the central claim of well-calibrated, generalizable risk assessment.
minor comments (2)
  1. [Abstract, §4] The abstract and §4 refer to 'four calibrated risk scores' without naming them or linking them to the five assumption families; a table or explicit mapping would improve clarity.
  2. [§3.2] Notation for the uncertainty intervals around risk scores is introduced without a dedicated definition or example computation; this should be added for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and outline the revisions we will incorporate.

read point-by-point responses
  1. Referee: [§3] §3 (Methods): The manuscript does not provide the explicit formulas for the effect-size diagnostics, the precise aggregation rules that produce the four risk scores, or the derivation of the uncertainty intervals. Without these, it is impossible to verify that the reported AUROC > 0.95 reflects genuine calibration rather than construction within the synthetic atlas.

    Authors: We agree that the original submission presented the diagnostics at a conceptual level without the full mathematical details. In the revised manuscript we will expand §3 to include: (i) the explicit formulas for each effect-size diagnostic across the five assumption families, (ii) the precise aggregation functions (including weights and normalization) that produce the four risk scores, and (iii) the bootstrap-based derivation of the uncertainty intervals. These additions will allow readers to reproduce and verify the calibration results independently of the synthetic atlas. revision: yes

  2. Referee: [§5] §5 (Evaluation): All calibration metrics (AUROC > 0.95, 62% false-positive reduction, 78% abstention) are obtained exclusively on the synthetic atlas of 500 DGPs where the 10 violation families are explicitly parameterized. The external check on 21 TimeGraph/CausalTime cases only verifies consistency with benchmark specifications and does not test whether the risk scores remain calibrated on data whose violation structure lies outside the atlas families. This is load-bearing for the central claim of well-calibrated, generalizable risk assessment.

    Authors: We acknowledge that the quantitative calibration metrics are derived from the synthetic atlas and that the 21 external cases provide only a consistency check rather than a full out-of-distribution calibration test, as ground-truth violation labels are unavailable for those benchmarks. In the revision we will add explicit discussion in §5 and a dedicated limitations paragraph clarifying the scope of the evaluation, the design rationale for the atlas (covering 10 parameterized families), and the need for future labeled real-world data to assess generalization beyond the atlas. We will qualify the generalizability claims accordingly while retaining the synthetic results as the primary calibration evidence. revision: partial
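As a reading aid for this exchange: a minimal sketch of one plausible bootstrap scheme for the uncertainty intervals, using overlapping blocks to respect temporal dependence. The block length, resample count, and the calibrated_risk stand-in are assumptions; the paper may use a different resampler (e.g. the stationary bootstrap [74]).

```python
# Hedged sketch: block-bootstrap uncertainty interval for a risk score.
# This is not the authors' specification; it only illustrates the general
# shape of "bootstrap uncertainty quantification" over a time series.
import numpy as np

def calibrated_risk(x: np.ndarray) -> float:
    """Placeholder for Stages I+II: diagnostics -> calibrated risk in [0, 1]."""
    # Toy stand-in: lag-1 autocorrelation magnitude as a "persistence" risk.
    x = x - x.mean()
    denom = float(np.dot(x, x)) or 1.0
    return abs(float(np.dot(x[:-1], x[1:])) / denom)

def block_bootstrap_interval(x, block_len=25, n_boot=500, alpha=0.05, seed=0):
    """95% percentile interval for the risk score under overlapping-block resampling."""
    rng = np.random.default_rng(seed)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    scores = []
    for _ in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        resample = np.concatenate([x[s:s + block_len] for s in starts])[:n]
        scores.append(calibrated_risk(resample))
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

x = np.cumsum(np.random.default_rng(1).normal(size=400))  # toy nonstationary series
print(block_bootstrap_interval(x))
```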

Circularity Check

0 steps flagged

No load-bearing circularity; risk scores derived independently and evaluated without reduction by construction

full rationale

The framework computes effect-size diagnostics across assumption families and aggregates them into four calibrated risk scores with uncertainty intervals, followed by an abstention policy. These quantities are evaluated on an author-generated synthetic atlas of 500 DGPs spanning 10 violation families, producing AUROC > 0.95, 62% false-positive reduction, and 78% abstention rates. No equation or step in the provided derivation shows the risk scores or calibration reducing to fitted inputs by construction within the same paper. The consistency check on 21 external TimeGraph/CausalTime cases supplies independent verification against benchmark specifications. This keeps the central claim self-contained with only minor evaluation dependence on constructed data, warranting a low circularity score of 2.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that the chosen five assumption families and four aggregated risk scores capture the main failure modes of existing causal discovery algorithms; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption The five assumption families (stationarity, irregularity, persistence, nonlinearity, confounding proxies) cover the primary violations relevant to time-series causal discovery.
    Framework design treats these families as the basis for all diagnostics.

pith-pipeline@v0.9.0 · 5525 in / 1210 out tokens · 35164 ms · 2026-05-13T20:56:30.240339+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

84 extracted references · 84 canonical work pages

  1. [1]

    Causality: Models, Reasoning, and Inference

    Pearl J. Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge, UK: Cambridge University Press; 2009

  2. [2]

    Causal inference for time series

    Runge J, Gerhardus A, Varando G, Eyring V, Camps-Valls G. Causal inference for time series. Nature Reviews Earth & Environment. 2023;4(7):487-505

  3. [3]

    Assessing the Significance of Directed and Multivariate Measures of Linear Dependence Between Time Series

    Cliff OM, Novelli L, Fulcher BD, Shine JM, Lizier JT. Assessing the Significance of Directed and Multivariate Measures of Linear Dependence Between Time Series. Physical Review Research. 2020;2(1):013006

  4. [4]

    Spurious Regressions in Econometrics

    Granger CWJ, Newbold P. Spurious Regressions in Econometrics. Journal of Econometrics. 1974;2(2):111-20

  5. [5]

    Elements of Causal Inference: Foundations and Learning Algorithms

    Peters J, Janzing D, Schölkopf B. Elements of Causal Inference: Foundations and Learning Algorithms. Cambridge, MA: MIT Press; 2017

  6. [6]

    Detecting and Quantifying Causal Associations in Large Nonlinear Time Series Datasets

    Runge J, Nowack P, Kretschmer M, Flaxman S, Sejdinovic D. Detecting and Quantifying Causal Associations in Large Nonlinear Time Series Datasets. Science Advances. 2019;5(11):eaau4996

  7. [7]

    A DCM for Resting State fMRI

    Friston KJ, Kahan J, Biswal B, Razi A. A DCM for Resting State fMRI. NeuroImage. 2014;94:396-407

  8. [8]

    Granger Causality: A Review and Recent Advances

    Shojaie A, Fox EB. Granger Causality: A Review and Recent Advances. Annual Review of Statistics and Its Application. 2022;9:289-319

  9. [9]

    Vector Autoregressions

    Stock JH, Watson MW. Vector Autoregressions. Journal of Economic Perspectives. 2001;15(4):101-15

  10. [10]

    causal-learn: Causal Discovery in Python

    Squires C, Yun T, Nichani E, Agrawal R, Uhler C. causal-learn: Causal Discovery in Python. Journal of Machine Learning Research. 2023;24(225):1-8. Available from: http://jmlr.org/papers/v24/23-0125.html

  11. [11]

    Statistical Decision Theory and Bayesian Analysis

    Berger JO. Statistical Decision Theory and Bayesian Analysis. 2nd ed. New York: Springer; 1985

  12. [12]

    Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets

    Runge J. Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets. In: Peters J, Sontag D, editors. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI). vol. 124 of Proceedings of Machine Learning Research. PMLR; 2020. p. 1388-97. Available from: https://proceedings.mlr.press/...

  13. [13]

    Investigating causal relations by econometric models and cross-spectral methods

    Granger CWJ. Investigating causal relations by econometric models and cross-spectral methods. Econometrica. 1969;37(3):424-38

  14. [14]

    Review of Causal Discovery Methods Based on Graphical Models

    Glymour C, Zhang K, Spirtes P. Review of Causal Discovery Methods Based on Graphical Models. Frontiers in Genetics. 2019;10

  15. [15]

    A Survey of Learning Causality with Data: Problems and Methods

    Guo R, Cheng L, Li J, Hahn PR, Liu H. A Survey of Learning Causality with Data: Problems and Methods. ACM Comput Surv. 2020 Jul;53(4). Available from: https://doi.org/10.1145/3397269

  16. [16]

    Causal Discovery from Temporal Data: An Overview and New Perspectives

    Gong C, Zhang C, Yao D, Bi J, Li W, Xu Y. Causal Discovery from Temporal Data: An Overview and New Perspectives. ACM Comput Surv. 2024 Dec;57(4)

  17. [17]

    Causation, Prediction, and Search

    Spirtes P, Glymour CN, Scheines R. Causation, Prediction, and Search. 2nd ed. Cambridge, MA: MIT Press; 2001

  18. [18]

    D’ya Like DAGs? A Survey on Structure Learning and Causal Discovery

    Vowels MJ, Camgoz NC, Bowden R. D’ya Like DAGs? A Survey on Structure Learning and Causal Discovery. ACM Comput Surv. 2022 Nov;55(4)

  19. [19]

    High-recall causal discovery for autocorrelated time series with latent confounders

    Gerhardus A, Runge J. High-recall causal discovery for autocorrelated time series with latent confounders. In: Advances in Neural Information Processing Systems. vol. 33. Curran Associates, Inc.; 2020. p. 12615-25

  20. [20]

    DYNOTEARS: Structure Learning from Time-Series Data

    Pamfil R, Sriwattanaworachai N, Desai S, Pilgerstorfer P, Beaumont P, Georgatzis K, et al. DYNOTEARS: Structure Learning from Time-Series Data. In: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics. vol. 108 of Proceedings of Machine Learning Research. PMLR; 2020. p. 1595-605. Available from: https://proceedings.m...

  21. [21]

    DAGs with NO TEARS: Continuous optimization for structure learning

    Zheng X, Aragam B, Ravikumar P, Xing EP. DAGs with NO TEARS: Continuous optimization for structure learning. In: Advances in Neural Information Processing Systems. vol. 31. Curran Associates, Inc.; 2018. p. 9472-83

  22. [22]

    Causal modelling combining instantaneous and lagged effects: an identifiable model based on non-Gaussianity

    Hyvärinen A, Shimizu S, Hoyer PO. Causal modelling combining instantaneous and lagged effects: an identifiable model based on non-Gaussianity. In: Proceedings of the 25th International Conference on Machine Learning. ICML ’08. Association for Computing Machinery; 2008. p. 424-31

  23. [23]

    Causal Discovery with Attention-Based Convolutional Neural Networks

    Nauta M, Bucur D, Seifert C. Causal Discovery with Attention-Based Convolutional Neural Networks. Machine Learning and Knowledge Extraction. 2019;1(1):312-40. Available from: https://www.mdpi.com/2504-4990/1/1/19

  24. [24]

    Measuring information transfer

    Schreiber T. Measuring information transfer. Physical Review Letters. 2000;85(2):461-4

  25. [25]

    Partial transfer entropy on rank vectors

    Kugiumtzis D. Partial transfer entropy on rank vectors. The European Physical Journal Special Topics. 2013;222(2):401-20

  26. [26]

    Discovering Temporal Causal Relations from Subsampled Data

    Gong M, Zhang K, Schölkopf B, Tao D, Geiger P. Discovering Temporal Causal Relations from Subsampled Data. In: Proceedings of the 32nd International Conference on Machine Learning (ICML). vol. 37 of Proceedings of Machine Learning Research. PMLR; 2015. p. 1898-906

  27. [27]

    Causal discovery from heterogeneous/nonstationary data

    Huang B, Zhang K, Zhang J, Ramsey J, Sanchez-Romero R, Glymour C, et al. Causal discovery from heterogeneous/nonstationary data. Journal of Machine Learning Research. 2020 Jan;21(1)

  28. [28]

    Causal Discovery from Conditionally Stationary Time Series

    Balsells-Rodas C, Sumba X, Narendra T, Tu R, Schweikert G, Kjellström H, et al. Causal Discovery from Conditionally Stationary Time Series. In: Forty-second International Conference on Machine Learning. vol. 267 of Proceedings of Machine Learning Research. PMLR; 2025. p. 2715-41. Available from: https://openreview.net/forum?id=j88QAtutwW

  29. [29]

    The Robustness of Differentiable Causal Discovery in Misspecified Scenarios

    Yi H, He Y, Chen D, Kang M, Wang H, Yu W. The Robustness of Differentiable Causal Discovery in Misspecified Scenarios. In: The Thirteenth International Conference on Learning Representations; 2025. p. 1-24. Available from: https://openreview.net/forum?id=iaP7yHRq1l

  30. [30]

    Scalable Causal Discovery with Score Matching

    Montagna F, Noceti N, Rosasco L, Zhang K, Locatello F. Scalable Causal Discovery with Score Matching. In: Advances in Neural Information Processing Systems. vol. 36; 2023. p. 12640-54

  31. [31]

    Assumption violations in causal discovery and the robustness of score matching

    Montagna F, Mastakouri A, Eulig E, Noceti N, Rosasco L, Janzing D, et al. Assumption violations in causal discovery and the robustness of score matching. In: Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, editors. Advances in Neural Information Processing Systems. vol. 36. Curran Associates, Inc.; 2023. p. 47339-78

  32. [32]

    TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations

    Stein G, Penzel N, Piater T, Denzler J. TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations. In: The Fourteenth International Conference on Learning Representations; 2026. p. 1-31. Available from: https://openreview.net/forum?id=MtdrOCLAGY

  33. [33]

    Understanding Spurious Regressions in Econometrics

    Phillips PCB. Understanding Spurious Regressions in Econometrics. Journal of Econometrics. 1986;33(3):311-40

  34. [34]

    Comparison of correlation analysis techniques for irregularly sampled time series

    Rehfeld K, Marwan N, Heitzig J, Kurths J. Comparison of correlation analysis techniques for irregularly sampled time series. Nonlinear Processes in Geophysics. 2011;18(3):389-404

  35. [35]

    Statistical Analysis with Missing Data

    Little RJA, Rubin DB. Statistical Analysis with Missing Data. 3rd ed. John Wiley & Sons; 2019

  36. [36]

    Causal Inference: A Missing Data Perspective

    Ding P, Li F. Causal Inference: A Missing Data Perspective. Statistical Science. 2018;33(2):214-37

  37. [37]

    The “effective” number of independent observations in an autocorrelated time series

    Bayley GV, Hammersley JM. The “effective” number of independent observations in an autocorrelated time series. Supplement to the Journal of the Royal Statistical Society. 1946;8(2):184-97

  38. [38]

    The interpretation and estimation of effective sample size

    Thiébaux HJ, Zwiers FW. The interpretation and estimation of effective sample size. Journal of Climate and Applied Meteorology. 1984;23(5):800-11

  39. [39]

    The hardness of conditional independence testing and the generalised covariance measure

    Shah RD, Peters J. The hardness of conditional independence testing and the generalised covariance measure. The Annals of Statistics. 2020;48(3):1514-1538

  40. [40]

    Ridge Regression: Biased Estimation for Nonorthogonal Problems

    Hoerl AE, Kennard RW. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics. 1970;12(1):55-67

  41. [41]

    A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity

    White H. A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica. 1980;48(4):817-38

  42. [42]

    Distribution of the Estimators for Autoregressive Time Series with a Unit Root

    Dickey DA, Fuller WA. Distribution of the Estimators for Autoregressive Time Series with a Unit Root. Journal of the American Statistical Association. 1979;74(366):427-31

  43. [43]

    Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root

    Kwiatkowski D, Phillips PCB, Schmidt P, Shin Y. Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root. Journal of Econometrics. 1992;54(1–3):159-78

  44. [44]

    Computation and Analysis of Multiple Structural Change Models

    Bai J, Perron P. Computation and Analysis of Multiple Structural Change Models. Journal of Applied Econometrics. 2003;18(1):1-22

  45. [45]

    A Test of Missing Completely at Random for Multivariate Data with Missing Values

    Little RJA. A Test of Missing Completely at Random for Multivariate Data with Missing Values. Journal of the American Statistical Association. 1988;83(404):1198-202

  46. [46]

    On a Measure of Lack of Fit in Time Series Models

    Ljung GM, Box GEP. On a Measure of Lack of Fit in Time Series Models. Biometrika. 1978;65(2):297-303

  47. [47]

    Learning Bayesian networks: The combination of knowledge and statistical data

    Heckerman D, Geiger D, Chickering DM. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning. 1995;20(3):197-243

  48. [48]

    Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks

    Friedman N, Koller D. Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks. Machine Learning. 2003;50(1):95-125

  49. [49]

    A Million Variables and More: The Fast Greedy Equivalence Search Algorithm for Learning High-Dimensional Graphical Causal Models

    Ramsey JD, Glymour M, Sanchez-Romero R, Glymour C. A Million Variables and More: The Fast Greedy Equivalence Search Algorithm for Learning High-Dimensional Graphical Causal Models. International Journal of Data Science and Analytics. 2018;3(2):121-9

  50. [50]

    Bootstrap Aggregation and Confidence Measures to Improve Time Series Causal Discovery

    Debeire K, Gerhardus A, Runge J, Eyring V. Bootstrap Aggregation and Confidence Measures to Improve Time Series Causal Discovery. In: Proceedings of the Third Conference on Causal Learning and Reasoning. vol. 236 of Proceedings of Machine Learning Research. PMLR; 2024. p. 979-1007. Available from: https://proceedings.mlr.press/v236/debeire24a.html

  51. [51]

    Transforming Classifier Scores into Accurate Multiclass Probability Estimates

    Zadrozny B, Elkan C. Transforming Classifier Scores into Accurate Multiclass Probability Estimates. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2002. p. 694-9

  52. [52]

    Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression

    Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD. Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression. John Wiley & Sons; 1972

  53. [53]

    Classifier Calibration with ROC-Regularized Isotonic Regression

    Berta E, Bach F, Jordan MI. Classifier Calibration with ROC-Regularized Isotonic Regression. In: Proceedings of the 27th International Conference on Artificial Intelligence and Statistics. vol. 238 of Proceedings of Machine Learning Research. PMLR; 2024. p. 1972-80. Available from: https://proceedings.mlr.press/v238/berta24a.html

  54. [54]

    The Well-Calibrated Bayesian

    Dawid AP. The Well-Calibrated Bayesian. Journal of the American Statistical Association. 1982;77(379):605-10

  55. [55]

    The Comparison and Evaluation of Forecasters

    DeGroot MH, Fienberg SE. The Comparison and Evaluation of Forecasters. The Statistician. 1983;32(1–2):12-22

  56. [56]

    On Optimum Recognition Error and Reject Tradeoff

    Chow CK. On Optimum Recognition Error and Reject Tradeoff. IEEE Transactions on Information Theory. 1970;16(1):41-6

  57. [57]

    On the Foundations of Noise-Free Selective Classification

    El-Yaniv R, Wiener Y. On the Foundations of Noise-Free Selective Classification. Journal of Machine Learning Research. 2010;11:1605-41

  58. [58]

    Time Series Analysis

    Hamilton JD. Time Series Analysis. Princeton, NJ: Princeton University Press; 1994

  59. [59]

    Theory of Games and Economic Behavior

    Von Neumann J, Morgenstern O. Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press; 1944

  60. [60]

    The foundations of statistics

    Savage LJ. The foundations of statistics. New York: John Wiley & Sons; 1954

  61. [61]

    Decision Theory: Principles and Approaches

    Parmigiani G, Inoue LYT. Decision Theory: Principles and Approaches. Chichester, UK: John Wiley & Sons; 2009

  62. [62]

    Selective classification for deep neural networks

    Geifman Y, El-Yaniv R. Selective classification for deep neural networks. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17. Red Hook, NY, USA: Curran Associates Inc.; 2017. p. 4885–4894

  63. [63]

    On Calibration of Modern Neural Networks

    Guo C, Pleiss G, Sun Y, Weinberger KQ. On Calibration of Modern Neural Networks. In: Proceedings of the 34th International Conference on Machine Learning (ICML). vol. 70. PMLR; 2017. p. 1321-30

  64. [64]

    Predicting Good Probabilities with Supervised Learning

    Niculescu-Mizil A, Caruana R. Predicting Good Probabilities with Supervised Learning. In: Proceedings of the 22nd International Conference on Machine Learning (ICML); 2005. p. 625-32

  65. [65]

    The Control of the False Discovery Rate in Multiple Testing Under Dependency

    Benjamini Y, Yekutieli D. The Control of the False Discovery Rate in Multiple Testing Under Dependency. Annals of Statistics. 2001;29(4):1165-88

  66. [66]

    The Pivot Algorithm: A Highly Efficient Monte Carlo Method for the Self-Avoiding Walk

    Madras N, Sokal AD. The Pivot Algorithm: A Highly Efficient Monte Carlo Method for the Self-Avoiding Walk. Journal of Statistical Physics. 1988;50(1):109-86

  67. [67]

    Statistical Power Analysis for the Behavioral Sciences

    Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Lawrence Erlbaum Associates; 1988

  68. [68]

    Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation

    Marquardt DW. Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation. Technometrics. 1970;12(3):591-612

  69. [69]

    Tests of Equality Between Sets of Coefficients in Two Linear Regressions

    Chow GC. Tests of Equality Between Sets of Coefficients in Two Linear Regressions. Econometrica. 1960;28(3):591-605

  70. [70]

    A caution regarding rules of thumb for variance inflation factors

    O’Brien RM. A caution regarding rules of thumb for variance inflation factors. Quality & Quantity. 2007;41(5):673-90

  71. [71]

    A Redefined Variance Inflation Factor: Overcoming the Limitations of the Variance Inflation Factor

    Salmerón R, García C, García J. A Redefined Variance Inflation Factor: Overcoming the Limitations of the Variance Inflation Factor. Computational Economics. 2025;65(1):337-63

  72. [72]

    Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods

    Platt J. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. In: Advances in Large Margin Classifiers. MIT Press; 1999. p. 61-74

  73. [73]

    An Introduction to the Bootstrap

    Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman & Hall/CRC; 1993

  74. [74]

    The Stationary Bootstrap

    Politis DN, Romano JP. The Stationary Bootstrap. Journal of the American Statistical Association. 1994;89(428):1303-13

  75. [75]

    A Unified Approach to Interpreting Model Predictions

    Lundberg SM, Lee SI. A Unified Approach to Interpreting Model Predictions. In: Advances in Neural Information Processing Systems. vol. 30; 2017. p. 4765-74

  76. [76]

    New Introduction to Multiple Time Series Analysis

    Lütkepohl H. New Introduction to Multiple Time Series Analysis. Springer; 2005

  77. [77]

    Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data

    Lawrence AR, Kaiser M, Sampaio R, Sipos M. Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data. In: Causal Discovery & Causality-Inspired Machine Learning Workshop at Neural Information Processing Systems; 2021. p. 1-26

  78. [78]

    TimeGraph: Synthetic Benchmark Datasets for Robust Time-Series Causal Discovery

    Ferdous MH, Hossain E, Gani MO. TimeGraph: Synthetic Benchmark Datasets for Robust Time-Series Causal Discovery. In: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’25). Toronto, ON, Canada: ACM; 2025. p. 1-11

  79. [79]

    CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery

    Cheng Y, Wang Z, Xiao T, Zhong Q, Suo J, He K. CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery. In: Proceedings of the Twelfth International Conference on Learning Representations; 2024. p. 1-22. Available from: https://openreview.net/forum?id=iad1yyyGme

  80. [80]

    Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

    Hendrycks D, Dietterich T. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. In: Proceedings of the International Conference on Learning Representations (ICLR); 2019. p. 1-15

Showing first 80 references.