pith. machine review for the scientific record.

arXiv: 2604.02488 · v1 · submitted 2026-04-02 · 💻 cs.LG

Recognition: no theorem link

Causal-Audit: A Framework for Risk Assessment of Assumption Violations in Time-Series Causal Discovery

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 20:56 UTC · model grok-4.3

classification 💻 cs.LG
keywords time-series causal discovery · assumption violation · risk assessment · calibrated scores · abstention policy · effect-size diagnostics · PCMCI+

The pith

Causal-Audit turns assumption checks into calibrated risk scores for time-series causal discovery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents Causal-Audit, a framework for assessing the risk that key assumptions are violated when discovering causal structure in time-series data. The framework runs diagnostics on stationarity, sampling regularity, dependence persistence, nonlinearity, and possible confounders, combines them into calibrated risk scores with uncertainty intervals, and uses those scores to recommend specific causal methods or abstain. On a synthetic atlas of 500 data-generating processes, the scores identify problematic cases with high accuracy (AUROC > 0.95), cut false-positive recommendations by 62%, and trigger abstention on 78% of severe-violation cases. The same recommend-or-abstain decisions align with documented specifications in external benchmark collections.
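For intuition, a minimal sketch of what one Stage I effect-size diagnostic could look like. The reviewed text does not publish the exact diagnostic formulas (the referee report below flags this), so the function here is an assumed stand-in that only shares the shape of the idea: a continuous severity score for stationarity violations rather than a binary test verdict.

```python
# Hedged illustration of an "effect-size" style stationarity diagnostic:
# instead of a binary ADF/KPSS verdict, return a continuous severity score.
# This is NOT the paper's formula; it is a stand-in with the same shape.
import numpy as np

def stationarity_effect_size(x: np.ndarray) -> float:
    """Continuous drift severity: ~0 means no drift, larger means stronger violation."""
    half = len(x) // 2
    a, b = x[:half], x[half:]
    pooled_sd = np.sqrt(0.5 * (a.var(ddof=1) + b.var(ddof=1))) or 1.0
    mean_shift = abs(a.mean() - b.mean()) / pooled_sd          # Cohen's-d-like level shift [67]
    hi = max(a.var(ddof=1), b.var(ddof=1))
    lo = max(min(a.var(ddof=1), b.var(ddof=1)), 1e-12)
    return float(mean_shift + np.log(hi / lo))                 # combine level and scale drift

rng = np.random.default_rng(0)
stationary = rng.normal(size=500)
drifting   = rng.normal(size=500) + np.linspace(0, 3, 500)    # mean drift
print(stationarity_effect_size(stationary), stationarity_effect_size(drifting))
```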

Core claim

The central claim is that assumption violations in time-series causal discovery can be formalized as a risk assessment problem where effect-size diagnostics from five families are aggregated into four calibrated risk scores with uncertainty intervals, enabling an abstention-aware policy that only recommends methods such as PCMCI+ when the data supports reliable inference.

What carries the argument

Effect-size diagnostics aggregated into calibrated risk scores with uncertainty intervals that drive an abstention-aware decision policy.
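Figure 4 describes Stage II as logistic aggregation, isotonic calibration, and bootstrap uncertainty quantification, and Figure 5 shows a sigmoid map from a linear predictor z to a risk Rk. A hedged sketch of that aggregation-and-calibration step follows; the weights, intercept, and calibration data are illustrative placeholders, not values from the paper.

```python
# Minimal sketch of Stage II-style risk estimation (illustrative only).
# Assumes: diagnostics d are effect sizes, the weights/intercept come from a
# logistic fit on labelled calibration DGPs, and isotonic regression
# recalibrates the raw sigmoid scores. All names are hypothetical.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def raw_risk(d: np.ndarray, w: np.ndarray, b: float) -> float:
    """Logistic aggregation: map a diagnostic vector d to a raw risk in [0, 1]."""
    z = float(np.dot(w, d) + b)          # linear predictor (cf. Fig. 5)
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid squashing

# Calibration step: fit isotonic regression on held-out DGPs where the
# "violation present / method failed" label is known (e.g. the synthetic atlas).
raw_scores = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # raw risks on calibration data
labels     = np.array([0,   0,   1,   1,   1  ])   # 1 = violation severe enough to fail
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(raw_scores, labels)

# At audit time: diagnostics for a new dataset -> calibrated risk.
d_new = np.array([0.8, 0.1, 0.4, 0.2, 0.0])        # five diagnostic families
w, b  = np.array([2.0, 0.5, 1.0, 0.7, 1.5]), -2.0  # hypothetical fitted parameters
R_raw = raw_risk(d_new, w, b)
R_cal = float(calibrator.predict([R_raw])[0])
print(f"raw risk {R_raw:.2f} -> calibrated risk {R_cal:.2f}")
```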

Load-bearing premise

The 500 synthetic data-generating processes spanning 10 violation families sufficiently represent the assumption violations that appear in real time-series data.

What would settle it

A test on real-world time-series datasets with documented assumption violations where the framework's risk scores fail to predict poor performance of causal methods or incorrectly abstain from reliable ones.

Figures

Figures reproduced from arXiv: 2604.02488 by David R. Ardila, Marco Ruiz, Miguel Arana-Catania, Rodrigo Ventura.

Figure 1: Framework overview. Tier 1 (Stage I alone) provides automatic diagnostics d across five assumption families for expert-guided assumption auditing. Tier 2 (Stages I–III) adds calibrated risk estimation with uncertainty intervals and an abstention-aware decision policy that recommends using or abstaining from a method m∗ according to its risk score R.
Figure 2: Assumption violations in causal discovery. Each row shows data violating an assumption (left) and the resulting causal graph with true and erroneous edges (right; see legend). Individual tests exist for stationarity (ADF [42], KPSS [43]), structural breaks [44], missingness patterns [45], and autocorrelation [46], but these diagnostics are typically applied in isolation, yielding binary decisions rather than …
Figure 3: Time series causal graph G = (V, E) for N = 3 variables with τmax = 2. (a) Timeline representation: arrows between variable timelines encode causal effects; horizontal span equals the lag τ. Each edge repeats at every time step (stationarity). (b) Summary causal graph: each directed edge is annotated with its lag, corresponding to the triple (i, j, τ) ∈ E. A catalog of causal discovery methods M = {m1, …
Figure 4: Detailed flowcharts for each pipeline stage. (a) Stage I: Diagnostic Auditing computes five diagnostic families from input X, producing the diagnostic vector d = [d1, …, d5]. (b) Stage II: Risk Estimation transforms diagnostics into calibrated risk scores via logistic aggregation, isotonic calibration, and bootstrap uncertainty quantification. (c) Stage III: Decision Policy evaluates thresholds to out…
Figure 5: Sigmoid mapping from linear predictor z to risk probability Rk ∈ [0, 1] for VAR-Granger and PCMCI+ methods. Shaded regions illustrate decision zones for nonstationarity risk (Rnonstat) using the hard constraints from …
Figure 6: SHAP feature attribution analysis. Bar lengths indicate mean absolute SHAP values quantifying each diagnostic's contribution to risk predictions across the Synthetic DGP Atlas (396 calibration datasets). Individual risks are aggregated using worst-case aggregation: Rcomposite = max(Rnonstat, Rirreg, Rpersist, Rconfound) (Eq. 8). This conservative choice reflects that a severe violation in any single dimension s…
Figure 7: Heatmap of mean risk scores across the 10 DGP families (columns) and four calibrated risk dimensions (rows). Cell values report family-level averages computed using hybrid labelling: primary dimensions retain generator-assigned labels; off-diagonal dimensions are measured empirically from the data (confounding proxy baseline-calibrated to F1 ≈ 0.20). Within the core sub-block F2–F5, diagonal dominance hol…
Figure 8: Distribution of empirically measured primary risk scores for the core families F2–F5 (n = 50 each), corresponding to the four calibrated risk dimensions. Scores are computed from the generated data using Stage I diagnostic statistics (confounding proxy baseline-calibrated). Dashed lines indicate family means; annotations report observed ranges. All four families exhibit continuous severity gradations, conf…
Figure 9: Reliability diagrams for four risk dimensions on the held-out validation set (100 DGPs). Points clustered along the diagonal indicate well-calibrated predictions. Shaded regions denote 95% confidence intervals. A comparison with baselines across violation severity strata …
Figure 10: Cross-validation performance stability analysis comparing 5-fold, 10-fold, and bootstrap resampling schemes across AUROC, R², and MAE for the four risk dimensions. Error bars denote ±1 standard deviation; dashed lines indicate target thresholds. … used only for evaluation, shows modest degradation (AUROC 0.974 to 0.918), indicating that the learned calibration transfers imperfectly to unseen violation combi…
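Figures 5 and 6 describe decision zones over the risk scores and a worst-case composite risk Rcomposite = max(Rnonstat, Rirreg, Rpersist, Rconfound) (Eq. 8). A minimal sketch of what an abstention-aware policy of that shape could look like; the tolerance value and the rule of comparing the interval's upper bound against it are assumptions, not the paper's specification.

```python
# Illustrative Stage III-style decision policy (threshold values are made up).
# Recommends a method only if the worst-case composite risk, judged at the
# upper end of its uncertainty interval, stays below a tolerance; else abstain.
from dataclasses import dataclass

@dataclass
class RiskScore:
    mean: float       # calibrated risk in [0, 1]
    upper: float      # upper bound of the bootstrap uncertainty interval

def composite(risks: dict[str, RiskScore]) -> RiskScore:
    """Worst-case aggregation: carry the dimension with the highest mean risk (cf. Eq. 8)."""
    worst = max(risks.values(), key=lambda r: r.mean)
    return RiskScore(mean=worst.mean, upper=worst.upper)

def decide(risks: dict[str, RiskScore], tolerance: float = 0.5) -> str:
    """Conservative placeholder rule: recommend only if even the upper bound is tolerable."""
    return "recommend" if composite(risks).upper <= tolerance else "abstain"

# Example: four calibrated risk dimensions for a candidate method (e.g. PCMCI+).
risks = {
    "nonstat":  RiskScore(0.12, 0.20),
    "irreg":    RiskScore(0.35, 0.48),
    "persist":  RiskScore(0.22, 0.31),
    "confound": RiskScore(0.18, 0.27),
}
print(decide(risks))   # -> "recommend" under this (hypothetical) tolerance
```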
read the original abstract

Time-series causal discovery methods rely on assumptions such as stationarity, regular sampling, and bounded temporal dependence. When these assumptions are violated, structure learning can produce confident but misleading causal graphs without warning. We introduce Causal-Audit, a framework that formalizes assumption validation as calibrated risk assessment. The framework computes effect-size diagnostics across five assumption families (stationarity, irregularity, persistence, nonlinearity, and confounding proxies), aggregates them into four calibrated risk scores with uncertainty intervals, and applies an abstention-aware decision policy that recommends methods (e.g., PCMCI+, VAR-based Granger causality) only when evidence supports reliable inference. The semi-automatic diagnostic stage can also be used independently for structured assumption auditing in individual studies. Evaluation on a synthetic atlas of 500 data-generating processes (DGPs) spanning 10 violation families demonstrates well-calibrated risk scores (AUROC > 0.95), a 62% false positive reduction among recommended datasets, and 78% abstention on severe-violation cases. On 21 external evaluations from TimeGraph (18 categories) and CausalTime (3 domains), recommend-or-abstain decisions are consistent with benchmark specifications in all cases. An open-source implementation of our framework is available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Causal-Audit, a framework that formalizes assumption validation for time-series causal discovery methods (e.g., PCMCI+, VAR-based Granger causality) as calibrated risk assessment. It computes effect-size diagnostics across five assumption families (stationarity, irregularity, persistence, nonlinearity, confounding proxies), aggregates them into four risk scores with uncertainty intervals, and applies an abstention-aware policy that recommends methods only when evidence supports reliable inference. The semi-automatic diagnostics can be used independently. Evaluation on a synthetic atlas of 500 DGPs spanning 10 violation families reports AUROC > 0.95 for well-calibrated risk scores, 62% false-positive reduction among recommended datasets, and 78% abstention on severe-violation cases; decisions on 21 external TimeGraph/CausalTime cases are consistent with benchmark specifications. An open-source implementation is provided.

Significance. If the risk scores prove well-calibrated and generalizable, the framework would address a critical gap by providing structured diagnostics that prevent overconfident but misleading causal graphs from violated assumptions. The synthetic atlas evaluation is comprehensive in scale, the abstention policy is practically useful, and the open-source release enables reproducibility and extension. This could improve reliability in applied domains relying on time-series causal inference.

major comments (2)
  1. [§3] §3 (Methods): The manuscript does not provide the explicit formulas for the effect-size diagnostics, the precise aggregation rules that produce the four risk scores, or the derivation of the uncertainty intervals. Without these, it is impossible to verify that the reported AUROC > 0.95 reflects genuine calibration rather than construction within the synthetic atlas.
  2. [§5] §5 (Evaluation): All calibration metrics (AUROC > 0.95, 62% false-positive reduction, 78% abstention) are obtained exclusively on the synthetic atlas of 500 DGPs where the 10 violation families are explicitly parameterized. The external check on 21 TimeGraph/CausalTime cases only verifies consistency with benchmark specifications and does not test whether the risk scores remain calibrated on data whose violation structure lies outside the atlas families. This is load-bearing for the central claim of well-calibrated, generalizable risk assessment.
minor comments (2)
  1. [Abstract, §4] The abstract and §4 refer to 'four calibrated risk scores' without naming them or linking them to the five assumption families; a table or explicit mapping would improve clarity.
  2. [§3.2] Notation for the uncertainty intervals around risk scores is introduced without a dedicated definition or example computation; this should be added for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and outline the revisions we will incorporate.

read point-by-point responses
  1. Referee: [§3] §3 (Methods): The manuscript does not provide the explicit formulas for the effect-size diagnostics, the precise aggregation rules that produce the four risk scores, or the derivation of the uncertainty intervals. Without these, it is impossible to verify that the reported AUROC > 0.95 reflects genuine calibration rather than construction within the synthetic atlas.

    Authors: We agree that the original submission presented the diagnostics at a conceptual level without the full mathematical details. In the revised manuscript we will expand §3 to include: (i) the explicit formulas for each effect-size diagnostic across the five assumption families, (ii) the precise aggregation functions (including weights and normalization) that produce the four risk scores, and (iii) the bootstrap-based derivation of the uncertainty intervals. These additions will allow readers to reproduce and verify the calibration results independently of the synthetic atlas. revision: yes

  2. Referee: [§5] §5 (Evaluation): All calibration metrics (AUROC > 0.95, 62% false-positive reduction, 78% abstention) are obtained exclusively on the synthetic atlas of 500 DGPs where the 10 violation families are explicitly parameterized. The external check on 21 TimeGraph/CausalTime cases only verifies consistency with benchmark specifications and does not test whether the risk scores remain calibrated on data whose violation structure lies outside the atlas families. This is load-bearing for the central claim of well-calibrated, generalizable risk assessment.

    Authors: We acknowledge that the quantitative calibration metrics are derived from the synthetic atlas and that the 21 external cases provide only a consistency check rather than a full out-of-distribution calibration test, as ground-truth violation labels are unavailable for those benchmarks. In the revision we will add explicit discussion in §5 and a dedicated limitations paragraph clarifying the scope of the evaluation, the design rationale for the atlas (covering 10 parameterized families), and the need for future labeled real-world data to assess generalization beyond the atlas. We will qualify the generalizability claims accordingly while retaining the synthetic results as the primary calibration evidence. revision: partial
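As a reading aid for this exchange: a minimal sketch of one plausible bootstrap scheme for the uncertainty intervals, using overlapping blocks to respect temporal dependence. The block length, resample count, and the calibrated_risk stand-in are assumptions; the paper may use a different resampler (e.g. the stationary bootstrap [74]).

```python
# Hedged sketch: block-bootstrap uncertainty interval for a risk score.
# This is not the authors' specification; it only illustrates the general
# shape of "bootstrap uncertainty quantification" over a time series.
import numpy as np

def calibrated_risk(x: np.ndarray) -> float:
    """Placeholder for Stages I+II: diagnostics -> calibrated risk in [0, 1]."""
    # Toy stand-in: lag-1 autocorrelation magnitude as a "persistence" risk.
    x = x - x.mean()
    denom = float(np.dot(x, x)) or 1.0
    return abs(float(np.dot(x[:-1], x[1:])) / denom)

def block_bootstrap_interval(x, block_len=25, n_boot=500, alpha=0.05, seed=0):
    """95% percentile interval for the risk score under overlapping-block resampling."""
    rng = np.random.default_rng(seed)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    scores = []
    for _ in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        resample = np.concatenate([x[s:s + block_len] for s in starts])[:n]
        scores.append(calibrated_risk(resample))
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

x = np.cumsum(np.random.default_rng(1).normal(size=400))  # toy nonstationary series
print(block_bootstrap_interval(x))
```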

Circularity Check

0 steps flagged

No load-bearing circularity; risk scores derived independently and evaluated without reduction by construction

full rationale

The framework computes effect-size diagnostics across assumption families and aggregates them into four calibrated risk scores with uncertainty intervals, followed by an abstention policy. These quantities are evaluated on an author-generated synthetic atlas of 500 DGPs spanning 10 violation families, producing AUROC > 0.95, 62% false-positive reduction, and 78% abstention rates. No equation or step in the provided derivation shows the risk scores or calibration reducing to fitted inputs by construction within the same paper. The consistency check on 21 external TimeGraph/CausalTime cases supplies independent verification against benchmark specifications. This keeps the central claim self-contained with only minor evaluation dependence on constructed data, warranting a low circularity score of 2.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that the chosen five assumption families and four aggregated risk scores capture the main failure modes of existing causal discovery algorithms; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption The five assumption families (stationarity, irregularity, persistence, nonlinearity, confounding proxies) cover the primary violations relevant to time-series causal discovery.
    Framework design treats these families as the basis for all diagnostics.

pith-pipeline@v0.9.0 · 5525 in / 1210 out tokens · 35164 ms · 2026-05-13T20:56:30.240339+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

84 extracted references · 84 canonical work pages

  1. [1]

    Causality: Models, Reasoning, and Inference

    Pearl J. Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge, UK: Cambridge University Press; 2009

  2. [2]

    Causal inference for time series

    Runge J, Gerhardus A, Varando G, Eyring V, Camps-Valls G. Causal inference for time series. Nature Reviews Earth & Environment. 2023;4(7):487-505

  3. [3]

    Assessing the Significance of Directed and Multivariate Measures of Linear Dependence Between Time Series

    Cliff OM, Novelli L, Fulcher BD, Shine JM, Lizier JT. Assessing the Significance of Directed and Multivariate Measures of Linear Dependence Between Time Series. Physical Review Research. 2020;2(1):013006

  4. [4]

    Spurious Regressions in Econometrics

    Granger CWJ, Newbold P. Spurious Regressions in Econometrics. Journal of Econometrics. 1974;2(2):111-20

  5. [5]

    Elements of Causal Inference: Foundations and Learning Algorithms

    Peters J, Janzing D, Schölkopf B. Elements of Causal Inference: Foundations and Learning Algorithms. Cambridge, MA: MIT Press; 2017

  6. [6]

    Detecting and Quantifying Causal Associations in Large Nonlinear Time Series Datasets

    Runge J, Nowack P, Kretschmer M, Flaxman S, Sejdinovic D. Detecting and Quantifying Causal Associations in Large Nonlinear Time Series Datasets. Science Advances. 2019;5(11):eaau4996

  7. [7]

    A DCM for Resting State fMRI

    Friston KJ, Kahan J, Biswal B, Razi A. A DCM for Resting State fMRI. NeuroImage. 2014;94:396-407

  8. [8]

    Granger Causality: A Review and Recent Advances

    Shojaie A, Fox EB. Granger Causality: A Review and Recent Advances. Annual Review of Statistics and Its Application. 2022;9:289-319

  9. [9]

    Vector Autoregressions

    Stock JH, Watson MW. Vector Autoregressions. Journal of Economic Perspectives. 2001;15(4):101-15

  10. [10]

    causal-learn: Causal Discovery in Python

    Squires C, Yun T, Nichani E, Agrawal R, Uhler C. causal-learn: Causal Discovery in Python. Journal of Machine Learning Research. 2023;24(225):1-8. Available from: http://jmlr.org/papers/v24/23-0125.html

  11. [11]

    Statistical Decision Theory and Bayesian Analysis

    Berger JO. Statistical Decision Theory and Bayesian Analysis. 2nd ed. New York: Springer; 1985

  12. [12]

    Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets

    Runge J. Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets. In: Peters J, Sontag D, editors. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI). vol. 124 of Proceedings of Machine Learning Research. PMLR; 2020. p. 1388-97. Available from: https://proceedings.mlr.press/...

  13. [13]

    Investigating causal relations by econometric models and cross-spectral methods

    Granger CWJ. Investigating causal relations by econometric models and cross-spectral methods. Econometrica. 1969;37(3):424-38

  14. [14]

    Review of Causal Discovery Methods Based on Graphical Models

    Glymour C, Zhang K, Spirtes P. Review of Causal Discovery Methods Based on Graphical Models. Frontiers in Genetics. 2019;10

  15. [15]

    A Survey of Learning Causality with Data: Problems and Methods

    Guo R, Cheng L, Li J, Hahn PR, Liu H. A Survey of Learning Causality with Data: Problems and Methods. ACM Comput Surv. 2020 Jul;53(4). Available from: https://doi.org/10.1145/3397269

  16. [16]

    Causal Discovery from Temporal Data: An Overview and New Perspectives

    Gong C, Zhang C, Yao D, Bi J, Li W, Xu Y. Causal Discovery from Temporal Data: An Overview and New Perspectives. ACM Comput Surv. 2024 Dec;57(4)

  17. [17]

    Causation, Prediction, and Search

    Spirtes P, Glymour CN, Scheines R. Causation, Prediction, and Search. 2nd ed. Cambridge, MA: MIT Press; 2001

  18. [18]

    D’ya Like DAGs? A Survey on Structure Learning and Causal Discovery

    Vowels MJ, Camgoz NC, Bowden R. D’ya Like DAGs? A Survey on Structure Learning and Causal Discovery. ACM Comput Surv. 2022 Nov;55(4)

  19. [19]

    High-recall causal discovery for autocorrelated time series with latent confounders

    Gerhardus A, Runge J. High-recall causal discovery for autocorrelated time series with latent confounders. In: Advances in Neural Information Processing Systems. vol. 33. Curran Associates, Inc.; 2020. p. 12615-25

  20. [20]

    DYNOTEARS: Structure Learning from Time-Series Data

    Pamfil R, Sriwattanaworachai N, Desai S, Pilgerstorfer P, Beaumont P, Georgatzis K, et al. DYNOTEARS: Structure Learning from Time-Series Data. In: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics. vol. 108 of Proceedings of Machine Learning Research. PMLR; 2020. p. 1595-605. Available from: https://proceedings.m...

  21. [21]

    DAGs with NO TEARS: Continuous optimization for structure learning

    Zheng X, Aragam B, Ravikumar P, Xing EP. DAGs with NO TEARS: Continuous optimization for structure learning. In: Advances in Neural Information Processing Systems. vol. 31. Curran Associates, Inc.; 2018. p. 9472-83

  22. [22]

    Causal modelling combining instantaneous and lagged effects: an identifiable model based on non-Gaussianity

    Hyvärinen A, Shimizu S, Hoyer PO. Causal modelling combining instantaneous and lagged effects: an identifiable model based on non-Gaussianity. In: Proceedings of the 25th International Conference on Machine Learning. ICML ’08. Association for Computing Machinery; 2008. p. 424-31

  23. [23]

    Causal Discovery with Attention-Based Convolutional Neural Networks

    Nauta M, Bucur D, Seifert C. Causal Discovery with Attention-Based Convolutional Neural Networks. Machine Learning and Knowledge Extraction. 2019;1(1):312-40. Available from: https://www.mdpi.com/2504-4990/1/1/19

  24. [24]

    Measuring information transfer

    Schreiber T. Measuring information transfer. Physical Review Letters. 2000;85(2):461-4

  25. [25]

    Partial transfer entropy on rank vectors

    Kugiumtzis D. Partial transfer entropy on rank vectors. The European Physical Journal Special Topics. 2013;222(2):401-20

  26. [26]

    Discovering Temporal Causal Relations from Subsampled Data

    Gong M, Zhang K, Schölkopf B, Tao D, Geiger P. Discovering Temporal Causal Relations from Subsampled Data. In: Proceedings of the 32nd International Conference on Machine Learning (ICML). vol. 37 of Proceedings of Machine Learning Research. PMLR; 2015. p. 1898-906

  27. [27]

    Causal discovery from heterogeneous/nonstationary data

    Huang B, Zhang K, Zhang J, Ramsey J, Sanchez-Romero R, Glymour C, et al. Causal discovery from heterogeneous/nonstationary data. Journal of Machine Learning Research. 2020 Jan;21(1)

  28. [28]

    Causal Discovery from Conditionally Stationary Time Series

    Balsells-Rodas C, Sumba X, Narendra T, Tu R, Schweikert G, Kjellström H, et al. Causal Discovery from Conditionally Stationary Time Series. In: Forty-second International Conference on Machine Learning. vol. 267 of Proceedings of Machine Learning Research. PMLR; 2025. p. 2715-41. Available from: https://openreview.net/forum?id=j88QAtutwW

  29. [29]

    The Robustness of Differentiable Causal Discovery in Misspecified Scenarios

    Yi H, He Y, Chen D, Kang M, Wang H, Yu W. The Robustness of Differentiable Causal Discovery in Misspecified Scenarios. In: The Thirteenth International Conference on Learning Representations; 2025. p. 1-24. Available from: https://openreview.net/forum?id=iaP7yHRq1l

  30. [30]

    Scalable Causal Discovery with Score Matching

    Montagna F, Noceti N, Rosasco L, Zhang K, Locatello F. Scalable Causal Discovery with Score Matching. In: Advances in Neural Information Processing Systems. vol. 36; 2023. p. 12640-54

  31. [31]

    Assumption violations in causal discovery and the robustness of score matching

    Montagna F, Mastakouri A, Eulig E, Noceti N, Rosasco L, Janzing D, et al. Assumption violations in causal discovery and the robustness of score matching. In: Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, editors. Advances in Neural Information Processing Systems. vol. 36. Curran Associates, Inc.; 2023. p. 47339-78

  32. [32]

    TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations

    Stein G, Penzel N, Piater T, Denzler J. TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations. In: The Fourteenth International Conference on Learning Representations; 2026. p. 1-31. Available from: https://openreview.net/forum?id=MtdrOCLAGY

  33. [33]

    Understanding Spurious Regressions in Econometrics

    Phillips PCB. Understanding Spurious Regressions in Econometrics. Journal of Econometrics. 1986;33(3):311-40

  34. [34]

    Comparison of correlation analysis techniques for irregularly sampled time series

    Rehfeld K, Marwan N, Heitzig J, Kurths J. Comparison of correlation analysis techniques for irregularly sampled time series. Nonlinear Processes in Geophysics. 2011;18(3):389-404

  35. [35]

    Statistical Analysis with Missing Data

    Little RJA, Rubin DB. Statistical Analysis with Missing Data. 3rd ed. John Wiley & Sons; 2019

  36. [36]

    Causal Inference: A Missing Data Perspective

    Ding P, Li F. Causal Inference: A Missing Data Perspective. Statistical Science. 2018;33(2):214-37

  37. [37]

    The “effective” number of independent observations in an autocorrelated time series

    Bayley GV, Hammersley JM. The “effective” number of independent observations in an autocorrelated time series. Supplement to the Journal of the Royal Statistical Society. 1946;8(2):184-97

  38. [38]

    The interpretation and estimation of effective sample size

    Thiébaux HJ, Zwiers FW. The interpretation and estimation of effective sample size. Journal of Climate and Applied Meteorology. 1984;23(5):800-11

  39. [39]

    The hardness of conditional independence testing and the generalised covariance measure

    Shah RD, Peters J. The hardness of conditional independence testing and the generalised covariance measure. The Annals of Statistics. 2020;48(3):1514-1538

  40. [40]

    Ridge Regression: Biased Estimation for Nonorthogonal Problems

    Hoerl AE, Kennard RW. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics. 1970;12(1):55-67

  41. [41]

    A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity

    White H. A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica. 1980;48(4):817-38

  42. [42]

    Distribution of the Estimators for Autoregressive Time Series with a Unit Root

    Dickey DA, Fuller WA. Distribution of the Estimators for Autoregressive Time Series with a Unit Root. Journal of the American Statistical Association. 1979;74(366):427-31

  43. [43]

    Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root

    Kwiatkowski D, Phillips PCB, Schmidt P, Shin Y. Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root. Journal of Econometrics. 1992;54(1–3):159-78

  44. [44]

    Computation and Analysis of Multiple Structural Change Models

    Bai J, Perron P. Computation and Analysis of Multiple Structural Change Models. Journal of Applied Econometrics. 2003;18(1):1-22

  45. [45]

    A Test of Missing Completely at Random for Multivariate Data with Missing Values

    Little RJA. A Test of Missing Completely at Random for Multivariate Data with Missing Values. Journal of the American Statistical Association. 1988;83(404):1198-202

  46. [46]

    On a Measure of Lack of Fit in Time Series Models

    Ljung GM, Box GEP. On a Measure of Lack of Fit in Time Series Models. Biometrika. 1978;65(2):297-303

  47. [47]

    Learning Bayesian networks: The combination of knowledge and statistical data

    Heckerman D, Geiger D, Chickering DM. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning. 1995;20(3):197-243

  48. [48]

    Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks

    Friedman N, Koller D. Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks. Machine Learning. 2003;50(1):95-125

  49. [49]

    A Million Variables and More: The Fast Greedy Equivalence Search Algorithm for Learning High-Dimensional Graphical Causal Models

    Ramsey JD, Glymour M, Sanchez-Romero R, Glymour C. A Million Variables and More: The Fast Greedy Equivalence Search Algorithm for Learning High-Dimensional Graphical Causal Models. International Journal of Data Science and Analytics. 2018;3(2):121-9

  50. [50]

    Bootstrap Aggregation and Confidence Measures to Improve Time Series Causal Discovery

    Debeire K, Gerhardus A, Runge J, Eyring V. Bootstrap Aggregation and Confidence Measures to Improve Time Series Causal Discovery. In: Proceedings of the Third Conference on Causal Learning and Reasoning. vol. 236 of Proceedings of Machine Learning Research. PMLR; 2024. p. 979-1007. Available from: https://proceedings.mlr.press/v236/debeire24a.html

  51. [51]

    Transforming Classifier Scores into Accurate Multiclass Probability Estimates

    Zadrozny B, Elkan C. Transforming Classifier Scores into Accurate Multiclass Probability Estimates. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2002. p. 694-9

  52. [52]

    Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression

    Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD. Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression. John Wiley & Sons; 1972

  53. [53]

    Classifier Calibration with ROC-Regularized Isotonic Regression

    Berta E, Bach F, Jordan MI. Classifier Calibration with ROC-Regularized Isotonic Regression. In: Proceedings of the 27th International Conference on Artificial Intelligence and Statistics. vol. 238 of Proceedings of Machine Learning Research. PMLR; 2024. p. 1972-80. Available from: https://proceedings.mlr.press/v238/berta24a.html

  54. [54]

    The Well-Calibrated Bayesian

    Dawid AP. The Well-Calibrated Bayesian. Journal of the American Statistical Association. 1982;77(379):605-10

  55. [55]

    The Comparison and Evaluation of Forecasters

    DeGroot MH, Fienberg SE. The Comparison and Evaluation of Forecasters. The Statistician. 1983;32(1–2):12-22

  56. [56]

    On Optimum Recognition Error and Reject Tradeoff

    Chow CK. On Optimum Recognition Error and Reject Tradeoff. IEEE Transactions on Information Theory. 1970;16(1):41-6

  57. [57]

    On the Foundations of Noise-Free Selective Classification

    El-Yaniv R, Wiener Y. On the Foundations of Noise-Free Selective Classification. Journal of Machine Learning Research. 2010;11:1605-41

  58. [58]

    Time Series Analysis

    Hamilton JD. Time Series Analysis. Princeton, NJ: Princeton University Press; 1994

  59. [59]

    Theory of Games and Economic Behavior

    Von Neumann J, Morgenstern O. Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press; 1944

  60. [60]

    The foundations of statistics

    Savage LJ. The foundations of statistics. New York: John Wiley & Sons; 1954

  61. [61]

    Decision Theory: Principles and Approaches

    Parmigiani G, Inoue LYT. Decision Theory: Principles and Approaches. Chichester, UK: John Wiley & Sons; 2009

  62. [62]

    Selective classification for deep neural networks

    Geifman Y, El-Yaniv R. Selective classification for deep neural networks. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17. Red Hook, NY, USA: Curran Associates Inc.; 2017. p. 4885–4894

  63. [63]

    On Calibration of Modern Neural Networks

    Guo C, Pleiss G, Sun Y, Weinberger KQ. On Calibration of Modern Neural Networks. In: Proceedings of the 34th International Conference on Machine Learning (ICML). vol. 70. PMLR; 2017. p. 1321-30

  64. [64]

    Predicting Good Probabilities with Supervised Learning

    Niculescu-Mizil A, Caruana R. Predicting Good Probabilities with Supervised Learning. In: Proceedings of the 22nd International Conference on Machine Learning (ICML); 2005. p. 625-32

  65. [65]

    The Control of the False Discovery Rate in Multiple Testing Under Dependency

    Benjamini Y, Yekutieli D. The Control of the False Discovery Rate in Multiple Testing Under Dependency. Annals of Statistics. 2001;29(4):1165-88

  66. [66]

    The Pivot Algorithm: A Highly Efficient Monte Carlo Method for the Self-Avoiding Walk

    Madras N, Sokal AD. The Pivot Algorithm: A Highly Efficient Monte Carlo Method for the Self-Avoiding Walk. Journal of Statistical Physics. 1988;50(1):109-86

  67. [67]

    Statistical Power Analysis for the Behavioral Sciences

    Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Lawrence Erlbaum Associates; 1988

  68. [68]

    Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation

    Marquardt DW. Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation. Technometrics. 1970;12(3):591-612

  69. [69]

    Tests of Equality Between Sets of Coefficients in Two Linear Regressions

    Chow GC. Tests of Equality Between Sets of Coefficients in Two Linear Regressions. Econometrica. 1960;28(3):591-605

  70. [70]

    A caution regarding rules of thumb for variance inflation factors

    O’Brien RM. A caution regarding rules of thumb for variance inflation factors. Quality & Quantity. 2007;41(5):673-90

  71. [71]

    A Redefined Variance Inflation Factor: Overcoming the Limitations of the Variance Inflation Factor

    Salmerón R, García C, García J. A Redefined Variance Inflation Factor: Overcoming the Limitations of the Variance Inflation Factor. Computational Economics. 2025;65(1):337-63

  72. [72]

    Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods

    Platt J. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. In: Advances in Large Margin Classifiers. MIT Press; 1999. p. 61-74

  73. [73]

    An Introduction to the Bootstrap

    Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman & Hall/CRC; 1993

  74. [74]

    The Stationary Bootstrap

    Politis DN, Romano JP. The Stationary Bootstrap. Journal of the American Statistical Association. 1994;89(428):1303-13

  75. [75]

    A Unified Approach to Interpreting Model Predictions

    Lundberg SM, Lee SI. A Unified Approach to Interpreting Model Predictions. In: Advances in Neural Information Processing Systems. vol. 30; 2017. p. 4765-74

  76. [76]

    New Introduction to Multiple Time Series Analysis

    Lütkepohl H. New Introduction to Multiple Time Series Analysis. Springer; 2005

  77. [77]

    Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data

    Lawrence AR, Kaiser M, Sampaio R, Sipos M. Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data. In: Causal Discovery & Causality-Inspired Machine Learning Workshop at Neural Information Processing Systems; 2021. p. 1-26

  78. [78]

    TimeGraph: Synthetic Benchmark Datasets for Robust Time-Series Causal Discovery

    Ferdous MH, Hossain E, Gani MO. TimeGraph: Synthetic Benchmark Datasets for Robust Time-Series Causal Discovery. In: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’25). Toronto, ON, Canada: ACM; 2025. p. 1-11

  79. [79]

    CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery

    Cheng Y, Wang Z, Xiao T, Zhong Q, Suo J, He K. CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery. In: Proceedings of the Twelfth International Conference on Learning Representations; 2024. p. 1-22. Available from: https://openreview.net/forum?id=iad1yyyGme

  80. [80]

    Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

    Hendrycks D, Dietterich T. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. In: Proceedings of the International Conference on Learning Representations (ICLR); 2019. p. 1-15

Showing first 80 references.