PRCD-MAP: Learning How Much to Trust Imperfect Priors in Causal Discovery

Da Zhou; Xihang Shan

arXiv: 2605.01669 · v2 · submitted 2026-05-03 · 📊 stat.ML · cs.LG · stat.ME

Pith reviewed 2026-05-09 17:26 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · stat.ME

keywords causal discovery · imperfect priors · trust calibration · empirical Bayes · MAP estimation · regularization · LLM priors · population safety guarantee

The pith

PRCD-MAP learns per-edge trust for imperfect priors and modulates the MAP objective to achieve bounded safety in causal discovery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops PRCD-MAP to resolve the trade-off between blindly trusting and blindly rejecting priors of unknown quality in causal discovery. It introduces a soft prior-consumption layer that assigns a trust score to each edge in the prior, calibrates those scores through empirical Bayes using a Laplace approximation to the marginal likelihood, and propagates them along the prior graph with an MLP so that data-supported neighborhoods raise trust while contradictions lower it. The trust scores then scale a prior-aware ℓ1 term and a prior-weighted ℓ2 term inside the MAP objective. The central result is a population-level guarantee that the procedure is ε-safe in expectation over the prior-generation distribution, with ε ≤ C·acc(1−acc)·d²/T, where acc is the prior accuracy, d the number of variables, and T the sample size. The bound vanishes at both the perfect and the useless prior endpoints. When the prior carries no information, the learned trust collapses to its floor value and the method reverts to a standard no-prior baseline.
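
To make the shape of the stated bound concrete, the following minimal sketch evaluates C·acc(1−acc)·d²/T. Only the functional form comes from the text: the constant C is not given in this summary, so `C=1.0` is a placeholder assumption, and `safety_bound` is a hypothetical helper name.

```python
def safety_bound(acc, d, T, C=1.0):
    """Evaluate C * acc * (1 - acc) * d^2 / T.

    C is an unspecified constant in the source; 1.0 is only a placeholder.
    """
    if not 0.0 <= acc <= 1.0:
        raise ValueError("acc must lie in [0, 1]")
    if d <= 0 or T <= 0:
        raise ValueError("d and T must be positive")
    return C * acc * (1.0 - acc) * d ** 2 / T


# The value is zero at both endpoints of acc and largest at acc = 0.5.
for acc in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"acc={acc:.2f}  bound={safety_bound(acc, d=20, T=500):.3f}")
```

Doubling T halves the value, which is the parametric 1/T rate, while the d² factor means the same guarantee needs roughly four times as many samples when the number of variables doubles.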

Core claim

PRCD-MAP is a soft prior-consumption layer for causal discovery that assigns per-edge trust to an imperfect prior. Trust is calibrated by empirical Bayes on a Laplace-approximated marginal likelihood and propagated along the prior graph by an MLP. The resulting trust values modulate a prior-aware ℓ1 and prior-weighted ℓ2 regularizer inside a MAP objective. The method is ε-safe in expectation over the prior-generation distribution, with ε ≤ C·acc(1−acc)·d²/T at the parametric rate and vanishing at the prior-quality endpoints. When the prior is uninformative, learned trust collapses to its floor and the method recovers a no-prior causal

What carries the argument

The per-edge trust variable, calibrated by empirical Bayes on the Laplace-approximated marginal likelihood and propagated by an MLP, that scales the prior-aware L1 and prior-weighted L2 regularizers inside the MAP objective.
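
As a rough illustration of how a per-edge trust value could scale the two penalties, here is a minimal sketch. The functional form is an assumption made for this example and is not the paper's Eq. (5)–(7), which this summary does not reproduce: trust `tau` tempers the prior probability toward 0.5, and the tempered value then rescales a combined ℓ1 and ℓ2 penalty. The names `calibrate` and `trust_weighted_penalty` and the trust floor of 0 are hypothetical.

```python
import math


def calibrate(p, tau):
    """Temper a prior edge probability p in (0, 1) by trust tau >= 0.

    tau = 0 maps every p to 0.5 (prior ignored); tau = 1 leaves p unchanged.
    Hypothetical form chosen for illustration only.
    """
    logit = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-tau * logit))


def trust_weighted_penalty(W, P, tau, lam1, lam2):
    """Sum of per-edge l1 and l2 penalties, scaled by the calibrated prior.

    Edges the calibrated prior supports (p_cal > 0.5) are penalized less,
    edges it contradicts are penalized more. Assumed form, not the paper's.
    """
    total = 0.0
    for i, row in enumerate(W):
        for j, w in enumerate(row):
            p_cal = calibrate(P[i][j], tau[i][j])
            scale = 2.0 * (1.0 - p_cal)  # equals 1.0 when p_cal == 0.5
            total += scale * (lam1 * abs(w) + lam2 * w * w)
    return total


W = [[0.0, 1.5], [-0.5, 0.0]]          # candidate edge weights
P = [[0.5, 0.9], [0.2, 0.5]]           # prior edge probabilities
no_trust = [[0.0, 0.0], [0.0, 0.0]]    # trust at an assumed floor of 0
full_trust = [[1.0, 1.0], [1.0, 1.0]]  # prior taken at face value

plain = trust_weighted_penalty(W, P, no_trust, lam1=0.1, lam2=0.01)
guided = trust_weighted_penalty(W, P, full_trust, lam1=0.1, lam2=0.01)
print(plain, guided)
```

With trust at the floor every scale factor equals 1, so `plain` is an ordinary ℓ1 + ℓ2 penalty. That mirrors, in this toy form, the claimed fallback to a no-prior baseline.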

If this is right

On real CausalTime datasets the method extracts positive AUROC gains from informative LLM priors while attenuating trust on anonymous-variable stress tests.
It maintains a performance lead at dimension d=300 and under a matched W0-only protocol against the closest soft-Bayesian baseline.
A four-way ablation shows that empirical Bayes calibration together with MLP propagation account for the majority of the gain across datasets.
The calibrated-trust construction extends directly to nonlinear NAM models and to cross-sectional settings without change in principle.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same per-edge trust calibration could be inserted into other regularized causal estimators that already use MAP or penalized likelihood formulations.
Because the safety bound depends only on prior accuracy and sample size, the method supplies a concrete criterion for deciding whether to collect more data or to improve the prior source before running discovery.
The MLP propagation step suggests that trust can be learned jointly with the graph structure in an end-to-end pipeline rather than in a separate post-processing stage.

Load-bearing premise

The prior-generation distribution exists and is sufficiently regular that the expectation of the safety bound can be taken, and the Laplace approximation remains accurate enough for empirical Bayes calibration of trust.

What would settle it

In controlled simulations where prior accuracy is known and fixed, check two things: whether the learned trust values stay bounded away from the floor when the prior is uninformative, and whether the observed error exceeds the claimed C·acc(1−acc)·d²/T scaling. Either outcome would contradict the paper's claims.
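
One way to set up such a simulation is to corrupt a known ground-truth graph at a chosen accuracy. The sketch below assumes a simple prior-generation model in which each off-diagonal adjacency entry is kept with probability `acc` and flipped otherwise. The paper's own generator and its definition of acc may differ, and `noisy_prior` and `empirical_accuracy` are hypothetical helper names.

```python
import random


def noisy_prior(truth, acc, rng):
    """Copy a 0/1 adjacency matrix, keeping each off-diagonal entry with
    probability acc and flipping it otherwise (assumed generation model)."""
    d = len(truth)
    prior = [row[:] for row in truth]
    for i in range(d):
        for j in range(d):
            if i != j and rng.random() >= acc:
                prior[i][j] = 1 - truth[i][j]
    return prior


def empirical_accuracy(truth, prior):
    """Fraction of off-diagonal entries on which prior and truth agree."""
    d = len(truth)
    pairs = [(i, j) for i in range(d) for j in range(d) if i != j]
    return sum(truth[i][j] == prior[i][j] for i, j in pairs) / len(pairs)


rng = random.Random(0)
d = 60
# Sparse random ground truth; upper triangular, hence acyclic.
truth = [[1 if i < j and rng.random() < 0.1 else 0 for j in range(d)]
         for i in range(d)]

for acc in (0.5, 0.6, 0.9, 1.0):
    prior = noisy_prior(truth, acc, rng)
    print(acc, round(empirical_accuracy(truth, prior), 3))
```

Feeding priors generated this way to the estimator across a grid of acc, d, and T values would give the observed error curve to compare against the claimed scaling.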

Figures

Figures reproduced from arXiv: 2605.01669 by Da Zhou, Xihang Shan.

**Figure 1.** PRCD-MAP causal strength heatmap on the electricity consumption dataset (mean

**Figure 2.** Wall-clock runtime vs. number of variables

**Figure 3.** AUROC under linear vs. nonlinear data-generating mechanisms. PRCD-MAP is stable

**Figure 4.** AUROC vs. number of variables (T=500, acc=0.6). PRCD-MAP’s advantage widens as d increases.

**Figure 5.** F1 sensitivity to (λ1, λ2). A broad region of near-optimal performance exists for λ1 ≤ 0.005. A second panel shows learned τ vs. (λ1, λ2). The grid includes the values {0.001, 0.005, 0.01, 0.05, 0.1} and yields 30 configurations (d=20, T=500, acc=0.6, Laplace noise, 5 seeds).

**Figure 7.** Convergence analysis of PRCD-MAP. Top-left: DAG constraint

**Figure 8.** Learned τ (left axis, red) against prior accuracy, alongside F1 for three variants (right axis). Three regimes are visible. Low accuracy (acc ≤ 0.3): τ stays near τmin, effectively mapping the calibrated prior P̂ toward 0.5 (Eq. 7). In this regime, the precision mask Ω becomes approximately uniform, and PRCD-MAP reduces to a standard ℓ1 + ℓ2 regularized SVAR. Correspondingly, the learned-τ F1 ma…

**Figure 9.** τ trajectory during ALM training (d=20, T=500). Higher prior accuracy leads to faster and larger τ growth. All trajectories converge well before the final ALM iterations.

**Figure 10.** Prior knowledge compensates for limited data: AUROC vs. sample size. Axes: sample size T (50, 100, 200, 500) vs. AUROC. Series: PRCD-MAP (acc=0.4, 0.6, 0.9), PCMCI+, DYNOTEARS.

Original abstract

External priors of unknown reliability create a brittle trade-off in causal discovery: blind trust amplifies errors, blind rejection wastes signal. Real priors are also heterogeneously reliable -- physical laws are trustworthy, LLM-suggested edges are speculative -- yet existing methods either ignore priors or impose them through globally uniform trust. We propose PRCD-MAP, a soft prior-consumption layer that assigns per-edge trust to an imperfect prior and uses it to modulate a prior-aware $\ell_1$ and prior-weighted $\ell_2$ regularizer in a MAP objective. Trust is calibrated by empirical Bayes on a Laplace-approximated marginal likelihood and propagated along the prior graph by an MLP, so data-confirmed neighborhoods boost trust and contradictions suppress it. PRCD-MAP enjoys a population-level safety guarantee: it is $\varepsilon$-safe in expectation over the prior-generation distribution, with $\varepsilon\leq C\cdot\mathrm{acc}(1{-}\mathrm{acc})\cdot d^2/T$ at the parametric $T^{-1}$ rate and vanishing at the prior-quality endpoints. When the prior is uninformative, learned trust provably collapses to its floor and the method recovers a no-prior baseline. Empirically, on real CausalTime data PRCD-MAP exploits informative LLM priors (LLM-prior gain $+0.067/+0.089$ AUROC on AQI/Medical over a no-prior PRCD-MAP backbone; combined backbone+prior lead $+0.123/+0.043$ over PCMCI+), auto-attenuates on the anonymous-variable Traffic stress test, and retains a lead at $d{=}300$; against BayesDAG, the closest soft-Bayesian baseline, PRCD-MAP wins on every CausalTime dataset under a matched $W_0$-only protocol. A four-way ablation isolates each component: EB calibration and MLP trust propagation jointly carry the plurality of the gain, with positive sign on every dataset. Extensions to nonlinear (NAM) and cross-sectional settings show the calibrated-trust principle is setting-agnostic.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PRCD-MAP learns per-edge trust via empirical Bayes and MLP propagation but its safety claim is only an expectation over prior generation, not a bound for fixed real priors.

The paper introduces a soft way to consume imperfect priors in causal discovery by assigning and learning per-edge trust values instead of using uniform trust or hard constraints. Trust is set through empirical Bayes on a Laplace-approximated marginal likelihood, then spread along the prior graph with an MLP so that data-supported parts gain weight and contradictions lose it. The main theoretical claim is a population-level ε-safety guarantee that shrinks at rate 1/T and vanishes when prior accuracy is zero or one. When the prior carries no signal, trust drops to its floor and the method falls back to a no-prior baseline.

Empirically it reports AUROC lifts on CausalTime datasets when LLM priors are added, stays competitive at d=300, and shows positive ablation results for the calibration and propagation steps. Those gains look real on the reported benchmarks and the four-way ablation isolates the pieces cleanly.

The soft spot is the safety guarantee itself. It is stated as an expectation over the distribution that generates the priors, not as a high-probability or instance-wise bound for any single fixed prior a user actually supplies. In practice most users bring one concrete prior from an LLM or an expert, so the expectation does not directly protect against a bad draw on that instance. The Laplace approximation for the marginal likelihood is also taken as given, without much discussion of when it holds.

The manuscript is aimed at causal discovery researchers who already work with mixed-quality external knowledge and want something between blind trust and total rejection. Readers who care about soft prior integration and per-edge calibration will get the most from it. The work has enough novelty in the mechanism and enough empirical backing to deserve a serious referee, even though the guarantee needs tighter scrutiny on its applicability to fixed priors.
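
The gap between an in-expectation bound and a per-instance bound can be made concrete with Markov's inequality. As an illustrative assumption not taken from the paper, let $\Delta(\pi) \ge 0$ denote the excess error incurred under a realized prior $\pi$, with $\mathbb{E}_{\pi}[\Delta(\pi)] \le \varepsilon$. Then for any $\delta \in (0, 1)$:

$$
\Pr_{\pi}\!\left[\Delta(\pi) \ge \frac{\varepsilon}{\delta}\right] \;\le\; \frac{\mathbb{E}_{\pi}[\Delta(\pi)]}{\varepsilon/\delta} \;\le\; \delta .
$$

Under that assumption, and without further concentration analysis, the expectation bound certifies a single draw only up to a factor $1/\delta$: at $\delta = 0.05$ the per-instance level is $20\,\varepsilon$.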

Referee Report

3 major / 2 minor

Summary. The manuscript proposes PRCD-MAP, a soft prior-consumption method for causal discovery that learns per-edge trust values for an imperfect prior. Trust is calibrated via empirical Bayes on a Laplace-approximated marginal likelihood and propagated by an MLP along the prior graph. These trusts modulate a prior-aware ℓ1 regularizer and a prior-weighted ℓ2 regularizer inside a MAP objective. The central theoretical claim is a population-level ε-safety guarantee: the method is ε-safe in expectation over the prior-generation distribution, with ε ≤ C·acc(1−acc)·d²/T at the parametric rate and vanishing at the acc=0 and acc=1 endpoints; when the prior is uninformative, trust collapses to its floor and the method recovers the no-prior baseline. Empirically, PRCD-MAP yields AUROC gains on CausalTime datasets with LLM priors, auto-attenuates on the anonymous-variable Traffic stress test, outperforms PCMCI+ and BayesDAG under matched protocols, and shows positive gains from EB calibration plus MLP propagation in a four-way ablation; extensions to nonlinear NAM and cross-sectional settings are also reported.

Significance. If the population-level guarantee and the empirical gains are verified, the work addresses a practically important gap in causal discovery by providing a data-driven mechanism to modulate trust in heterogeneous priors without global uniformity. The explicit collapse to baseline when the prior is uninformative, the ablation isolating the contribution of empirical Bayes and MLP propagation, and the extension to nonlinear and cross-sectional regimes are concrete strengths. The parametric-rate bound, even if only in expectation, offers a theoretical anchor that is rarer in applied causal-discovery papers.

major comments (3)

§3.3 (population-level safety guarantee) and the associated derivation in Appendix B: the ε bound is stated only in expectation over the prior-generation distribution. No concentration inequality or high-probability version is supplied for a fixed, realized prior (the practical case, e.g., a single LLM-suggested edge set). Consequently the instance-wise safety claim for the priors actually encountered by users is not established.
Eq. (5)–(6) (prior-aware ℓ1 and prior-weighted ℓ2 regularizers): the precise functional form by which the learned per-edge trust modulates the penalties is described only qualitatively in the main text. Without the explicit expressions it is impossible to verify that the MAP objective remains well-behaved or that the claimed collapse to the no-prior baseline follows directly from the trust floor.
§4.1 and Table 2 (ablation on CausalTime): the four-way ablation isolates EB calibration and MLP propagation as jointly responsible for the plurality of the gain, yet the exact numerical values of the trust parameters before and after calibration, and the precise definition of the “no-prior backbone” baseline, are not reported. This prevents independent verification that the observed AUROC deltas are attributable to the claimed mechanisms rather than to hyper-parameter tuning.

minor comments (2)

Figure 3 (Traffic stress-test panel): the x-axis label “anonymous-variable fraction” should be accompanied by a brief parenthetical definition of how the anonymous variables are sampled and inserted into the prior graph.
§2.2 (related work): the discussion of soft-Bayesian baselines omits explicit comparison to recent empirical-Bayes approaches in structure learning; adding one or two sentences would clarify the incremental contribution.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the practical relevance of modulating trust in heterogeneous priors. We respond point-by-point to the major comments below, proposing targeted revisions to enhance clarity and verifiability while preserving the manuscript's stated claims.

Referee: §3.3 (population-level safety guarantee) and the associated derivation in Appendix B: the ε bound is stated only in expectation over the prior-generation distribution. No concentration inequality or high-probability version is supplied for a fixed, realized prior (the practical case, e.g., a single LLM-suggested edge set). Consequently the instance-wise safety claim for the priors actually encountered by users is not established.

Authors: The ε-safety guarantee is formulated explicitly as a population-level result in expectation over the prior-generation distribution, consistent with the theoretical setup in which prior accuracy acc characterizes that distribution. The manuscript does not claim an instance-wise guarantee for arbitrary fixed priors; deriving high-probability bounds for a single realized prior would require additional concentration analysis that accounts for graph-induced dependencies and is outside the current scope. The expectation bound nevertheless establishes the key properties (vanishing at acc extremes and parametric rate). We will add a clarifying sentence in §3.3 to emphasize the population-level nature and avoid misinterpretation. (Revision: partial)

Referee: Eq. (5)–(6) (prior-aware ℓ1 and prior-weighted ℓ2 regularizers): the precise functional form by which the learned per-edge trust modulates the penalties is described only qualitatively in the main text. Without the explicit expressions it is impossible to verify that the MAP objective remains well-behaved or that the claimed collapse to the no-prior baseline follows directly from the trust floor.

Authors: We agree that the explicit functional forms are needed for full verification. In the revised manuscript we will insert the precise expressions for the prior-aware ℓ1 regularizer and the prior-weighted ℓ2 regularizer directly into the main text adjacent to Eqs. (5)–(6), together with a brief argument confirming convexity of the MAP objective and the direct recovery of the no-prior baseline when trust reaches its floor value. (Revision: yes)

Referee: §4.1 and Table 2 (ablation on CausalTime): the four-way ablation isolates EB calibration and MLP propagation as jointly responsible for the plurality of the gain, yet the exact numerical values of the trust parameters before and after calibration, and the precise definition of the “no-prior backbone” baseline, are not reported. This prevents independent verification that the observed AUROC deltas are attributable to the claimed mechanisms rather than to hyper-parameter tuning.

Authors: We will revise §4.1 and Table 2 to report the exact numerical trust-parameter values before and after empirical Bayes calibration for each dataset, and to state explicitly that the no-prior backbone is the PRCD-MAP estimator with all trust parameters fixed at their lower bound (i.e., equivalent to standard ℓ1-regularized causal discovery without prior information). These additions will enable independent verification that the AUROC gains arise from the EB calibration and MLP propagation steps. (Revision: yes)

Circularity Check

0 steps flagged

No significant circularity; safety guarantee and baseline recovery are derived properties

The central claims—a population-level ε-safety bound in expectation over the prior-generation distribution (with the stated parametric rate and vanishing at acc=0/1) and provable collapse of trust to floor when the prior is uninformative—are presented as theoretical derivations from the method's structure rather than tautological re-statements of fitted values or self-citations. The empirical Bayes calibration step uses data to set per-edge trust but does not make the expectation-based guarantee equivalent to its inputs by construction; the bound holds over the distribution of priors, independent of any single fitted instance. No self-definitional loops, fitted-input-called-predictions, or load-bearing self-citations appear in the derivation chain. The result is self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the Laplace approximation being accurate for the marginal likelihood used in empirical Bayes, the existence of a well-defined prior-generation distribution for the safety expectation, and the assumption that the prior graph structure is suitable for MLP-based trust propagation.

axioms (2)

Domain assumption: the Laplace approximation accurately represents the marginal likelihood for empirical Bayes trust calibration.
Invoked to enable per-edge trust estimation from data.
Domain assumption: a prior-generation distribution exists over which the ε-safety expectation can be taken.
Required for the population-level safety guarantee.
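
The first assumption can be probed on a toy problem where the exact marginal likelihood is known. The sketch below is not the paper's model: it assumes a Bernoulli likelihood with a flat prior on the success probability, so the marginal likelihood of `a` successes and `b` failures is a Beta function, and compares it with the standard second-order Laplace approximation around the mode. Both function names are hypothetical.

```python
import math


def log_marginal_exact(a, b):
    """log of the integral of theta^a (1 - theta)^b over [0, 1]."""
    return math.lgamma(a + 1) + math.lgamma(b + 1) - math.lgamma(a + b + 2)


def log_marginal_laplace(a, b):
    """Laplace approximation: expand log f around its mode to second order.

    Requires a > 0 and b > 0 so that the mode lies strictly inside (0, 1).
    """
    mode = a / (a + b)  # maximizer of log f
    log_f = a * math.log(mode) + b * math.log(1.0 - mode)
    # Negative second derivative of log f at the mode.
    curvature = a / mode ** 2 + b / (1.0 - mode) ** 2
    return log_f + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(curvature)


for a, b in ((3, 2), (30, 20), (300, 200)):
    gap = log_marginal_laplace(a, b) - log_marginal_exact(a, b)
    print(f"a={a:3d} b={b:3d}  gap={gap:+.4f}")
```

In this toy case the gap shrinks as the counts grow, which is the regime in which a Laplace-based calibration is most defensible. Whether the same holds for the paper's marginal likelihood at small T is exactly what the axiom takes on trust.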

pith-pipeline@v0.9.0 · 5692 in / 1606 out tokens · 33306 ms · 2026-05-09T17:26:56.782435+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages

[1]

Advances in Neural Information Processing Systems , volume=

BayesDAG: Gradient-based posterior inference for causal discovery , author=. Advances in Neural Information Processing Systems , volume=

work page
[2]

Journal of Artificial Intelligence Research , volume=

Survey and evaluation of causal discovery methods for time series , author=. Journal of Artificial Intelligence Research , volume=

work page
[3]

Advances in Neural Information Processing Systems , volume=

DAGMA: Learning DAGs via M-matrices and a log-determinant acyclicity characterization , author=. Advances in Neural Information Processing Systems , volume=

work page
[4]

1982 , publisher=

Constrained Optimization and Lagrange Multiplier Methods , author=. 1982 , publisher=

work page 1982
[5]

Advances in Neural Information Processing Systems , volume=

Differentiable causal discovery from interventional data , author=. Advances in Neural Information Processing Systems , volume=

work page
[6]

International Conference on Learning Representations (ICLR) , year=

DyCAST: Learning Dynamic Causal Structure from Time Series , author=. International Conference on Learning Representations (ICLR) , year=

work page
[7]

International Conference on Learning Representations (ICLR) , year=

CausalTime: Realistically Generated Time-Series for Benchmarking of Causal Discovery , author=. International Conference on Learning Representations (ICLR) , year=

work page
[8]

International Conference on Learning Representations (ICLR) , year=

CUTS: Neural causal discovery from irregular time-series data , author=. International Conference on Learning Representations (ICLR) , year=

work page
[9]

Learning from data: Artificial intelligence and statistics V , pages=

Learning Bayesian networks is NP-complete , author=. Learning from data: Artificial intelligence and statistics V , pages=. 1996 , publisher=

work page 1996
[10]

Journal of Machine Learning Research , volume=

Optimal structure identification with greedy search , author=. Journal of Machine Learning Research , volume=

work page
[11]

Knowledge and Information Systems , volume=

The impact of prior knowledge on causal structure learning , author=. Knowledge and Information Systems , volume=

work page
[12]

Frontiers in Genetics , volume=

Review of causal discovery methods based on graphical models , author=. Frontiers in Genetics , volume=

work page
[13]

International Conference on Learning Representations (ICLR) , year=

Rhino: Deep causal temporal relationship learning with history-dependent noise , author=. International Conference on Learning Representations (ICLR) , year=

work page
[14]

Econometrica: journal of the Econometric Society , pages=

Investigating causal relations by econometric models and cross-spectral methods , author=. Econometrica: journal of the Econometric Society , pages=

work page
[15]

International Conference on Machine Learning (ICML) , pages=

On calibration of modern neural networks , author=. International Conference on Machine Learning (ICML) , pages=. 2017 , organization=

work page 2017
[16]

Machine Learning , volume=

Learning Bayesian networks: The combination of knowledge and statistical data , author=. Machine Learning , volume=

work page
[17]

Journal of Machine Learning Research , volume=

Estimation of a structural vector autoregression model using non-Gaussianity , author=. Journal of Machine Learning Research , volume=

work page
[18]

Neural Processing Letters , volume=

Unsuitability of NOTEARS for causal graph discovery when dealing with dimensional quantities , author=. Neural Processing Letters , volume=

work page
[19]

Causal reasoning and large language models: Opening a new frontier for causality

Causal reasoning and large language models: Opening a new frontier for causality , author=. arXiv preprint arXiv:2305.00050 , year=

work page arXiv
[20]

International Conference on Learning Representations (ICLR) , year=

Adam: A method for stochastic optimization , author=. International Conference on Learning Representations (ICLR) , year=

work page
[21]

International Conference on Machine Learning (ICML) , pages=

CITRIS: Causal identifiability from temporal intervened sequences , author=. International Conference on Machine Learning (ICML) , pages=. 2022 , organization=

work page 2022
[22]

Proceedings of the Seminar on Predictability, ECMWF , volume=

Predictability: A problem partly solved , author=. Proceedings of the Seminar on Predictability, ECMWF , volume=

work page
[23]

Conference on Causal Learning and Reasoning (CLeaR) , pages=

Amortized causal discovery: Learning to infer causal graphs from time-series data , author=. Conference on Causal Learning and Reasoning (CLeaR) , pages=. 2022 , organization=

work page 2022
[24]

2005 , publisher=

New introduction to multiple time series analysis , author=. 2005 , publisher=

work page 2005
[25]

Conference on Uncertainty in Artificial Intelligence (UAI) , pages=

Causal inference and causal explanation with background knowledge , author=. Conference on Uncertainty in Artificial Intelligence (UAI) , pages=

work page
[26]

Machine Learning and Knowledge Extraction , volume=

Causal discovery with attention-based convolutional neural networks , author=. Machine Learning and Knowledge Extraction , volume=

work page
[27]

Advances in Neural Information Processing Systems , volume=

On the role of sparsity and DAG constraints for learning linear DAGs , author=. Advances in Neural Information Processing Systems , volume=

work page
[28]

International Conference on Artificial Intelligence and Statistics (AISTATS) , pages=

DYNOTEARS: Structure learning from time-series data , author=. International Conference on Artificial Intelligence and Statistics (AISTATS) , pages=. 2020 , organization=

work page 2020
[29]

2017 , publisher=

Elements of causal inference: foundations and learning algorithms , author=. 2017 , publisher=

work page 2017
[30]

Advances in Neural Information Processing Systems , volume=

Beware of the simulated DAG! Causal discovery benchmarks may be easy to game , author=. Advances in Neural Information Processing Systems , volume=

work page
[31]

Conference on Uncertainty in Artificial Intelligence (UAI) , pages=

Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets , author=. Conference on Uncertainty in Artificial Intelligence (UAI) , pages=. 2020 , organization=

work page 2020
[32]

Nature Reviews Earth & Environment , volume=

Causal inference for time series , author=. Nature Reviews Earth & Environment , volume=

work page
[33]

Progress in Artificial Intelligence , volume=

A survey on Bayesian network structure learning from data , author=. Progress in Artificial Intelligence , volume=

work page
[34]

Toward causal representation learning , author=. Proc. IEEE , volume=

work page
[35]

Econometrica , volume=

Macroeconomics and reality , author=. Econometrica , volume=

work page
[36]

2000 , publisher=

Causation, prediction, and search , author=. 2000 , publisher=

work page 2000
[37]

International Conference on Artificial Intelligence and Statistics (AISTATS) , pages=

NTS-NOTEARS: Learning nonparametric DBN structure from time-series data , author=. International Conference on Artificial Intelligence and Statistics (AISTATS) , pages=. 2023 , organization=

work page 2023
[38]

IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

Neural Granger causality , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

work page
[39]

The Annals of Statistics , volume=

Geometry of the faithfulness assumption in causal inference , author=. The Annals of Statistics , volume=

work page
[40]

D'ya like DAGs? A survey on structure learning and causal discovery , author=. Comput. Surveys , volume=

work page
[41]

International Conference on Machine Learning (ICML) , pages=

DAGs with No Curl: An efficient DAG structure learning approach , author=. International Conference on Machine Learning (ICML) , pages=. 2020 , organization=

work page 2020
[42]

Advances in Neural Information Processing Systems , volume=

Temporally disentangled representation learning under unknown nonstationarity , author=. Advances in Neural Information Processing Systems , volume=

work page
[43]

International Conference on Machine Learning (ICML) , pages=

DAG-GNN: DAG structure learning with graph neural networks , author=. International Conference on Machine Learning (ICML) , pages=. 2019 , organization=

work page 2019
[44]

Advances in Neural Information Processing Systems , volume=

DAGs with NO TEARS: Continuous optimization for structure learning , author=. Advances in Neural Information Processing Systems , volume=

work page
[45]

The validity of posterior expansions based on

Kass, Robert E and Tierney, Luke and Kadane, Joseph B , booktitle=. The validity of posterior expansions based on. 1990 , publisher=

work page 1990
[46]

Kleijn, Bas JK and van der Vaart, Aad W , journal=. The

work page
[47]

Annals of Statistics , volume=

Convergence rates of posterior distributions , author=. Annals of Statistics , volume=

work page
[48]

IEEE Transactions on Information Theory , volume=

Minimax rates of estimation for high-dimensional linear regression over _q -balls , author=. IEEE Transactions on Information Theory , volume=

work page
[49]

An empirical

Robbins, Herbert , booktitle=. An empirical

work page
[50]

Large-Scale Inference: Empirical

Efron, Bradley , year=. Large-Scale Inference: Empirical

work page
[51]

Compound decision theory and empirical

Zhang, Cun-Hui , journal=. Compound decision theory and empirical

work page
[52]

Journal of the American Statistical Association , volume=

The adaptive lasso and its oracle properties , author=. Journal of the American Statistical Association , volume=

work page
[53]

Journal of the American Statistical Association , volume=

Variable selection via nonconcave penalized likelihood and its oracle properties , author=. Journal of the American Statistical Association , volume=

work page
[54]

From query tools to causal architects: Harnessing large language models for advanced causal discovery from data.arXiv preprint arXiv:2306.16902, 2023

From query tools to causal architects: Harnessing large language models for advanced causal discovery from data , author=. arXiv preprint arXiv:2306.16902 , year=

work page arXiv
[55]

Advances in Neural Information Processing Systems , volume=

Differentiable Constraint-Based Causal Discovery , author=. Advances in Neural Information Processing Systems , volume=

work page
[56]

International Conference on Artificial Intelligence and Statistics (AISTATS) , year=

Waxman, Daniel and Butler, Kurt and Djuri. International Conference on Artificial Intelligence and Statistics (AISTATS) , year=

work page
[57]

Advances in Neural Information Processing Systems , volume=

Differentiable Structure Learning with Partial Orders , author=. Advances in Neural Information Processing Systems , volume=

work page
[58]

Advances in Neural Information Processing Systems , volume=

Marrying Causal Representation Learning with Dynamical Systems for Science , author=. Advances in Neural Information Processing Systems , volume=

work page
[59]

Advances in Neural Information Processing Systems , volume=

Amortized Inference for Causal Structure Learning , author=. Advances in Neural Information Processing Systems , volume=

work page
[60]

ACM Computing Surveys , volume=

Causal Discovery from Temporal Data: An Overview and New Perspectives , author=. ACM Computing Surveys , volume=

work page
[61] Neural Additive Models: Interpretable Machine Learning with Neural Nets. Advances in Neural Information Processing Systems.
[62] Neural Additive Vector Autoregression Models for Causal Discovery in Time Series. Discovery Science, 2021.
[63] Use of Prior Knowledge to Discover Causal Additive Models with Unobserved Variables and its Application to Time Series Data. Conference on Causal Learning and Reasoning (CLeaR).
[64] Prediction-Powered Causal Inferences. Advances in Neural Information Processing Systems.
[65] Goyal, Navita and Daum. Causal Differentiating Concepts: Interpreting. Advances in Neural Information Processing Systems.
[66] Deep Causal Learning: Representation, Discovery and Inference. ACM Computing Surveys.
[67] Causality: Models, Reasoning, and Inference. 2000.
[68] Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B.
[69] High-dimensional statistics with a view toward applications in biology. Annual Review of Statistics and Its Application.
[70] Zhao, Peng and Yu, Bin. On model selection consistency of
[71] Mixing: properties and examples.
[72] Bickel, Peter J and Ritov, Ya'acov and Tsybakov, Alexandre B. Simultaneous analysis of
[73] Asymptotics for least absolute deviation regression estimators. Econometric Theory.
[74] Lachapelle, S. Gradient-based neural. International Conference on Learning Representations (ICLR).
[75] Large-scale differentiable causal discovery of factor graphs. Advances in Neural Information Processing Systems (NeurIPS).
[76] Lorch, Lars and Rothfuss, Jonas and Sch. Advances in Neural Information Processing Systems (NeurIPS).

