
arxiv: 2510.06735 · v2 · submitted 2025-10-08 · 💻 cs.LG · stat.ME

Incorporating Expert Knowledge into Bayesian Causal Discovery of Mixtures of Directed Acyclic Graphs

Pith reviewed 2026-05-18 08:50 UTC · model grok-4.3

classification 💻 cs.LG stat.ME
keywords Bayesian causal discovery · mixture of DAGs · expert elicitation · variational inference · heterogeneous data · structure learning · causal Bayesian networks

The pith

Expert feedback elicited through Bayesian experimental design can be turned into a graph prior that lets a variational method recover multiple causal structures from heterogeneous data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that standard Bayesian causal discovery, which assumes one underlying graph, can be extended to handle data generated by several different causal mechanisms at once. It does so by first using Bayesian experimental design to ask a simulated expert targeted questions about possible edges, then folding those answers into an informative prior inside a new variational inference procedure that jointly learns the mixture components and the graphs within them. A sympathetic reader would care because many applied datasets, from clinical records to economic indicators, are heterogeneous; a single-graph model necessarily averages over distinct relationships and loses resolution. If the approach works, analysts obtain not only better structure estimates but also an explicit partitioning of the data into clusters each explained by its own causal Bayesian network.
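The query-selection idea can be illustrated with a toy calculation. The sketch below is not the paper's BED procedure: it assumes a hypothetical expert who gives yes/no answers about single edges and is correct with a fixed probability `reliability`, and it scores each candidate edge by the mutual information between the edge's presence and the answer. The function names and the `edge_prob` input (current posterior edge probabilities) are illustrative assumptions.

```python
import numpy as np

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) variable, safe at p = 0 or 1."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def expected_information_gain(edge_prob, reliability):
    """Mutual information between edge presence and a yes/no expert answer.

    edge_prob:   (d, d) array of current posterior edge probabilities.
    reliability: probability that the hypothetical expert answers correctly.
    """
    # Marginal probability that the expert says "yes, the edge exists".
    p_yes = edge_prob * reliability + (1 - edge_prob) * (1 - reliability)
    # I(edge; answer) = H(answer) - H(answer | edge).
    return binary_entropy(p_yes) - binary_entropy(reliability)

def select_query(edge_prob, reliability=0.9):
    """Return the (i, j) pair whose answer is expected to be most informative."""
    gain = expected_information_gain(edge_prob, reliability)
    np.fill_diagonal(gain, -np.inf)  # self-loops are never queried
    return np.unravel_index(np.argmax(gain), gain.shape)
```

Under these assumptions the most useful question is about the edge whose posterior probability is closest to 0.5, and an expert with reliability 0.5 provides no information at all.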

Core claim

We propose a causal elicitation strategy for heterogeneous settings based on Bayesian experimental design principles together with a variational mixture structure learning method that extends differentiable Bayesian structure learning to iteratively infer mixtures of causal Bayesian networks. We construct an informative graph prior that incorporates the elicited expert feedback directly into the inference of mixtures of CBNs. The method produces a set of alternative causal models and achieves improved structure learning performance on heterogeneous synthetic data when informed by a simulated expert; it is also shown to capture complex distributions in a breast cancer database.

What carries the argument

The variational mixture structure learning (VaMSL) procedure, which jointly infers mixture component assignments and the directed acyclic graphs within each component while conditioning on an expert-derived graph prior obtained via Bayesian experimental design.
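To make the "jointly infers mixture component assignments" part concrete, here is a minimal sketch of one assignment and mixing-weight update for a generic mixture model. It assumes the per-component log-likelihoods are already available (in VaMSL they would come from the graphs and parameters of each component, which this sketch does not model) and it plugs in posterior-mean weights rather than the expectation of the log weights used in a full variational treatment. All names are hypothetical.

```python
import numpy as np

def update_assignments(log_lik, alpha_prior, alpha_current=None):
    """One soft-assignment and Dirichlet-count update for a K-component mixture.

    log_lik:       (N, K) array, log_lik[n, k] = log p(x_n | component k).
    alpha_prior:   (K,) Dirichlet prior concentrations for the mixing weights.
    alpha_current: (K,) current concentrations; defaults to the prior.
    """
    alpha_current = alpha_prior if alpha_current is None else alpha_current
    # Simplification: use the posterior-mean weights instead of E[log pi].
    log_weights = np.log(alpha_current / alpha_current.sum())
    logits = log_lik + log_weights
    # Normalise in log space for numerical stability.
    logits -= logits.max(axis=1, keepdims=True)
    resp = np.exp(logits)
    resp /= resp.sum(axis=1, keepdims=True)
    # Dirichlet update: prior counts plus soft assignment counts.
    alpha_new = alpha_prior + resp.sum(axis=0)
    return resp, alpha_new
```

Each row of `resp` sums to one and can be read as the soft cluster membership of one observation.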

If this is right

  • The procedure yields an explicit set of alternative causal models, one per discovered mixture component.
  • Structure-learning accuracy rises on heterogeneous synthetic data once the expert-elicited prior is included.
  • The same pipeline can be applied to real heterogeneous collections such as patient records and still produce interpretable clusters of causal relationships.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same elicitation-plus-mixture strategy could be tested on other heterogeneous domains where partial expert knowledge is already available, such as multi-center clinical trials.
  • Replacing the simulated expert with actual domain specialists would provide a direct check on whether the performance gains survive realistic elicitation noise.
  • If the learned mixture components align with observable covariates, the method supplies a data-driven way to define clinically or scientifically meaningful subgroups.

Load-bearing premise

Expert answers obtained through Bayesian experimental design can be encoded as a graph prior that improves posterior inference over the mixture components without systematically biasing or over-constraining the learned structures.
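One simple way to picture "encoding answers as a graph prior" is an independent Bernoulli prior per edge. The sketch below is an illustrative assumption, not the prior constructed in the paper: `edge_belief` is a hypothetical matrix of expert probabilities, and acyclicity is not enforced here.

```python
import numpy as np

def log_edge_prior(adjacency, edge_belief):
    """Log-probability of a graph under independent per-edge expert beliefs.

    adjacency:   (d, d) binary matrix, adjacency[i, j] = 1 for edge i -> j.
    edge_belief: (d, d) matrix of hypothetical expert probabilities that
                 edge i -> j exists; 0.5 means "no opinion".
    """
    belief = np.clip(edge_belief, 1e-6, 1 - 1e-6)
    off_diag = ~np.eye(adjacency.shape[0], dtype=bool)  # ignore self-loops
    terms = adjacency * np.log(belief) + (1 - adjacency) * np.log(1 - belief)
    return float(terms[off_diag].sum())
```

The premise's worry about over-constraining shows up directly in this form: beliefs close to 0 or 1 add large penalties that data may be unable to overturn, while beliefs of 0.5 leave the posterior untouched.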

What would settle it

On synthetic data generated from a known mixture of two or more ground-truth causal graphs, run the method both with and without the expert-informed prior and measure whether the informed version recovers a higher fraction of the true edges and the correct number of mixture components.
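A rough sketch of how such a comparison might be scored is given below. It assumes integer 0/1 adjacency matrices and the same number of true and estimated components, and it resolves the arbitrary ordering of mixture components by brute-force matching. The helper names are hypothetical, and the Hamming count here is a plain entrywise difference rather than the ESHD metric reported in the paper's figures.

```python
import itertools
import numpy as np

def edge_recall(true_adj, est_adj):
    """Fraction of true directed edges that also appear in the estimate."""
    n_true = true_adj.sum()
    return float((true_adj * est_adj).sum() / n_true) if n_true else 1.0

def hamming(true_adj, est_adj):
    """Number of entries that differ (a reversed edge counts twice)."""
    return int(np.abs(true_adj - est_adj).sum())

def best_matching(true_graphs, est_graphs):
    """Match estimated to true components by minimising total Hamming distance.

    Brute force over permutations; only sensible for a handful of components.
    Returns a list m where est_graphs[m[i]] is matched to true_graphs[i].
    """
    k = len(true_graphs)
    best = min(
        itertools.permutations(range(k)),
        key=lambda perm: sum(
            hamming(true_graphs[i], est_graphs[p]) for i, p in enumerate(perm)
        ),
    )
    return list(best)
```

Running the same scoring on the informed and uninformed fits, after matching, gives the per-component recall and distance needed for the comparison described above.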

Figures

Figures reproduced from arXiv: 2510.06735 by Jorge Loría, Samuel Kaski, Sophie Wharrie, Zachris Björkman.

Figure 1: Generative model for expert edge beliefs.

Figure 2: shows the DGP of our mixture model, using DiBS, and it corresponds to
$$p(\pi, Z, G, \Theta, C, D \mid D_S, D_H) = p(\pi)\, p(Z \mid D_S)\, p(G \mid Z, D_H)\, p(\Theta \mid G) \times p(C \mid \pi)\, p(D \mid C, G, \Theta)$$
$$= p(\pi) \prod_k \big[ p(Z_k \mid D_{S,k})\, p(G_k \mid Z_k, D_{H,k})\, p(\Theta_k \mid G_k) \big] \times \prod_n \Big[ p(c_n \mid \pi) \prod_k p(x_n \mid c_{nk}, G_k, \Theta_k) \Big],$$
where graphs with $d$ variables are represented by adjacency matrices $G_k \in \{0, 1\}^{d \times d}$. Each component of the graph embeddings $Z_k = [U^{(k)}, V^{(k)}]$ with …

Figure 3: Boxplot (and values) of average ESHD be…

Figure 4: Comparison between BED and random approaches for query selection by mean and bootstrapped 95% confidence interval for homogeneous experiments with varying number of queries (top) and varying expert reliability (bottom) when inferring Gaussian ER network in the linear (left) and non-linear case (right). … effect of the reliability of the expert, showing that more reliable opinions impact the inference more …

Figure 5: Boxplot and values of classification accuracy…

Figure 6: Boxplots and values of metrics in held-out observations, simulated with…

Figure 7: Boxplots of metrics of interest for simulations in…

Figure 8: Boxplots of metrics of interest for simulations in…

Figure 9: Boxplots of metrics of interest for simulations in…

Figure 10: Comparison between BED and random querying of the expert with varying number of queries (…

Figure 11: Boxplots and values for metrics of interest in out-of-sample data for breast cancer data set in methods…

Figure 12: Mean and 95% (bootstrapped) confidence interval of metrics for out-of-sample data using VaMSL…

Figure 13: Boxplots and values for metrics of interest in out-of-sample data for breast cancer data set in methods…

Figure 14: Mean and 95% (bootstrapped) confidence interval of metrics for out-of-sample data using VaMSL…

Figure 15: Mean and 95% (bootstrapped) confidence interval of ESHD for VaMSL with more informative…

Figure 16: Mean and 95% (bootstrapped) confidence interval of ESHD for VaMSL with less informative prior using…

Figure 17: Average ESHD (SD) for varying levels of α0 and expert reliability in ER graphs, over 20 independent runs. For linear (top) and non-linear graphs (bottom).

Figure 18: Average ESHD (SD) for varying levels of α0 and expert reliability in SF graphs, over 20 independent runs. For linear (top) and non-linear graphs (bottom).
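
The data part of the factorization quoted in the Figure 2 caption (mixing weights, then component assignments, then observations given the assigned component's graph and parameters) can be sketched as ancestral sampling. The example below is a simplified illustration under stated assumptions: graphs are fixed rather than drawn from embeddings, each component is a linear Gaussian model, and all function and argument names are hypothetical.

```python
import numpy as np

def sample_mixture(alpha, graphs, coefs, n_samples, noise_sd=1.0, seed=0):
    """Draw data from a mixture of linear Gaussian Bayesian networks.

    alpha:  (K,) Dirichlet concentrations for the mixing weights pi.
    graphs: list of K acyclic (d, d) adjacency matrices, [i, j] = 1 for i -> j.
    coefs:  list of K (d, d) edge-weight matrices (masked by the graphs).
    """
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(alpha)                       # pi ~ p(pi)
    assignments = rng.choice(len(graphs), size=n_samples, p=pi)  # c_n ~ p(c_n | pi)
    d = graphs[0].shape[0]
    data = np.zeros((n_samples, d))
    for n, k in enumerate(assignments):
        weights = graphs[k] * coefs[k]              # keep only existing edges
        eps = rng.normal(0.0, noise_sd, size=d)
        # x = W^T x + eps, so x = (I - W^T)^{-1} eps; solvable for acyclic graphs.
        data[n] = np.linalg.solve(np.eye(d) - weights.T, eps)
    return pi, assignments, data
```

Data generated this way has a known assignment for every row, which makes it a convenient test bed for checking recovered clusters and graphs.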
Original abstract

Bayesian causal discovery benefits from prior information elicited from domain experts, and in heterogeneous domains any prior knowledge would be badly needed. However, so far prior elicitation approaches have assumed a single causal graph and hence are not suited to heterogeneous domains. We propose a causal elicitation strategy for heterogeneous settings, based on Bayesian experimental design (BED) principles, and a variational mixture structure learning (VaMSL) method -- extending the earlier differentiable Bayesian structure learning (DiBS) method -- to iteratively infer mixtures of causal Bayesian networks (CBNs). We construct an informative graph prior incorporating elicited expert feedback in the inference of mixtures of CBNs. Our proposed method successfully produces a set of alternative causal models (mixture components or clusters), and achieves an improved structure learning performance on heterogeneous synthetic data when informed by a simulated expert. Finally, we demonstrate that our approach is capable of capturing complex distributions in a breast cancer database.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a Bayesian experimental design (BED) strategy for eliciting expert knowledge suited to heterogeneous domains and introduces VaMSL, a variational extension of DiBS, to infer mixtures of causal Bayesian networks. It incorporates the elicited feedback as an informative graph prior within the variational objective and reports that the approach recovers sets of alternative causal models while achieving improved structure learning performance on heterogeneous synthetic data when using a simulated expert; it also applies the model to a breast cancer database to capture complex distributions.

Significance. If the central empirical claims hold under realistic conditions, the work would address a genuine gap by extending prior elicitation methods beyond single-graph assumptions to mixture settings where heterogeneity is common. The variational formulation for scalable inference over mixture components and the BED-based elicitation are technically sound extensions of existing tools. The paper provides a clear methodological integration of expert feedback into the prior, which is a strength.

major comments (2)
  1. [§4] §4 (synthetic experiments): the reported performance gains rest on a simulated expert whose responses are generated directly from the ground-truth mixture components. This construction implicitly assumes noise-free, component-aware feedback; any deviation in real elicitation would change the effective regularization on mixture weights and component graphs in the VaMSL variational objective, undermining the claim that the prior meaningfully improves posterior inference without systematic bias.
  2. [§5] §5 (breast cancer application): the experiment demonstrates that the mixture model can fit complex distributions but contains no expert feedback or BED elicitation step. This leaves the key prior-construction mechanism—the central contribution—untested outside the idealized synthetic simulation.
minor comments (1)
  1. [Abstract] The abstract provides no quantitative metrics, baseline comparisons, ablation details, or description of how the expert simulation was constructed; adding these would strengthen the empirical section.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each major comment point by point below, providing clarifications and outlining planned revisions where appropriate. Our responses aim to strengthen the presentation of the work without overstating its empirical scope.

Point-by-point responses
  1. Referee: [§4] §4 (synthetic experiments): the reported performance gains rest on a simulated expert whose responses are generated directly from the ground-truth mixture components. This construction implicitly assumes noise-free, component-aware feedback; any deviation in real elicitation would change the effective regularization on mixture weights and component graphs in the VaMSL variational objective, undermining the claim that the prior meaningfully improves posterior inference without systematic bias.

    Authors: We appreciate the referee's observation that the simulated expert provides noise-free, component-aware feedback. This design choice was made deliberately to establish an upper-bound performance benchmark and to isolate the effect of incorporating the elicited prior into the VaMSL objective, which is a common practice when first validating new elicitation and inference frameworks. We agree that real expert responses would introduce noise and potential component misassignment. To address this concern directly, we will add a new set of experiments in the revised manuscript that inject controlled noise into the simulated expert responses (e.g., random flips at varying rates and occasional component confusion). These results will be reported alongside the original oracle case to demonstrate robustness of the prior integration. revision: yes

  2. Referee: [§5] §5 (breast cancer application): the experiment demonstrates that the mixture model can fit complex distributions but contains no expert feedback or BED elicitation step. This leaves the key prior-construction mechanism—the central contribution—untested outside the idealized synthetic simulation.

    Authors: We agree that the breast cancer experiment applies only the base VaMSL mixture model without the BED-elicited expert prior, thereby leaving the full pipeline (elicitation plus prior incorporation) untested on real data. The section was included to illustrate that the variational mixture formulation can capture heterogeneous structure in a complex real-world dataset, serving as a necessary sanity check before expert integration. Because obtaining genuine domain-expert responses for the breast cancer variables was outside the scope of the present study, we will revise the manuscript to explicitly label this experiment as a demonstration of the mixture model alone, move the expert-prior results to a dedicated subsection, and add a limitations paragraph discussing the need for future real-expert validation. revision: partial
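
The noise injection proposed in the first response above (random flips at varying rates and occasional component confusion) could be prototyped along the following lines. This is a hypothetical sketch of a simulated expert, not the authors' implementation; the function name, the yes/no answer format, and the two rate parameters are assumptions made for illustration.

```python
import numpy as np

def noisy_expert_answer(true_graphs, component, i, j,
                        flip_rate, confusion_rate, rng):
    """Answer "does edge i -> j exist in this component?" with two error types.

    true_graphs:    list of ground-truth (d, d) adjacency matrices.
    flip_rate:      probability that the yes/no answer is inverted.
    confusion_rate: probability that the expert answers about another component.
    rng:            a numpy.random.Generator instance.
    """
    k = component
    # Component confusion: the expert has a different component in mind.
    if len(true_graphs) > 1 and rng.random() < confusion_rate:
        others = [c for c in range(len(true_graphs)) if c != component]
        k = int(rng.choice(others))
    answer = bool(true_graphs[k][i, j])
    # Random flip: the expert gives the wrong yes/no answer.
    if rng.random() < flip_rate:
        answer = not answer
    return answer
```

Setting both rates to zero recovers the oracle expert, so the original results and the noisy variants can be produced by the same code path.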

Circularity Check

0 steps flagged

Minor self-citation to DiBS; central variational derivation and expert prior remain independent

full rationale

The paper extends the prior DiBS method to a variational mixture structure learning (VaMSL) approach for inferring mixtures of CBNs, incorporating BED-elicited expert feedback as a graph prior. This uses standard variational inference without any reported performance metric or posterior quantity reducing by construction to a parameter fitted on the evaluation data itself. The synthetic-data experiments with a simulated expert serve as external validation rather than a definitional loop. Self-citation to DiBS is present but not load-bearing for the new heterogeneous elicitation claims, which rest on the BED-to-prior mapping and mixture inference steps that are independently specified.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the modeling choice that data arises from a finite mixture of DAGs and that expert priors can be elicited and encoded without circular dependence on the target posterior.

free parameters (1)
  • Number of mixture components K
    The model requires choosing or inferring the number of clusters; this choice directly affects the recovered set of causal graphs.
axioms (1)
  • domain assumption: Observed data is generated from a mixture of causal Bayesian networks.
    This is the foundational modeling assumption that justifies the mixture structure.


