pith. sign in

arxiv: 2604.10863 · v1 · submitted 2026-04-13 · 📊 stat.ME · stat.CO

Restricted Search Space Graph MCMC via Birth-Death Processes

Pith reviewed 2026-05-10 16:41 UTC · model grok-4.3

classification 📊 stat.ME stat.CO
keywords MCMCDAG inferencebirth-death processestotal variation distancetransdimensional samplingrestricted search spaceBayesian networks
0
0 comments X

The pith

Sharp bounds quantify the total variation error from restricting the edge search space in MCMC for inferring directed acyclic graphs, while a birth-death process allows the restriction to grow or shrink dynamically.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives sharp lower and upper bounds on the total variation distance between the full posterior over DAGs and the posterior obtained when MCMC is confined to a fixed subset of possible edges. These bounds identify parameter regimes where the restriction introduces negligible distortion and regimes where the distortion becomes large. To shrink the error, the authors replace the fixed restriction with a transdimensional sampler whose search space expands or contracts according to birth-and-death rates that place a prior on the collection of allowable spaces. An efficient implementation is described and finite-sample behavior is examined in simulation studies.

Core claim

We derive sharp lower and upper bounds on the total variation distance between the unrestricted posterior distribution and the posterior distribution induced by a state-of-the-art restricted search space MCMC method. These bounds characterize regimes in which the approximation error is negligible and regimes in which it is not. In order to reduce the error, we propose a flexible transdimensional MCMC sampler which allows the search space to expand or contract dynamically as the chain progresses. The sampler is defined by birth-and-death rates that induce a prior distribution on the set of search spaces, rather than assume a fixed restricted search space throughout.

What carries the argument

Birth-and-death rates that induce a prior over possible restricted search spaces, thereby driving a transdimensional MCMC chain whose edge set can change during sampling.

Load-bearing premise

That birth-and-death rates can be chosen to induce a prior on search spaces such that the resulting transdimensional chain mixes well and the approximation error remains controlled without introducing new intractable computations.

What would settle it

For a small DAG whose exact posterior can be enumerated, run both the fixed-restriction and the dynamic birth-death samplers and check whether the empirical total variation distance to the true posterior exceeds the derived upper bound.

Figures

Figures reproduced from arXiv: 2604.10863 by Kieran R Campbell, Morris Greenberg, Radu Craiu.

Figure 1
Figure 1. Figure 1: (a) Example DAG with 4 nodes (b) 2 example topological order representations of the DAG in (a). In both topological orders, parents are always to the right of children. nents. Specifically, in a Bayesian network B = (G, θ), each vertex i ∈ V corresponds to a random variable Xi , EG encodes the dependence structure among the variables, and θ denotes the parameters specifying the functional forms of the cond… view at source ↗
Figure 2
Figure 2. Figure 2: 4-node example of restricted search space: (a) Posterior probabilities of the DAGs without imposing any restrictions. (b) Posterior probabilities of the graphs where the edge 1 → 3 is not included in the restricted search space. Excluded DAGs are shaded in gray. (c) Resulting restricted order posterior probability mass functions for both search spaces. Green (1999) extended this approach to Gaussian Graphi… view at source ↗
Figure 3
Figure 3. Figure 3: Upper and Lower bound of the Total Variation Distance for different thresholds across 3 different constants. Dashed line denotes y = x. 1 − √ 1 − cεH ≤ ϵ α provides a practical lower bound on the gap between πP and πP˜ (even when we cannot neatly evaluate ϵ). This highlights how the quality of H directly influences the accuracy of any approximate kernel (as expressed through ϵ) for order MCMC, and formaliz… view at source ↗
Figure 4
Figure 4. Figure 4: Illustration at step t of combined BROOD Algorithm. Purple dots and arrows correspond to the Q0 kernel in equation (16) (moves across the orders space ≺), and blue dots and arrows correspond to the Q1 kernel (moves across the restricted search spaces H) In our setup, a DAG G can only be used to define L if G ∈ GH, where H ∈ Dp is the restricted search space. In addition, the ordering ≺ dictates the arrange… view at source ↗
Figure 5
Figure 5. Figure 5: (a) Construction of a restricted Cholesky space given restricted search space H and variable order ≺. Light-blue entries are parameters in Θ(H,≺) . Purple entries are in ΘH but not Θ≺. Green entries are in Θ≺ but not ΘH. Red entries are in neither. (b) Two local moves on the space of Gaussian DAG models: Q0 changes the ordering ≺ by relocating node 2 from position 4, and Q1 adds edge 3 → 2 to H. Both modif… view at source ↗
Figure 6
Figure 6. Figure 6: ROC AUC results with Gaussian SEM data, using plus-one sparsity. space sampler after applying the greedy iterative expansion algorithm. We suspect that this result is present for our small dataset experiments but not for our large ones because the posteriors on small datasets are more diffuse than those on larger datasets on average. Un￾der diffuser posteriors, the birth-death process accepts more space ch… view at source ↗
Figure 7
Figure 7. Figure 7: ROC PR results with Gaussian SEM data, using plus-one sparsity. closer the run times are between BROOD and BiDAG. This suggests that in scenarios where the stopping criterion in the BiDAG expansion procedure is difficult to meet, BROOD may be an especially good choice for performing structure inference. Moreover, when BROOD is significantly slower than BiDAG, for p = 200, it still runs 2002 log(200) = 211,… view at source ↗
Figure 8
Figure 8. Figure 8: ROC AUC results with Gaussian SEM data, using fixed sparsity. where BROOD outperforms other methods given a finite budget of computational resources. In addition, empirically, BROOD can only sometimes recover from a poorly-initialized start￾ing search space compared to using existing faster hybrid procedures to search for a high￾quality fixed restricted search space. Two natural directions for future work … view at source ↗
Figure 9
Figure 9. Figure 9: Skeleton version of ROC AUC results, using fixed sparsity. in Kuipers et al. (2021), it is not recommended to use algorithms like Greedy Equivalent Search (Chickering, 2002) to form the initial search space, due to their tendency to include many false positive edges (which would increase the computational cost of expanding the search space). BROOD, on the other hand, can afford to consider starting search … view at source ↗
Figure 10
Figure 10. Figure 10: log10(Run Time) results for Gaussian SEM data, using plus-one sparsity transdimensional samplers beyond the particular choice used by BROOD. BROOD is de￾fined by rates corresponding to a specific prior on the set of the search spaces that (1) meets the conditions required for valid birth-death sampling, and (2) favors search spaces whose associated graphs have high posterior mass on average. Other princip… view at source ↗
Figure 11
Figure 11. Figure 11: P r+ results with Gaussian SEM data, using plus-one sparsity [PITH_FULL_IMAGE:figures/full_fig_p049_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: P r− results with Gaussian SEM data, using plus-one sparsity. 49 [PITH_FULL_IMAGE:figures/full_fig_p049_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Skeleton version of PR AUC results with Gaussian SEM data, using plus-one sparsity [PITH_FULL_IMAGE:figures/full_fig_p050_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Skeleton version of P r+ results with Gaussian SEM data, using plus-one spar￾sity. 50 [PITH_FULL_IMAGE:figures/full_fig_p050_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Skeleton version of P r− results with Gaussian SEM data, using plus-one spar￾sity [PITH_FULL_IMAGE:figures/full_fig_p051_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: ROC AUC results with FCM data, using plus-one sparsity. 51 [PITH_FULL_IMAGE:figures/full_fig_p051_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: PR AUC results with FCM data, using plus-one sparsity [PITH_FULL_IMAGE:figures/full_fig_p052_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: P r+ results with FCM data, using plus-one sparsity. 52 [PITH_FULL_IMAGE:figures/full_fig_p052_18.png] view at source ↗
Figure 19
Figure 19. Figure 19: P r− results with FCM data, using plus-one sparsity [PITH_FULL_IMAGE:figures/full_fig_p053_19.png] view at source ↗
Figure 20
Figure 20. Figure 20: Skeleton version ROC AUC results with FCM data, using plus-one sparsity. 53 [PITH_FULL_IMAGE:figures/full_fig_p053_20.png] view at source ↗
Figure 21
Figure 21. Figure 21: Skeleton version of PR AUC results with FCM data, using plus-one sparsity [PITH_FULL_IMAGE:figures/full_fig_p054_21.png] view at source ↗
Figure 22
Figure 22. Figure 22: Skeleton version of P r+ results with FCM data, using plus-one sparsity. 54 [PITH_FULL_IMAGE:figures/full_fig_p054_22.png] view at source ↗
Figure 23
Figure 23. Figure 23: Skeleton version of P r− results with FCM data, using plus-one sparsity. C.2.4 Performance: Gaussian SEMs, Fixed Sparsity Runs [PITH_FULL_IMAGE:figures/full_fig_p055_23.png] view at source ↗
Figure 24
Figure 24. Figure 24: PR AUC results with Gaussian SEM data, using fixed sparsity [PITH_FULL_IMAGE:figures/full_fig_p056_24.png] view at source ↗
Figure 25
Figure 25. Figure 25: P r+ results with Gaussian SEM data, using fixed sparsity. 56 [PITH_FULL_IMAGE:figures/full_fig_p056_25.png] view at source ↗
Figure 26
Figure 26. Figure 26: P r− results with Gaussian SEM data, using fixed sparsity [PITH_FULL_IMAGE:figures/full_fig_p057_26.png] view at source ↗
Figure 27
Figure 27. Figure 27: Skeleton version of ROC AUC results with Gaussian SEM data, using fixed sparsity. 57 [PITH_FULL_IMAGE:figures/full_fig_p057_27.png] view at source ↗
Figure 28
Figure 28. Figure 28: Skeleton version of PR AUC results with Gaussian SEM data, using fixed spar￾sity [PITH_FULL_IMAGE:figures/full_fig_p058_28.png] view at source ↗
Figure 29
Figure 29. Figure 29: Skeleton version of P r+ results with Gaussian SEM data, using fixed sparsity. 58 [PITH_FULL_IMAGE:figures/full_fig_p058_29.png] view at source ↗
Figure 30
Figure 30. Figure 30: Skeleton version of P r− results with Gaussian SEM data, using fixed sparsity. to Gaussian SEM data generation when analyzing the impact of initialization with fixed sparsity capping: in settings with posteriors that are expected to be disperse, BROOD tends to be robust to recovering from poorly initialized search spaces, but is more sensitive to the initialization in settings where posteriors are expecte… view at source ↗
Figure 31
Figure 31. Figure 31: ROC AUC results with FCM data, using fixed sparsity [PITH_FULL_IMAGE:figures/full_fig_p060_31.png] view at source ↗
Figure 32
Figure 32. Figure 32: PR AUC results with FCM data, using fixed sparsity. 60 [PITH_FULL_IMAGE:figures/full_fig_p060_32.png] view at source ↗
Figure 33
Figure 33. Figure 33: Skeleton version ROC AUC results with FCM data, using fixed sparsity [PITH_FULL_IMAGE:figures/full_fig_p061_33.png] view at source ↗
Figure 34
Figure 34. Figure 34: Skeleton version of PR AUC results with FCM data, using fixed sparsity. 61 [PITH_FULL_IMAGE:figures/full_fig_p061_34.png] view at source ↗
Figure 35
Figure 35. Figure 35: log10(Run Time) results for FCM data, using plus-one sparsity [PITH_FULL_IMAGE:figures/full_fig_p062_35.png] view at source ↗
Figure 36
Figure 36. Figure 36: log10(Run Time) results for Gaussian SEM data, using fixed sparsity 62 [PITH_FULL_IMAGE:figures/full_fig_p062_36.png] view at source ↗
Figure 37
Figure 37. Figure 37: log10(Run Time) results for FCM data, using fixed sparsity 63 [PITH_FULL_IMAGE:figures/full_fig_p063_37.png] view at source ↗
read the original abstract

Inferring directed acyclic graphs (DAGs) from data via Markov chain Monte Carlo (MCMC) is computationally challenging in moderate-to-high dimensional settings because their discrete sampling space grows super-exponentially with the number of nodes. To address scalability, several recent MCMC-based graph inference methods restrict the search space to a subset of edges, at the cost of introducing error into the inference procedure. In this work, we derive sharp lower and upper bounds on the total variation distance between the unrestricted posterior distribution and the posterior distribution induced by a state-of-the-art restricted search space MCMC method. These bounds characterize regimes in which the approximation error is negligible and regimes in which it is not. In order to reduce the error, we propose a flexible transdimensional MCMC sampler which allows the search space to expand or contract dynamically as the chain progresses. The sampler is defined by birth-and-death rates that induce a prior distribution on the set of search spaces, rather than assume a fixed restricted search space throughout. We outline an efficient implementation of the proposed algorithm and demonstrate its finite-sample performance through simulation studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper derives sharp lower and upper bounds on the total variation distance between the unrestricted posterior over DAGs and the posterior induced by a fixed restricted-search-space MCMC sampler. It then proposes a transdimensional birth-death MCMC sampler whose birth and death rates induce a prior over possible search spaces, allowing the restriction to expand or contract dynamically, and outlines an efficient implementation whose finite-sample behavior is illustrated via simulation studies.

Significance. If the TV bounds are indeed sharp and the birth-death construction preserves the exact marginal posterior on graphs while remaining computationally tractable, the work would provide a principled way to control approximation error in scalable Bayesian DAG inference. The explicit bounds and the dynamic-search-space idea are potentially valuable contributions to the literature on structure learning.

major comments (3)
  1. [Section describing the birth-death construction and implementation outline] The TV bounds are stated for a fixed restricted search space. For the dynamic sampler, the marginal posterior on graphs remains exactly the target only if the birth-death rates are chosen so that detailed balance holds for the joint (graph, search-space) process. The manuscript must exhibit the explicit functional form of these rates and prove that they can be evaluated without computing the normalizing constant over admissible search spaces or the posterior mass outside any candidate space (see the skeptic note on rate selection).
  2. [Implementation outline and simulation studies] The abstract claims an 'efficient implementation' exists and that the approximation error can be controlled. However, the provided outline does not include an error analysis showing that the induced prior keeps the TV distance within the derived bounds for the dynamic case, nor does it demonstrate that the rates remain posterior-independent. This is load-bearing for the central claim that the method reduces error without reintroducing intractability.
  3. [Simulation studies] Simulation studies are used to demonstrate finite-sample performance, but without reporting the realized TV distance (or a proxy) against the theoretical bounds, or comparing against a fixed-restriction baseline under the same computational budget, it is difficult to assess whether the dynamic adjustment actually improves upon the static case in the regimes the bounds identify as problematic.
minor comments (1)
  1. Notation for the collection of admissible search spaces and the induced prior should be introduced earlier and used consistently when stating the TV bounds.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for their insightful comments and recommendations. Below we provide point-by-point responses to the major comments and outline the revisions we will undertake.

read point-by-point responses
  1. Referee: [Section describing the birth-death construction and implementation outline] The TV bounds are stated for a fixed restricted search space. For the dynamic sampler, the marginal posterior on graphs remains exactly the target only if the birth-death rates are chosen so that detailed balance holds for the joint (graph, search-space) process. The manuscript must exhibit the explicit functional form of these rates and prove that they can be evaluated without computing the normalizing constant over admissible search spaces or the posterior mass outside any candidate space (see the skeptic note on rate selection).

    Authors: We agree that the explicit rates and detailed balance for the joint process are necessary to establish the exact marginal posterior. In the revised manuscript we will state the functional forms of the birth and death rates explicitly and supply the proof that these rates satisfy detailed balance while depending only on the current graph, the current search space, and the induced prior; no global normalizing constant or mass outside the candidate space is required. revision: yes

  2. Referee: [Implementation outline and simulation studies] The abstract claims an 'efficient implementation' exists and that the approximation error can be controlled. However, the provided outline does not include an error analysis showing that the induced prior keeps the TV distance within the derived bounds for the dynamic case, nor does it demonstrate that the rates remain posterior-independent. This is load-bearing for the central claim that the method reduces error without reintroducing intractability.

    Authors: We acknowledge that the outline in the current draft is concise. The revision will expand the implementation section with an explicit error analysis showing that the dynamic prior keeps the total-variation distance inside the derived bounds, and will demonstrate that the rates are posterior-independent by construction, depending solely on the current state and the prior over search spaces. revision: yes

  3. Referee: [Simulation studies] Simulation studies are used to demonstrate finite-sample performance, but without reporting the realized TV distance (or a proxy) against the theoretical bounds, or comparing against a fixed-restriction baseline under the same computational budget, it is difficult to assess whether the dynamic adjustment actually improves upon the static case in the regimes the bounds identify as problematic.

    Authors: We agree that direct numerical checks against the bounds and matched-budget comparisons would strengthen the empirical section. The revised simulation studies will report realized TV distances (or computable proxies) and will add comparisons to fixed-restriction samplers under identical computational budgets, focusing on the regimes identified by the bounds. revision: yes

Circularity Check

0 steps flagged

No circularity: TV bounds and birth-death sampler derived independently of fitted inputs or self-citation chains

full rationale

The paper's core contributions are a mathematical derivation of sharp TV distance bounds between unrestricted and restricted posteriors, plus a direct construction of a transdimensional sampler via birth-death rates that induce a prior on search spaces. No equations or steps reduce the claimed bounds or sampler performance to quantities defined from the same data by construction, nor do they rely on load-bearing self-citations whose validity is assumed without external verification. The abstract and described claims treat the bounds as new characterizations and the rates as a flexible proposal with an outlined efficient implementation, keeping the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard MCMC convergence theory plus the new birth-death construction; the abstract supplies no explicit free parameters or invented entities beyond the rates themselves.

axioms (1)
  • domain assumption Birth-and-death rates induce a prior distribution on the set of search spaces
    Explicitly stated as the definition of the proposed sampler.

pith-pipeline@v0.9.0 · 5485 in / 1098 out tokens · 55960 ms · 2026-05-10T16:41:03.816290+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages

  1. [1]

    Byrne, S

    Morgan Kaufmann, San Francisco (CA). Byrne, S. and Dawid, A. P. (2015). Structural Markov graph laws for Bayesian model uncertainty.The Annals of Statistics, 43(4):1647–1681. Cao,X.,Khare,K.,andGhosh,M.(2019). Posteriorgraphselectionandestimationconsis- tency for high-dimensional Bayesian DAG models.The Annals of Statistics, 47(1):319– 348. Caron, A., Lia...

  2. [2]

    banned” score table tabulating all of the valid scores that arecompatiblewithparticular“banned

    We will do so now, showing thatγis well-defined for any 2 random edges outside of EH,e 1, e2. We need to show that b(θ(H+e1 ,≺), θe2) D(θ(H+e1 ,≺), θe2) b(θ(H,≺), θe1) D(θ(H,≺), θe1) = b(θ(H+e2 ,≺), θe1) D(θ(H+e2 ,≺), θe1) b(θ(H,≺), θe2) D(θ(H,≺), θe2) (32) To prove 32, we define the following 4 disjoint sets of graphs: G0 :={G:G∈ G H} G1 :={G:e 1 ∈G, G∈ ...

  3. [3]

    banned” score table takesO(2Kp)to compute. Creating the plus “banned

    score tables takeO(K3p2K)to compute, BGe plus score tables takeO(K3p2K(p− K−1))tocompute(byrepeatingthescoretableprocessforeachofthe(p− |pa H(i)| −1) excluded nodes fromH, and the “banned” score table takesO(2Kp)to compute. Creating the plus “banned” score table is alsoO(2Kp), as the row-matching process to create theBi values illustrated in table 1 can b...

  4. [4]

    A single-block cluster withmin(0.1,4 p)connection probability

  5. [5]

    A two-block cluster with(1 3 , 2 3)probability of assignment and the same connection probability matrix as the SBM in (38)

  6. [6]

    This hierarchical design yields graphs with both community and anti-community structure

    Athree-blockclusterwith( 1 6 , 1 3 , 1 2)probabilityofassignmentandaconnectionprob- ability matrix of   min(0.2, 8 p) min(0.01, 1 p) min(0.2, 8 p) min(0.01, 1 p) 0 min(0.08, 4 p) min(0.2, 8 p) min(0.08, 4 p) min(0.08, 4 p)   The third cluster introduces more complex relationships both within and between blocks; while blocks 1 and 3 within this setup e...