Structure Learning for Directed Trees with Zero-Inflated Compositional Nodes

Bani K. Mallick; Shuangjie Zhang; Yang Ni

arxiv: 2605.03178 · v2 · submitted 2026-05-04 · 📊 stat.ME

Shuangjie Zhang, Bani K. Mallick, Yang Ni

Pith reviewed 2026-05-08 17:34 UTC · model grok-4.3

classification 📊 stat.ME

keywords: structure learning, directed trees, compositional data, zero-inflated, Kullback-Leibler divergence, transition matrix, microbiome, consistency

The pith

Directed trees over compositional nodes are identifiable and consistently recoverable from data using a KL-scored mixture model with column-stochastic transitions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Compositional data consist of proportion vectors that live on the probability simplex and appear in microbiome abundances and cell-type mixtures. The paper builds a directed-tree model in which each child's composition is expressed as a mixture of a baseline and a parent-influenced term driven by a column-stochastic transition matrix; the mixture respects the simplex constraint and accommodates zeros. A non-degeneracy condition on those matrices makes edge directions identifiable from observational samples alone. The resulting penalized Kullback-Leibler estimator is shown to recover the exact tree structure with high probability once the sample size exceeds an explicit bound that depends on the signal gap, dimension, and penalty level.
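A minimal sketch of the conditional-mean construction described above, assuming the mixture takes the form μ = (1 − w)·b + w·M·x, where b is a baseline composition, x the parent composition, M a column-stochastic matrix (rows index child categories, columns index parent categories) and w in [0, 1] a mixing weight. The weight w, the helper names and this exact parameterization are illustrative assumptions, not the paper's code. The example checks that μ stays on the simplex and shows that it can contain exact zeros.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows: child categories, columns: parent categories


def is_composition(v: Vector, tol: float = 1e-9) -> bool:
    """True if v is nonnegative and sums to one (a point on the simplex)."""
    return all(x >= -tol for x in v) and abs(sum(v) - 1.0) <= tol


def is_column_stochastic(m: Matrix, tol: float = 1e-9) -> bool:
    """True if every entry is nonnegative and every column sums to one."""
    n_cols = len(m[0])
    return all(
        all(row[c] >= -tol for row in m) and abs(sum(row[c] for row in m) - 1.0) <= tol
        for c in range(n_cols)
    )


def conditional_mean(baseline: Vector, m: Matrix, parent: Vector, w: float) -> Vector:
    """Hypothetical mixture mean: (1 - w) * baseline + w * (M @ parent)."""
    assert is_composition(baseline) and is_composition(parent)
    assert is_column_stochastic(m) and 0.0 <= w <= 1.0
    # M @ parent: a column-stochastic M maps a composition to a composition.
    pushed = [sum(m_ij * x_j for m_ij, x_j in zip(row, parent)) for row in m]
    return [(1.0 - w) * b_i + w * p_i for b_i, p_i in zip(baseline, pushed)]


# Example: 3 child categories, 2 parent categories.
M = [[0.7, 0.0],
     [0.3, 0.5],
     [0.0, 0.5]]
baseline = [0.5, 0.5, 0.0]  # baseline puts no mass on the third category
parent = [1.0, 0.0]         # parent concentrated on its first category
mu = conditional_mean(baseline, M, parent, w=0.4)
print(mu)
```

Because both mixture components lie on the simplex, so does their convex combination; the third entry of `mu` is exactly zero because neither component puts mass there.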

Core claim

The paper establishes that, under a non-degeneracy condition on the transition matrices, the directed tree structure among zero-inflated compositional nodes is identifiable from observational data; a scoring function based on Kullback-Leibler divergence combined with a suitable penalty yields a consistent estimator whose required sample size is characterized explicitly in terms of the minimum signal gap, node dimension, and penalty strength.
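The core claim pairs a KL-based score with a penalty, but this page does not describe the search procedure. The sketch below is only an illustration under stated assumptions: each candidate edge k → j has a hypothetical penalized score `scores[(k, j)]` (larger is better), a tree's score is the sum of its edge scores, and the best directed spanning tree is found by exhaustive enumeration of parent assignments. That is feasible only for a handful of nodes; at realistic sizes a maximum-weight arborescence algorithm such as Chu-Liu/Edmonds would be the usual choice.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]  # (parent, child)


def is_arborescence(parents: List[Optional[int]]) -> bool:
    """True if exactly one node is the root and following parents never cycles."""
    if sum(p is None for p in parents) != 1:
        return False
    for start in range(len(parents)):
        seen, node = set(), start
        while parents[node] is not None:
            if node in seen:
                return False  # revisited a node: the parent pointers form a cycle
            seen.add(node)
            node = parents[node]
    return True


def best_tree(scores: Dict[Edge, float], p: int) -> Tuple[float, List[Optional[int]]]:
    """Exhaustively search directed spanning trees; return (total score, parent list)."""
    best: Tuple[float, List[Optional[int]]] = (float("-inf"), [])
    for root in range(p):
        others = [j for j in range(p) if j != root]
        choices = [[k for k in range(p) if k != j] for j in others]
        for assignment in product(*choices):
            parents: List[Optional[int]] = [None] * p
            for j, k in zip(others, assignment):
                parents[j] = k
            if not is_arborescence(parents):
                continue
            total = sum(scores[(parents[j], j)] for j in others)
            if total > best[0]:
                best = (total, parents)
    return best


# Hypothetical penalized edge scores for 3 nodes (every ordered pair is scored).
scores = {(0, 1): 2.0, (1, 0): 1.5, (1, 2): 1.0, (2, 1): 0.2, (0, 2): 0.1, (2, 0): 0.0}
result = best_tree(scores, 3)
print(result)
```

Here the chain 0 → 1 → 2 wins because it collects the two largest compatible edge scores; the reversed edge 1 → 0 scores lower than 0 → 1, so the orientation is decided by the asymmetry of the scores.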

What carries the argument

The column-stochastic transition matrix that parameterizes the parent-driven component inside the mixture model for the conditional expectation of each child composition.

If this is right

The recovered directed tree supplies an interpretable ordering of compositional nodes that aligns with known biological mechanisms in microbiome and single-cell applications.
Sample-size requirements scale explicitly with signal gap, dimension, and penalty, giving practitioners a concrete guide for experimental design.
Zero inflation is handled without ad-hoc imputation because the mixture formulation naturally produces zero entries.
The same identifiability argument shows that observational data suffice to orient edges, removing the need for interventional experiments in this setting.
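The sample-complexity expression quoted from the paper elsewhere on this page is O(γ⁻² log²(1/ϵ₀)(D_max log(D_max/γ) + log p)). Reading γ as the signal gap, D_max as the largest node dimension and p as the number of nodes (ϵ₀ is a quantity from the paper's bound that this page does not define), the sketch below evaluates the expression with the unknown constant set to 1. It is only useful for comparing how the requirement scales between settings, not for sizing an actual study.

```python
import math


def sample_size_scale(gamma: float, eps0: float, d_max: int, p: int) -> float:
    """Evaluate gamma^-2 * log^2(1/eps0) * (d_max * log(d_max / gamma) + log p).

    The constant hidden in the O(.) bound is set to 1, so only ratios between
    calls are meaningful. Natural logarithms are assumed.
    """
    if not (gamma > 0.0 and 0.0 < eps0 < 1.0 and d_max >= 1 and p >= 2):
        raise ValueError("need gamma > 0, 0 < eps0 < 1, d_max >= 1, p >= 2")
    inner = d_max * math.log(d_max / gamma) + math.log(p)
    return gamma ** -2 * math.log(1.0 / eps0) ** 2 * inner


base = sample_size_scale(gamma=0.1, eps0=0.05, d_max=20, p=10)
half_gap = sample_size_scale(gamma=0.05, eps0=0.05, d_max=20, p=10)
more_nodes = sample_size_scale(gamma=0.1, eps0=0.05, d_max=20, p=100)
print(half_gap / base, more_nodes / base)
```

With these illustrative inputs, halving the gap raises the requirement by a factor of about 4.5 (slightly more than 4 because γ also appears inside the logarithm), while a tenfold increase in the number of nodes raises it by only about 2%, reflecting the log p term.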

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The tree restriction could be relaxed to DAGs if the identifiability proof is extended to graphs with multiple parents while preserving the simplex geometry.
Because the transition matrices are column-stochastic, the method may supply a natural bridge to causal inference frameworks that already use stochastic matrices for compositional outcomes.
The finite-sample bounds suggest that the approach remains practical for moderately high-dimensional nodes provided the signal gap is not too small.
Applications beyond microbiome and single-cell data, such as topic-model proportions or asset-allocation weights, may become feasible with the same scoring and penalty.

Load-bearing premise

The non-degeneracy condition on the transition matrix is required for edge directions to be identifiable from data alone; if it fails, directions cannot be recovered and the consistency guarantee collapses.
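The paper's exact non-degeneracy condition is not reproduced on this page. As a hedged illustration of why some such condition is needed, the sketch below takes the most extreme degenerate case chosen here for illustration: if every column of a column-stochastic M equals the same vector c, then M·x = c for every parent composition x, so the child's conditional mean does not depend on the parent at all. This rank-one example is not the paper's condition, only a case that any reasonable condition must exclude.

```python
from typing import List

Matrix = List[List[float]]  # rows: child categories, columns: parent categories


def push_forward(m: Matrix, x: List[float]) -> List[float]:
    """Compute M @ x for a child-by-parent matrix M and a parent composition x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


def columns_identical(m: Matrix, tol: float = 1e-12) -> bool:
    """True if all columns of M coincide, so M @ x is the same for every composition x."""
    first = [row[0] for row in m]
    return all(abs(row[c] - first[i]) <= tol for i, row in enumerate(m) for c in range(len(row)))


degenerate = [[0.2, 0.2, 0.2],
              [0.8, 0.8, 0.8]]
informative = [[0.9, 0.1, 0.5],
               [0.1, 0.9, 0.5]]
parent_examples = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.2, 0.3, 0.5]]

for x in parent_examples:
    # The degenerate matrix returns (0.2, 0.8) regardless of x; the informative one does not.
    print(push_forward(degenerate, x), push_forward(informative, x))
```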

What would settle it

A simulation or real dataset in which the transition matrices satisfy the modeling assumptions yet the estimator returns a tree whose edge directions differ from the known ground-truth directions, even when the sample size exceeds the paper's stated finite-sample bound.
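The check described above amounts to comparing an estimated tree with a known ground-truth tree edge by edge. A minimal sketch, assuming both trees are stored as hypothetical parent lists (`None` for the root), that classifies each true edge as recovered with the correct direction, recovered but reversed, or missing:

```python
from typing import Dict, List, Optional


def compare_orientations(true_parents: List[Optional[int]],
                         est_parents: List[Optional[int]]) -> Dict[str, int]:
    """Count true edges k -> j that are correct, reversed, or missing in the estimate."""
    counts = {"correct": 0, "reversed": 0, "missing": 0}
    for j, k in enumerate(true_parents):
        if k is None:
            continue  # the root has no incoming edge
        if est_parents[j] == k:
            counts["correct"] += 1
        elif est_parents[k] == j:
            counts["reversed"] += 1
        else:
            counts["missing"] += 1
    return counts


truth = [None, 0, 1, 1]     # 0 -> 1, 1 -> 2, 1 -> 3
estimate = [1, None, 1, 1]  # 1 -> 0, 1 -> 2, 1 -> 3
print(compare_orientations(truth, estimate))
```

Since the guarantee holds with high probability rather than surely, a single reversed edge is not decisive; the telling evidence would be reversed edges persisting across many replications at sample sizes above the stated bound.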

Figures

Figures reproduced from arXiv: 2605.03178 by Bani K. Mallick, Shuangjie Zhang, Yang Ni.

**Figure 1.** The learned tree structure for the MOMS-PI microbiome data. The model …

**Figure 2.** Estimated transition matrices M_jk for the two selected cross-site microbiome links: (a) vagina to cervix and (b) rectum to feces. The matrices display the transition weights between bacterial genera in the parent (x-axis) and child (y-axis) communities. Each column of M_jk sums to 1, with color intensity (white to red) indicating increasing weight.

Original abstract

Compositional data, which are vectors of proportions constrained to the probability simplex, arise frequently in modern scientific applications, including microbiome relative abundances across body sites and cell-type mixture weights derived from single-cell genomics. While regression methods for compositional data are well developed, no existing graphical model framework addresses the problem of learning conditional dependence structures among multiple compositional vectors. This paper introduces a novel framework for directed tree structure learning over compositional nodes. We employ the Kullback-Leibler divergence as the scoring function and model the conditional expectation of each child composition as a mixture of a baseline composition and a parent-driven component parameterized by a column-stochastic transition matrix. This formulation respects the simplex geometry, handles zero-inflated compositions gracefully, and, combined with a non-degeneracy condition on the transition matrix, ensures identifiability of edge directions from observational data. We prove consistency of structure recovery and derive finite-sample guarantees that characterize the required sample size in terms of the signal gap, node dimension, and penalty level. The efficacy of our approach is demonstrated through simulations and applications to multi-site microbiome data and single-cell data, yielding interpretable directed structures that align with known biological mechanisms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives the first directed tree model for multiple zero-inflated compositional vectors via KL scoring and column-stochastic mixtures, but the consistency claims rest on a non-degeneracy condition that zero inflation can easily violate in practice.

The paper introduces a directed tree structure learner for compositional nodes that can contain many zeros. It scores candidate trees with Kullback-Leibler divergence and expresses the conditional expectation of a child composition as a mixture of a baseline term and a parent-driven term given by a column-stochastic transition matrix. This keeps everything on the simplex and is intended to recover edge directions from observational data under a non-degeneracy assumption on the matrix. Consistency of the recovered tree and finite-sample bounds on the required sample size are stated as theorems in terms of signal gap, dimension, and penalty level. Simulations plus two real-data examples on multi-site microbiome and single-cell data are shown to produce structures that line up with known biology.

Referee Report

2 major / 2 minor

Summary. The paper proposes a novel framework for learning directed tree structures among multiple zero-inflated compositional nodes. It models each child's conditional expectation as a mixture of a baseline composition and a parent-driven term via a column-stochastic transition matrix, employs KL divergence as the scoring function for tree selection, establishes identifiability of edge directions under a non-degeneracy condition on the transition matrix, proves consistency of structure recovery together with finite-sample bounds on the required sample size in terms of signal gap, dimension, and penalty, and illustrates the method on simulations plus real microbiome and single-cell datasets.

Significance. If the consistency and finite-sample results hold, the work addresses an important gap: no prior graphical-model framework existed for learning directed conditional dependence structures among compositional vectors. The explicit handling of zero inflation, the simplex-respecting parameterization, and the provision of sample-size guarantees tied to observable quantities would make the method practically useful in microbiome and single-cell applications where such data are common.

major comments (2)

[§3 (Identifiability and Consistency), Theorem 1] The proof of consistent structure recovery and the finite-sample bound both invoke a non-degeneracy condition on the column-stochastic transition matrix A to guarantee identifiability of edge directions. Under zero inflation the observed supports become sparse; the paper does not show that the effective (data-dependent) matrix remains non-degenerate with high probability when the population A satisfies the condition, nor does it quantify how zero inflation shrinks the signal gap that appears in the sample-size bound.
[§2.2 (Model Specification), Eq. (3)–(5)] The conditional expectation is written as a convex combination of a baseline composition and a parent-driven term. It is not shown that this construction automatically maps back into the probability simplex when the observed child vector contains structural zeros; the subsequent KL scoring and the derivation of the finite-sample bound appear to treat the compositions as interior points.

minor comments (2)

[Abstract and §3] The notation for the penalty level and the signal gap is introduced in the abstract but first defined only in the theorem statement; a forward reference or early definition would improve readability.
[Simulation section] The reported recovery rates are given without standard errors across replications; adding variability measures would strengthen the empirical support for the finite-sample claims.
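The minor comment asks for variability measures across replications. For a recovery rate computed as the fraction r̂ of R independent replications in which the exact tree is recovered, the usual binomial standard error is sqrt(r̂(1 − r̂)/R). A minimal sketch; the replication outcomes below are made-up placeholders, not results from the paper:

```python
import math
from typing import List, Tuple


def recovery_rate_with_se(recovered: List[bool]) -> Tuple[float, float]:
    """Return the empirical recovery rate and its binomial standard error."""
    r = len(recovered)
    if r == 0:
        raise ValueError("need at least one replication")
    rate = sum(recovered) / r
    se = math.sqrt(rate * (1.0 - rate) / r)
    return rate, se


# Placeholder outcomes for 50 hypothetical replications (45 exact recoveries).
outcomes = [True] * 45 + [False] * 5
rate, se = recovery_rate_with_se(outcomes)
print(f"recovery rate = {rate:.2f} (SE {se:.3f})")
```

When the observed rate is exactly 0 or 1 this standard error collapses to zero, so an interval such as the Wilson score interval is more informative in those cases.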

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. We address each major comment point by point below, indicating the revisions we will make to strengthen the presentation.

Referee: [§3 (Identifiability and Consistency), Theorem 1] The proof of consistent structure recovery and the finite-sample bound both invoke a non-degeneracy condition on the column-stochastic transition matrix A to guarantee identifiability of edge directions. Under zero inflation the observed supports become sparse; the paper does not show that the effective (data-dependent) matrix remains non-degenerate with high probability when the population A satisfies the condition, nor does it quantify how zero inflation shrinks the signal gap that appears in the sample-size bound.

Authors: We appreciate this observation regarding the interplay between zero inflation and the non-degeneracy condition. The identifiability result in Theorem 1 and the consistency proof are established at the population level under the stated non-degeneracy assumption on A. The finite-sample bound is expressed directly in terms of the signal gap (defined via the KL divergence between conditional distributions), which inherently reflects any shrinkage induced by zero inflation through the model parameters. While an explicit high-probability guarantee that the empirical transition matrix remains non-degenerate is not derived in the current version, the consistency theorem ensures convergence to the population quantities as n grows, and standard concentration arguments for multinomial or Dirichlet-multinomial data can be applied to control the deviation of the observed supports. In the revision we will add a remark after Theorem 1 clarifying that the bounds hold conditionally on the observed data satisfying the non-degeneracy condition with high probability for sufficiently large n, and we will make explicit how zero inflation enters the signal-gap term in the sample-size expression. revision: yes
Referee: [§2.2 (Model Specification), Eq. (3)–(5)] The conditional expectation is written as a convex combination of a baseline composition and a parent-driven term. It is not shown that this construction automatically maps back into the probability simplex when the observed child vector contains structural zeros; the subsequent KL scoring and the derivation of the finite-sample bound appear to treat the compositions as interior points.

Authors: The conditional expectation in Equations (3)–(5) is defined as a convex combination of the baseline composition (which lies in the simplex) and the image of the parent composition under the column-stochastic matrix A (which maps the simplex to itself). Consequently, the resulting vector is always a valid composition, including cases where it lies on the boundary of the simplex. Structural zeros appear in the observed realizations of the child node, but the conditional expectation itself remains a well-defined point in the simplex; it may have zero entries when the linear combination produces them. For the KL scoring function we employ the standard additive-smoothing convention (pseudo-counts) to ensure the divergence is well-defined when estimated probabilities contain zeros, consistent with common practice in compositional data analysis. The finite-sample bounds rely on bounded random variables and concentration inequalities that apply to distributions supported on the simplex without requiring strict interiority. We will insert a short clarifying paragraph in Section 2.2 stating the simplex-preservation property explicitly and describing the zero-handling convention used for the KL score and the subsequent analysis. revision: yes
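The response mentions additive smoothing (pseudo-counts) to keep the KL score finite when estimated probabilities contain zeros. A minimal sketch of that convention, assuming a pseudo-count `alpha` added to every category of the fitted composition before renormalizing; the value of `alpha` and the direction KL(observed ‖ fitted) are illustrative assumptions, not the paper's documented choices:

```python
import math
from typing import List


def smooth(v: List[float], alpha: float) -> List[float]:
    """Add a pseudo-count alpha to every entry and renormalize onto the simplex."""
    total = sum(v) + alpha * len(v)
    return [(x + alpha) / total for x in v]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) using the convention 0 * log(0 / q) = 0; infinite if q_i = 0 < p_i."""
    out = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i <= 0.0:
                return math.inf
            out += p_i * math.log(p_i / q_i)
    return out


observed = [0.6, 0.4, 0.0]  # child composition with a structural zero
fitted = [0.7, 0.0, 0.3]    # fitted mean with a zero where observed mass is positive

raw = kl_divergence(observed, fitted)                     # infinite: no fitted mass on the second category
smoothed = kl_divergence(observed, smooth(fitted, 1e-3))  # finite after smoothing the fitted mean
print(raw, smoothed)
```

Zeros in the observed composition need no smoothing under the 0 · log 0 = 0 convention; only zeros in the fitted composition at positions where observed mass is positive make the divergence blow up.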

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper models conditional expectations via a column-stochastic transition matrix and KL scoring, then states consistency of tree recovery under an explicit non-degeneracy assumption on the matrix for identifiability. This assumption is invoked as a premise rather than derived from the fitted model or data, and the finite-sample bounds are expressed directly in terms of the signal gap, dimension, and penalty without reducing to a tautological re-expression of the inputs. No self-citations are load-bearing for the central theorems, no ansatz is smuggled via prior work, and no fitted parameter is relabeled as a prediction. The derivation chain remains self-contained against the stated assumptions and does not collapse by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The framework depends on a parametric conditional model (mixture of baseline and parent-driven composition via column-stochastic matrix) whose parameters are estimated during structure search; identifiability is secured by an explicit non-degeneracy assumption rather than derived from first principles.

free parameters (1)

column-stochastic transition matrices
One matrix per potential edge; entries are estimated from data to define the parent-driven component of each child's conditional expectation.

axioms (1)

domain assumption: non-degeneracy condition on the transition matrix
Invoked to guarantee that edge directions are identifiable from observational data alone.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost (J(x) = ½(x+x⁻¹) − 1) · washburn_uniqueness_aczel · unclear
Paper passage: "We employ the Kullback-Leibler divergence as the scoring function and model the conditional expectation of each child composition as a mixture of a baseline composition and a parent-driven component parameterized by a column-stochastic transition matrix."

Foundation (zero-parameter forcing chain) · reality_from_one_distinction · unclear
Paper passage: "sample complexity is thus O(γ⁻² log²(1/ϵ₀)(D_max log(D_max/γ) + log p))"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages

[1]

Journal of Applied Probability & Statistics , volume=

Modelling compositional data using Dirichlet regression models , author=. Journal of Applied Probability & Statistics , volume=

work page
[2]

Advances in Neural Information Processing Systems , volume=

Directed cyclic graph for causal discovery from multivariate functional data , author=. Advances in Neural Information Processing Systems , volume=

work page
[3]

Nature communications , volume=

The microbiota continuum along the female reproductive tract and its relation to uterine-related diseases , author=. Nature communications , volume=. 2017 , publisher=

work page 2017
[4]

Phylogenetically informed

Chung, Hee Cheol and Gaynanova, Irina and Ni, Yang , journal=. Phylogenetically informed. 2022 , publisher=

work page 2022
[5]

Joint microbial and metabolomic network estimation with the censored

Ma, Jing , journal=. Joint microbial and metabolomic network estimation with the censored. 2021 , publisher=

work page 2021
[6]

Biometrics , volume=

Bayesian compositional regression with structured priors for microbiome feature selection , author=. Biometrics , volume=. 2021 , publisher=

work page 2021
[7]

Koslovsky, Matthew D and Hoffman, Kristi L and Daniel, Carrie R and Vannucci, Marina , journal=. A. 2020 , publisher=

work page 2020
[8]

Journal of the Royal Statistical Society: Series B (Methodological) , volume=

The statistical analysis of compositional data , author=. Journal of the Royal Statistical Society: Series B (Methodological) , volume=. 1982 , publisher=

work page 1982
[9]

Biometrika , pages=

Log contrast models for experiments with mixtures , author=. Biometrika , pages=. 1984 , publisher=

work page 1984
[10]

Biometrika , volume=

Variable selection in regression with compositional covariates , author=. Biometrika , volume=. 2014 , publisher=

work page 2014
[11]

The Annals of Applied Statistics , volume=

Regression analysis for microbiome compositional data , author=. The Annals of Applied Statistics , volume=

work page
[12]

Biometrics , volume=

A transformation-free linear regression for compositional outcomes and predictors , author=. Biometrics , volume=. 2022 , publisher=

work page 2022
[13]

Frontiers in microbiology , volume=

Characterization of the gut microbiome using 16S or shotgun metagenomics , author=. Frontiers in microbiology , volume=. 2016 , publisher=

work page 2016
[14]

International Conference on Probabilistic Graphical Models , pages=

The functional lingam , author=. International Conference on Probabilistic Graphical Models , pages=. 2022 , organization=

work page 2022
[15]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

Functional structural equation model , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2022 , publisher=

work page 2022
[16]

Biometrical Journal , volume=

Overview of object oriented data analysis , author=. Biometrical Journal , volume=. 2014 , publisher=

work page 2014
[17]

Biometrics , volume=

Functional Bayesian networks for discovering causality from multivariate functional data , author=. Biometrics , volume=. 2023 , publisher=

work page 2023
[18]

2009 , publisher=

Probabilistic graphical models: principles and techniques , author=. 2009 , publisher=

work page 2009
[19]

Annual Review of Statistics and Its Application , volume=

Causal structure learning , author=. Annual Review of Statistics and Its Application , volume=. 2018 , publisher=

work page 2018
[20]

Advances in neural information processing systems , volume=

Dags with no tears: Continuous optimization for structure learning , author=. Advances in neural information processing systems , volume=

work page
[21]

Journal of Research of the national Bureau of Standards B , volume=

Optimum branchings , author=. Journal of Research of the national Bureau of Standards B , volume=

work page
[22]

Uncertainty in Artificial Intelligence , pages=

Properties of Bayesian belief network learning algorithms , author=. Uncertainty in Artificial Intelligence , pages=. 1994 , organization=

work page 1994
[23]

Journal of machine learning research , volume=

Optimal structure identification with greedy search , author=. Journal of machine learning research , volume=

work page
[24]

Machine learning , volume=

A Bayesian method for the induction of probabilistic networks from data , author=. Machine learning , volume=. 1992 , publisher=

work page 1992
[25]

Learning in graphical models , pages=

A tutorial on learning with Bayesian networks , author=. Learning in graphical models , pages=. 1998 , publisher=

work page 1998
[26]

arXiv preprint arXiv:1304.2736 , year=

The recovery of causal poly-trees from statistical data , author=. arXiv preprint arXiv:1304.2736 , year=

work page arXiv
[27]

, author=

Order-independent constraint-based causal structure learning. , author=. J. Mach. Learn. Res. , volume=

work page
[28]

Frontiers in genetics , volume=

Review of causal discovery methods based on graphical models , author=. Frontiers in genetics , volume=. 2019 , publisher=

work page 2019
[29]

Learning from data: Artificial intelligence and statistics V , pages=

Learning Bayesian networks is NP-complete , author=. Learning from data: Artificial intelligence and statistics V , pages=. 1996 , publisher=

work page 1996
[30]

Innovations in Machine Learning: Theory and Applications , pages=

A Bayesian approach to causal discovery , author=. Innovations in Machine Learning: Theory and Applications , pages=. 2006 , publisher=

work page 2006
[31]

IEEE Transactions on Information Theory , volume=

Approximating discrete probability distributions with dependence trees , author=. IEEE Transactions on Information Theory , volume=

work page
[32]

2000 , publisher=

Causation, prediction, and search , author=. 2000 , publisher=

work page 2000
[33]

Journal of Machine Learning Research , volume=

Functional directed acyclic graphs , author=. Journal of Machine Learning Research , volume=

work page
[34]

Frontiers in microbiology , volume=

Microbiome datasets are compositional: and this is not optional , author=. Frontiers in microbiology , volume=. 2017 , publisher=

work page 2017
[35]

Bioinformatics , volume=

APE: analyses of phylogenetics and evolution in R language , author=. Bioinformatics , volume=. 2004 , publisher=

work page 2004
[36]

Proceedings of the 22nd international conference on Machine learning , pages=

Bayesian hierarchical clustering , author=. Proceedings of the 22nd international conference on Machine learning , pages=

work page
[37]

PLOS Computational Biology , publisher =

Inferring Correlation Networks from Genomic Survey Data , year =. PLOS Computational Biology , publisher =. doi:10.1371/journal.pcbi.1002687 , author =

work page doi:10.1371/journal.pcbi.1002687
[38]

2015 , publisher=

Modeling and analysis of compositional data , author=. 2015 , publisher=

work page 2015
[39]

, author=

A linear non-Gaussian acyclic model for causal discovery. , author=. Journal of Machine Learning Research , volume=

work page
[40]

Advances in neural information processing systems , volume=

Nonlinear causal discovery with additive noise models , author=. Advances in neural information processing systems , volume=

work page
[41]

The Journal of Machine Learning Research , volume=

Causal discovery with continuous additive noise models , author=. The Journal of Machine Learning Research , volume=. 2014 , publisher=

work page 2014
[42]

Journal of the American Statistical Association , volume=

Robust Bayesian inference via coarsening , author=. Journal of the American Statistical Association , volume=. 2019 , publisher=

work page 2019
[43]

Journal of the American Statistical Association , volume=

Generalized Bayes quantification learning under dataset shift , author=. Journal of the American Statistical Association , volume=. 2022 , publisher=

work page 2022
[44]

Nature medicine , volume=

The vaginal microbiome and preterm birth , author=. Nature medicine , volume=. 2019 , publisher=

work page 2019
[45]

Science , volume=

Single-cell eQTL mapping identifies cell type--specific genetic control of autoimmune disease , author=. Science , volume=. 2022 , publisher=

work page 2022
[46]

Electronic Journal of Statistics , volume=

High-dimensional covariance estimation by minimizing _1 -penalized log-determinant divergence , author=. Electronic Journal of Statistics , volume=. 2011 , publisher=

work page 2011
[47]

Statistica sinica , pages=

An asymptotic theory for linear model selection , author=. Statistica sinica , pages=. 1997 , publisher=

work page 1997
[48]

The Annals of Statistics , volume=

_0 -penalized maximum likelihood for sparse directed acyclic graphs , author=. The Annals of Statistics , volume=

work page
[49]

2000 , publisher=

Asymptotic statistics , author=. 2000 , publisher=

work page 2000
[50]

1996 , publisher=

Weak Convergence and Empirical Processes , author=. 1996 , publisher=

work page 1996
[51]

The Annals of Statistics , volume=

Consistency of cross validation for comparing regression procedures , author=. The Annals of Statistics , volume=. 2007 , publisher=

work page 2007
[52]

Nature , volume=

The human microbiome project , author=. Nature , volume=. 2007 , publisher=

work page 2007
[53]

nature , volume=

A human gut microbial gene catalogue established by metagenomic sequencing , author=. nature , volume=. 2010 , publisher=

work page 2010
[54]

Structure, function and diversity of the healthy human microbiome , journal=

Human Microbiome Project Consortium , number=. Structure, function and diversity of the healthy human microbiome , journal=. 2012 , publisher=

work page 2012
[55]

2016 , publisher=

Janeway's immunobiology , author=. 2016 , publisher=

work page 2016
[56]

Nature , volume=

Two subsets of memory T lymphocytes with distinct homing potentials and effector functions , author=. Nature , volume=. 1999 , publisher=

work page 1999
[57]

Nature Reviews Immunology , volume=

Human memory T cells: generation, compartmentalization and homeostasis , author=. Nature Reviews Immunology , volume=. 2014 , publisher=

work page 2014
[58]

Biometrika , volume=

Bayesian clustering of high-dimensional data via latent repulsive mixtures , author=. Biometrika , volume=. 2025 , publisher=

work page 2025
[59]

Gut Microbes , volume=

Fecal samples and rectal swabs adequately reflect the human colonic luminal microbiota , author=. Gut Microbes , volume=. 2024 , publisher=

work page 2024
[60]

Cox, D. R. (1972). Regression models and life tables (with

work page 1972
[61]

Hastie, T., Tibshirani, R., and Friedman, J. (2001). The

work page 2001

[1] [1]

Journal of Applied Probability & Statistics , volume=

Modelling compositional data using Dirichlet regression models , author=. Journal of Applied Probability & Statistics , volume=

work page

[2] [2]

Advances in Neural Information Processing Systems , volume=

Directed cyclic graph for causal discovery from multivariate functional data , author=. Advances in Neural Information Processing Systems , volume=

work page

[3] [3]

Nature communications , volume=

The microbiota continuum along the female reproductive tract and its relation to uterine-related diseases , author=. Nature communications , volume=. 2017 , publisher=

work page 2017

[4] [4]

Phylogenetically informed

Chung, Hee Cheol and Gaynanova, Irina and Ni, Yang , journal=. Phylogenetically informed. 2022 , publisher=

work page 2022

[5] [5]

Joint microbial and metabolomic network estimation with the censored

Ma, Jing , journal=. Joint microbial and metabolomic network estimation with the censored. 2021 , publisher=

work page 2021

[6] [6]

Biometrics , volume=

Bayesian compositional regression with structured priors for microbiome feature selection , author=. Biometrics , volume=. 2021 , publisher=

work page 2021

[7] [7]

Koslovsky, Matthew D and Hoffman, Kristi L and Daniel, Carrie R and Vannucci, Marina , journal=. A. 2020 , publisher=

work page 2020

[8] [8]

Journal of the Royal Statistical Society: Series B (Methodological) , volume=

The statistical analysis of compositional data , author=. Journal of the Royal Statistical Society: Series B (Methodological) , volume=. 1982 , publisher=

work page 1982

[9] [9]

Biometrika , pages=

Log contrast models for experiments with mixtures , author=. Biometrika , pages=. 1984 , publisher=

work page 1984

[10] [10]

Biometrika , volume=

Variable selection in regression with compositional covariates , author=. Biometrika , volume=. 2014 , publisher=

work page 2014

[11] [11]

The Annals of Applied Statistics , volume=

Regression analysis for microbiome compositional data , author=. The Annals of Applied Statistics , volume=

work page

[12] [12]

Biometrics , volume=

A transformation-free linear regression for compositional outcomes and predictors , author=. Biometrics , volume=. 2022 , publisher=

work page 2022

[13] [13]

Frontiers in microbiology , volume=

Characterization of the gut microbiome using 16S or shotgun metagenomics , author=. Frontiers in microbiology , volume=. 2016 , publisher=

work page 2016

[14] [14]

International Conference on Probabilistic Graphical Models , pages=

The functional lingam , author=. International Conference on Probabilistic Graphical Models , pages=. 2022 , organization=

work page 2022

[15] [15]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

Functional structural equation model , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2022 , publisher=

work page 2022

[16] [16]

Biometrical Journal , volume=

Overview of object oriented data analysis , author=. Biometrical Journal , volume=. 2014 , publisher=

work page 2014

[17] [17]

Biometrics , volume=

Functional Bayesian networks for discovering causality from multivariate functional data , author=. Biometrics , volume=. 2023 , publisher=

work page 2023

[18] [18]

2009 , publisher=

Probabilistic graphical models: principles and techniques , author=. 2009 , publisher=

work page 2009

[19] Causal structure learning. Annual Review of Statistics and Its Application, 2018.

[20] DAGs with NO TEARS: Continuous optimization for structure learning. Advances in Neural Information Processing Systems.

[21] Optimum branchings. Journal of Research of the National Bureau of Standards B.

[22] Properties of Bayesian belief network learning algorithms. Uncertainty in Artificial Intelligence, 1994.

[23] Optimal structure identification with greedy search. Journal of Machine Learning Research.

[24] A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 1992.

[25] A tutorial on learning with Bayesian networks. Learning in Graphical Models, 1998.

[26] The recovery of causal poly-trees from statistical data. arXiv preprint arXiv:1304.2736.

[27] Order-independent constraint-based causal structure learning. Journal of Machine Learning Research.

[28] Review of causal discovery methods based on graphical models. Frontiers in Genetics, 2019.

[29] Learning Bayesian networks is NP-complete. Learning from Data: Artificial Intelligence and Statistics V, 1996.

[30] A Bayesian approach to causal discovery. Innovations in Machine Learning: Theory and Applications, 2006.

[31] Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory.

[32] Causation, prediction, and search. 2000.

[33] Functional directed acyclic graphs. Journal of Machine Learning Research.

[34] Microbiome datasets are compositional: and this is not optional. Frontiers in Microbiology, 2017.

[35] APE: analyses of phylogenetics and evolution in R language. Bioinformatics, 2004.

[36] Bayesian hierarchical clustering. Proceedings of the 22nd International Conference on Machine Learning.

[37] Inferring correlation networks from genomic survey data. PLOS Computational Biology. doi:10.1371/journal.pcbi.1002687.

[38] Modeling and analysis of compositional data. 2015.

[39] A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research.

[40] Nonlinear causal discovery with additive noise models. Advances in Neural Information Processing Systems.

[41] Causal discovery with continuous additive noise models. The Journal of Machine Learning Research, 2014.

[42] Robust Bayesian inference via coarsening. Journal of the American Statistical Association, 2019.

[43] Generalized Bayes quantification learning under dataset shift. Journal of the American Statistical Association, 2022.

[44] The vaginal microbiome and preterm birth. Nature Medicine, 2019.

[45] Single-cell eQTL mapping identifies cell type–specific genetic control of autoimmune disease. Science, 2022.

[46] High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 2011.

[47] An asymptotic theory for linear model selection. Statistica Sinica, 1997.

[48] ℓ0-penalized maximum likelihood for sparse directed acyclic graphs. The Annals of Statistics.

[49] Asymptotic statistics. 2000.

[50] Weak convergence and empirical processes. 1996.

[51] Consistency of cross validation for comparing regression procedures. The Annals of Statistics, 2007.

[52] The Human Microbiome Project. Nature, 2007.

[53] A human gut microbial gene catalogue established by metagenomic sequencing. Nature, 2010.

[54] Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. 2012.

[55] Janeway's immunobiology. 2016.

[56] Two subsets of memory T lymphocytes with distinct homing potentials and effector functions. Nature, 1999.

[57] Human memory T cells: generation, compartmentalization and homeostasis. Nature Reviews Immunology, 2014.

[58] Bayesian clustering of high-dimensional data via latent repulsive mixtures. Biometrika, 2025.

[59] Fecal samples and rectal swabs adequately reflect the human colonic luminal microbiota. Gut Microbes, 2024.

[60] Cox, D. R. (1972). Regression models and life tables (with

[61] Hastie, T., Tibshirani, R., and Friedman, J. (2001). The