Testing properties of trees in graphical models with covariance queries

Francisco Calvillo; G\'abor Lugosi; Piotr Zwiernik; Sofiya Burova

arxiv: 2605.15996 · v1 · pith:OLZKI52Bnew · submitted 2026-05-15 · 📊 stat.ML · cs.LG· math.ST· stat.TH

Testing properties of trees in graphical models with covariance queries

Sofiya Burova , Francisco Calvillo , G\'abor Lugosi , Piotr Zwiernik This is my paper

Pith reviewed 2026-05-19 19:12 UTC · model grok-4.3

classification 📊 stat.ML cs.LGmath.STstat.TH

keywords graphical modelstree structurecovariance queriesproperty testingquery complexityhigh-dimensional statisticsstructural propertiesrandomized algorithms

0 comments

The pith

Global structural properties of trees in graphical models can be tested with sub-quadratic covariance queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper considers testing properties of tree graphs that underlie high-dimensional graphical models when only covariance queries are available. It establishes that full reconstruction of the tree is costly in query terms, yet randomized tests can efficiently verify several global properties using a sub-quadratic number of queries. Explicit procedures and bounds are given for testing the number of leaves, the maximum degree, the typical distance, and the diameter, with the complexity depending on chosen thresholds and error tolerances. A sympathetic reader would care because this separates the cost of learning the full structure from the cost of checking whether the structure satisfies basic global conditions that often matter in applications.

Core claim

The main results show that while reconstructing the entire tree may be costly, certain global structural properties can be tested efficiently. In particular, randomized tests for global structural properties use a sub-quadratic number of queries. Testing procedures are developed for the number of leaves, the maximum degree, the typical distance, and the diameter of the tree, each with explicit query complexity bounds that depend on the target threshold and tolerance parameters.

What carries the argument

The covariance query model that returns the covariance between any chosen pair of observed variables, which is used to design randomized tests that decide whether a tree property meets a given threshold.

If this is right

The number of leaves can be tested to within a given additive tolerance using fewer than quadratic queries.
The maximum degree of the tree admits a sub-quadratic query test with explicit dependence on the degree threshold.
Typical distance and diameter each admit sub-quadratic randomized tests controlled by tolerance parameters.
Property testing remains cheaper than full tree reconstruction whenever only these global features are needed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same query model might support efficient tests for other global invariants such as the number of internal nodes once similar concentration arguments are developed.
In high-dimensional data pipelines, these tests could serve as cheap sanity checks before committing to expensive structure-learning algorithms.
The separation between reconstruction cost and testing cost suggests analogous savings may exist for other sparse graphical models beyond trees.

Load-bearing premise

The underlying graph is exactly a tree and the covariance query model applies directly to the testing procedures.

What would settle it

A family of trees where every algorithm that correctly tests one of the listed properties (number of leaves, maximum degree, typical distance, or diameter) requires quadratic or more covariance queries in the worst case would falsify the sub-quadratic claim.

read the original abstract

We consider the problem of testing properties of graphs underlying high-dimensional graphical models. We adopt the model of covariance queries introduced by Lugosi, Truszkowski, Velona, and Zwiernik (2021). We study the case when the underlying graph is a tree. The main results of the paper show that, while reconstructing the entire tree may be costly, certain global structural properties can be tested efficiently. In particular, we design randomized tests for global structural properties that use a sub-quadratic number of queries. We develop testing procedures for several fundamental properties, including the number of leaves, the maximum degree, the typical distance, and the diameter of the tree. For each property, we obtain explicit query complexity bounds that depend on the target threshold and tolerance parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows you can test global tree properties like leaf count, max degree, distance, and diameter with sub-quadratic covariance queries instead of full reconstruction.

read the letter

The key point here is that full tree reconstruction under the covariance query model is expensive, but you can test several global properties with noticeably fewer queries. The authors give randomized procedures for the number of leaves, maximum degree, typical distance, and diameter, each with explicit bounds that scale sub-quadratically in the number of nodes and depend on the threshold and tolerance you set. That is the concrete advance over the 2021 reconstruction work they cite. The approach stays within the covariance oracle and the exact-tree assumption, sampling pairs to extract path-length statistics without trying to recover every edge. The stress-test note confirms the argument is internally consistent and does not rely on extra hidden conditions like rapid correlation decay. The main limitation is the restriction to trees; if the underlying graph has even a few extra edges the guarantees do not apply, and the paper does not discuss how sensitive the tests are to small violations of the tree assumption. The query bounds also grow with the tolerance parameters, so the practical savings depend on how loose an answer the user is willing to accept. This work is aimed at researchers who already use covariance queries for high-dimensional graphical models and want to check structural features without paying for complete recovery. A reader interested in property testing or network inference will find the explicit complexity statements useful. The paper is coherent on its own terms and supplies the kind of concrete algorithmic result that deserves referee time, even if the proofs need careful checking in review.

Referee Report

2 major / 3 minor

Summary. The paper studies testing global structural properties of tree-structured graphical models under the covariance query model of Lugosi et al. (2021). While full tree reconstruction can be query-expensive, the authors design randomized testers for the number of leaves, maximum degree, typical distance, and diameter that succeed with a sub-quadratic number of queries. Explicit query-complexity bounds are stated in terms of the target threshold and additive tolerance parameters.

Significance. If the stated bounds and correctness proofs hold, the work is significant for separating the query cost of global property testing from full reconstruction in tree graphical models. The explicit dependence on threshold and tolerance, together with the focus on fundamental tree invariants, supplies concrete, falsifiable complexity statements that can guide both theory and algorithm design in high-dimensional covariance-based inference.

major comments (2)

[§3] §3 (number-of-leaves tester): the sub-quadratic bound is derived by sampling pairs and thresholding their covariances to estimate the fraction of leaves; the variance analysis must explicitly track dependence on the minimum edge covariance (or correlation strength) to ensure the claimed O(n^{3/2} poly(log n, 1/ε)) scaling remains uniform over all valid tree models.
[§5] §5 (diameter tester): the reduction from diameter estimation to typical-distance queries via random sampling of root-to-leaf paths assumes that the covariance oracle returns exact values; any additive noise model or finite-precision arithmetic would inflate the stated query bound by an extra 1/δ factor that is not currently accounted for in the tolerance analysis.

minor comments (3)

[Abstract] The abstract and introduction should state the precise polynomial dependence on n (number of nodes) rather than only saying “sub-quadratic.”
[§2] Notation for the covariance query oracle (Definition 2.1) should be cross-referenced each time a new tester is introduced so that the reader can immediately see which pairs are queried.
[§6] A short table summarizing the four query bounds side-by-side (with their dependence on threshold τ and tolerance ε) would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment of significance, and recommendation for minor revision. We address each major comment below.

read point-by-point responses

Referee: [§3] §3 (number-of-leaves tester): the sub-quadratic bound is derived by sampling pairs and thresholding their covariances to estimate the fraction of leaves; the variance analysis must explicitly track dependence on the minimum edge covariance (or correlation strength) to ensure the claimed O(n^{3/2} poly(log n, 1/ε)) scaling remains uniform over all valid tree models.

Authors: We thank the referee for highlighting this point. The variance of the empirical fraction of leaf pairs indeed depends on the minimum edge covariance ρ_min, since covariances between non-adjacent leaves are bounded by ρ_min raised to the path length. Our current proof absorbs this into the logarithmic factors under the implicit assumption that ρ_min is bounded below by a positive constant (standard to avoid degenerate models). To make the dependence explicit and ensure the bound holds uniformly, we will revise Theorem 3.1 and its proof to include an explicit multiplicative factor of O(1/ρ_min²) in the query complexity. This change preserves the sub-quadratic scaling for any fixed ρ_min > 0 while addressing the uniformity concern. revision: yes
Referee: [§5] §5 (diameter tester): the reduction from diameter estimation to typical-distance queries via random sampling of root-to-leaf paths assumes that the covariance oracle returns exact values; any additive noise model or finite-precision arithmetic would inflate the stated query bound by an extra 1/δ factor that is not currently accounted for in the tolerance analysis.

Authors: The referee is correct that the analysis relies on the exact covariance oracle model introduced by Lugosi et al. (2021). Under this model, the oracle returns precise values, so no additional 1/δ factor appears. We agree that an additive δ-noise model would require an extra O(1/δ²) factor in the sample size to control the error in typical-distance estimates. Since the manuscript is situated in the exact-query setting, we will add a short remark in Section 5 noting this modeling assumption and the implication for noisy extensions, without altering the stated bounds. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper adopts the covariance query model as an external assumption from the 2021 Lugosi et al. reference and restricts to the exact tree case to derive randomized testers for properties such as number of leaves, maximum degree, typical distance, and diameter. These testers are constructed via sampling of covariance pairs to extract path-length statistics, yielding explicit sub-quadratic query bounds that depend directly on the stated threshold and tolerance parameters. No step equates a derived quantity to a fitted input by construction, invokes a self-citation as the sole justification for a uniqueness claim, or renames an empirical pattern as a new result. The central claims follow from the model definitions and sampling arguments without reducing to the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the domain assumption that the graph is a tree and on the covariance query model from prior work; no new free parameters or invented entities are introduced.

axioms (2)

domain assumption The underlying graph is a tree.
Explicitly stated as the case studied in the abstract.
domain assumption Covariance queries follow the model from Lugosi et al. (2021).
The paper adopts this model for all testing procedures.

pith-pipeline@v0.9.0 · 5670 in / 1183 out tokens · 33381 ms · 2026-05-19T19:12:53.204897+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We design randomized tests for global structural properties that use a sub-quadratic number of queries... testing procedures for the number of leaves, the maximum degree, the typical distance, and the diameter of the tree.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

[1]

Boucheron, G

St ´ephane Boucheron, G ´abor Lugosi, and Pascal Massart.Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013. ISBN 9780199535255. doi: 10.1093/acprof:oso/9780199535255.001.0001

work page doi:10.1093/acprof:oso/9780199535255.001.0001 2013
[2]

The recovery of trees from measures of dissimilarity.Mathematics in the archae- ological and historical sciences, 1971

Peter Buneman. The recovery of trees from measures of dissimilarity.Mathematics in the archae- ological and historical sciences, 1971

work page 1971
[3]

Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation.The Annals of Statistics, 44(2):455–488, 2016

T Tony Cai, Weidong Liu, Harrison H Zhou, et al. Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation.The Annals of Statistics, 44(2):455–488, 2016

work page 2016
[4]

Active learning algo- rithms for graphical model selection

Gautamd Dasarathy, Aarti Singh, Maria-Florina Balcan, and Jong H Park. Active learning algo- rithms for graphical model selection. InArtificial Intelligence and Statistics, pages 1356–1364, 2016

work page 2016
[5]

Property testing in graphical models: testing small separation numbers.Information and Inference: A Journal of the IMA, 2026, to appear

Luc Devroye, G ´abor Lugosi, and Piotr Zwiernik. Property testing in graphical models: testing small separation numbers.Information and Inference: A Journal of the IMA, 2026, to appear

work page 2026
[6]

Learning latent tree models with small query complexity.Bernoulli, 2026, to appear

Luc Devroye, G ´abor Lugosi, and Piotr Zwiernik. Learning latent tree models with small query complexity.Bernoulli, 2026, to appear

work page 2026
[7]

Structure learning in graphical modeling.Annual Review of Statistics and Its Application, 4:365–393, 2017

Mathias Drton and Marloes H Maathuis. Structure learning in graphical modeling.Annual Review of Statistics and Its Application, 4:365–393, 2017

work page 2017
[8]

Probability inequalities for sums of bounded random variables.Journal of the American Statistical Association, 58(301):13–30, 1963

Wassily Hoeffding. Probability inequalities for sums of bounded random variables.Journal of the American Statistical Association, 58(301):13–30, 1963

work page 1963
[9]

Reconstructing evolutionary trees from dna and protein sequences: paralinear distances.Proceedings of the National Academy of Sciences, 91(4):1455–1459, 1994

James A Lake. Reconstructing evolutionary trees from dna and protein sequences: paralinear distances.Proceedings of the National Academy of Sciences, 91(4):1455–1459, 1994

work page 1994
[10]

Clarendon Press, 1996

Steffen L Lauritzen.Graphical models, volume 17. Clarendon Press, 1996

work page 1996
[11]

Learning partial corre- lation graphs and graphical models by covariance queries.Journal of Machine Learning Research, 22(203):1–41, 2021

G ´abor Lugosi, Jakub Truszkowski, Vasiliki Velona, and Piotr Zwiernik. Learning partial corre- lation graphs and graphical models by covariance queries.Journal of Machine Learning Research, 22(203):1–41, 2021

work page 2021
[12]

Combinatorial inference for graphical models.The Annals of Statistics, 47(2):795–827, 2019

Matey Neykov, Junwei Lu, and Han Liu. Combinatorial inference for graphical models.The Annals of Statistics, 47(2):795–827, 2019

work page 2019
[13]

Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1988

Judea Pearl.Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1988. ISBN 1558604790

work page 1988
[14]

Oxford University Press on Demand, 2003

Charles Semple and Mike Steel.Phylogenetics, volume 24. Oxford University Press on Demand, 2003

work page 2003
[15]

Hierarchical latent class models for cluster analysis.Journal of Machine Learning Research, 5(6):697–723, 2004

Nevin L Zhang. Hierarchical latent class models for cluster analysis.Journal of Machine Learning Research, 5(6):697–723, 2004

work page 2004
[16]

Latent tree models

Piotr Zwiernik. Latent tree models. InHandbook of graphical models, pages 265–288. CRC Press, 2018. 19

work page 2018

[1] [1]

Boucheron, G

St ´ephane Boucheron, G ´abor Lugosi, and Pascal Massart.Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013. ISBN 9780199535255. doi: 10.1093/acprof:oso/9780199535255.001.0001

work page doi:10.1093/acprof:oso/9780199535255.001.0001 2013

[2] [2]

The recovery of trees from measures of dissimilarity.Mathematics in the archae- ological and historical sciences, 1971

Peter Buneman. The recovery of trees from measures of dissimilarity.Mathematics in the archae- ological and historical sciences, 1971

work page 1971

[3] [3]

Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation.The Annals of Statistics, 44(2):455–488, 2016

T Tony Cai, Weidong Liu, Harrison H Zhou, et al. Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation.The Annals of Statistics, 44(2):455–488, 2016

work page 2016

[4] [4]

Active learning algo- rithms for graphical model selection

Gautamd Dasarathy, Aarti Singh, Maria-Florina Balcan, and Jong H Park. Active learning algo- rithms for graphical model selection. InArtificial Intelligence and Statistics, pages 1356–1364, 2016

work page 2016

[5] [5]

Property testing in graphical models: testing small separation numbers.Information and Inference: A Journal of the IMA, 2026, to appear

Luc Devroye, G ´abor Lugosi, and Piotr Zwiernik. Property testing in graphical models: testing small separation numbers.Information and Inference: A Journal of the IMA, 2026, to appear

work page 2026

[6] [6]

Learning latent tree models with small query complexity.Bernoulli, 2026, to appear

Luc Devroye, G ´abor Lugosi, and Piotr Zwiernik. Learning latent tree models with small query complexity.Bernoulli, 2026, to appear

work page 2026

[7] [7]

Structure learning in graphical modeling.Annual Review of Statistics and Its Application, 4:365–393, 2017

Mathias Drton and Marloes H Maathuis. Structure learning in graphical modeling.Annual Review of Statistics and Its Application, 4:365–393, 2017

work page 2017

[8] [8]

Probability inequalities for sums of bounded random variables.Journal of the American Statistical Association, 58(301):13–30, 1963

Wassily Hoeffding. Probability inequalities for sums of bounded random variables.Journal of the American Statistical Association, 58(301):13–30, 1963

work page 1963

[9] [9]

Reconstructing evolutionary trees from dna and protein sequences: paralinear distances.Proceedings of the National Academy of Sciences, 91(4):1455–1459, 1994

James A Lake. Reconstructing evolutionary trees from dna and protein sequences: paralinear distances.Proceedings of the National Academy of Sciences, 91(4):1455–1459, 1994

work page 1994

[10] [10]

Clarendon Press, 1996

Steffen L Lauritzen.Graphical models, volume 17. Clarendon Press, 1996

work page 1996

[11] [11]

Learning partial corre- lation graphs and graphical models by covariance queries.Journal of Machine Learning Research, 22(203):1–41, 2021

G ´abor Lugosi, Jakub Truszkowski, Vasiliki Velona, and Piotr Zwiernik. Learning partial corre- lation graphs and graphical models by covariance queries.Journal of Machine Learning Research, 22(203):1–41, 2021

work page 2021

[12] [12]

Combinatorial inference for graphical models.The Annals of Statistics, 47(2):795–827, 2019

Matey Neykov, Junwei Lu, and Han Liu. Combinatorial inference for graphical models.The Annals of Statistics, 47(2):795–827, 2019

work page 2019

[13] [13]

Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1988

Judea Pearl.Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1988. ISBN 1558604790

work page 1988

[14] [14]

Oxford University Press on Demand, 2003

Charles Semple and Mike Steel.Phylogenetics, volume 24. Oxford University Press on Demand, 2003

work page 2003

[15] [15]

Hierarchical latent class models for cluster analysis.Journal of Machine Learning Research, 5(6):697–723, 2004

Nevin L Zhang. Hierarchical latent class models for cluster analysis.Journal of Machine Learning Research, 5(6):697–723, 2004

work page 2004

[16] [16]

Latent tree models

Piotr Zwiernik. Latent tree models. InHandbook of graphical models, pages 265–288. CRC Press, 2018. 19

work page 2018