Revealing the building blocks of tree balance: fundamental units of the Sackin and Colless Indices
Pith reviewed 2026-05-18 19:03 UTC · model grok-4.3
The pith
Sackin and Colless indices decompose into elementary components that each qualify as independent tree imbalance measures.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Sackin and Colless indices are compound in nature and can be decomposed into more elementary components that independently satisfy the defining properties of a tree (im)balance index. The difference Colless minus Sackin results in another imbalance index minimized by all Colless minimal trees, while Sackin minus Colless forms a balance index. The building blocks are compared to these indices and to the stairs2 index, and the echelon tree is investigated with a first non-recursive algorithm for its construction.
What carries the argument
The elementary components obtained by decomposing the Sackin and Colless indices, each functioning as a standalone tree (im)balance index.
If this is right
- The difference Colless minus Sackin yields a valid imbalance index minimized on Colless-minimal trees.
- The difference Sackin minus Colless yields a valid balance index.
- The building blocks provide a basis for comparing and explaining disagreements between established indices such as stairs2.
- A non-recursive algorithm constructs the echelon tree, a shape central to multiple imbalance indices.
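To make the decomposition concrete, the following is a minimal sketch rather than the paper's own code. It assumes that Na sums, over all inner vertices, the leaf count of the larger child clade and that Nb sums the smaller one; this reading is an assumption, chosen because it reproduces the identities S(T) = Na(T) + Nb(T) and C(T) = Na(T) − Nb(T) quoted further down this page. Trees are encoded as nested tuples (a hypothetical encoding: a leaf is `()`, an inner vertex is a pair of subtrees).

```python
def leaves(t):
    """Leaf count of a tree given as nested tuples: leaf = (), inner vertex = (left, right)."""
    return 1 if t == () else leaves(t[0]) + leaves(t[1])


def sackin(t):
    """Sackin index: sum of clade sizes over all inner vertices."""
    if t == ():
        return 0
    return leaves(t) + sackin(t[0]) + sackin(t[1])


def colless(t):
    """Colless index: sum of |left clade size - right clade size| over all inner vertices."""
    if t == ():
        return 0
    return abs(leaves(t[0]) - leaves(t[1])) + colless(t[0]) + colless(t[1])


def components(t):
    """Assumed building blocks (n_a, n_b): larger / smaller child clade size,
    each summed over all inner vertices. This definition is an assumption."""
    if t == ():
        return 0, 0
    a, b = leaves(t[0]), leaves(t[1])
    (la, lb), (ra, rb) = components(t[0]), components(t[1])
    return la + ra + max(a, b), lb + rb + min(a, b)


caterpillar5 = (((((), ()), ()), ()), ())   # five-leaf caterpillar
balanced4 = (((), ()), ((), ()))            # fully balanced tree on four leaves

for name, tree in [("caterpillar5", caterpillar5), ("balanced4", balanced4)]:
    n_a, n_b = components(tree)
    s, c = sackin(tree), colless(tree)
    # Both identities hold by construction under the assumed definitions.
    assert s == n_a + n_b and c == n_a - n_b
    print(name, "S =", s, "C =", c, "Na =", n_a, "Nb =", n_b, "C - S =", c - s)
```

Under this reading the differences reduce to C − S = −2·Nb and S − C = 2·Nb. For the five-leaf caterpillar the sketch gives S = 14, C = 6, Na = 10 and Nb = 4.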
Where Pith is reading between the lines
- These components could be recombined to generate new indices that isolate particular aspects of tree shape.
- The decomposition approach may extend to other established indices to reveal similar modular structure.
- Using the building blocks separately could reduce computational cost when scoring large sets of phylogenetic trees.
Load-bearing premise
The derived elementary components each satisfy every required property of a tree imbalance index without additional constraints or interactions that would break their independence.
What would settle it
Direct computation of the proposed elementary components on a collection of small trees to verify that each component alone produces scores satisfying all mathematical properties of an imbalance index.
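Such a check can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure. The components Na and Nb are taken to be the sums of the larger and smaller child clade sizes over all inner vertices (an assumed reading that matches S = Na + Nb and C = Na − Nb). The test criterion is the extremal-tree convention: an imbalance index is uniquely maximized by the caterpillar and, for leaf numbers that are powers of two, uniquely minimized by the fully balanced tree, with the roles swapped for a balance index. The page does not spell out the property list, so that criterion is an assumption, as are the helper names `shapes`, `components` and `extremal`.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shapes(n):
    """All rooted binary tree shapes with n leaves as nested tuples (leaf = ())."""
    if n == 1:
        return ((),)
    out = []
    for a in range(1, n // 2 + 1):          # a = leaf count of the smaller child
        b = n - a
        for i, left in enumerate(shapes(a)):
            for j, right in enumerate(shapes(b)):
                if a == b and j < i:
                    continue                # mirror image of a pair already emitted
                out.append((left, right))
    return tuple(out)


def size(t):
    """Leaf count of a nested-tuple tree."""
    return 1 if t == () else size(t[0]) + size(t[1])


def components(t):
    """Assumed building blocks (Na, Nb): larger / smaller child clade size per inner vertex."""
    if t == ():
        return 0, 0
    a, b = size(t[0]), size(t[1])
    (la, lb), (ra, rb) = components(t[0]), components(t[1])
    return la + ra + max(a, b), lb + rb + min(a, b)


def n_a(t):
    return components(t)[0]


def n_b(t):
    return components(t)[1]


def caterpillar(n):
    """Caterpillar on n leaves, in the same orientation that shapes() emits."""
    t = ()
    for _ in range(n - 1):
        t = ((), t)
    return t


def fully_balanced(h):
    """Fully balanced tree of height h, i.e. with 2**h leaves."""
    t = ()
    for _ in range(h):
        t = (t, t)
    return t


def extremal(n, score):
    """Return (minimizers, maximizers) of score over all shapes with n leaves."""
    scored = [(score(t), t) for t in shapes(n)]
    lo = min(s for s, _ in scored)
    hi = max(s for s, _ in scored)
    return [t for s, t in scored if s == lo], [t for s, t in scored if s == hi]


for n in range(2, 9):
    a_min, a_max = extremal(n, n_a)
    b_min, b_max = extremal(n, n_b)
    print(n, "shapes:", len(shapes(n)),
          "| Na max is caterpillar only:", a_max == [caterpillar(n)],
          "| Nb min is caterpillar only:", b_min == [caterpillar(n)],
          "| #Na minimizers:", len(a_min), "| #Nb maximizers:", len(b_max))
```

The enumeration is exhaustive, so it is only practical for small leaf numbers. For five leaves it finds two shapes that maximize Nb, which illustrates that an extremal value can be shared by several trees when the leaf number is not a power of two.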
Original abstract
(Im)balance indices can be used to quantify the (im)balance of trees by assigning numerical scores to them. An easy way to generate a new index is to construct a compound index, e.g., a linear combination of established indices. Two of the most prominent and widely used imbalance indices are the Sackin index and the Colless index. In this study, we show that these classic indices are themselves compound in nature: they can be decomposed into more elementary components that independently satisfy the defining properties of a tree (im)balance index. We further show that the difference Colless minus Sackin results in another imbalance index that is minimized (amongst others) by all Colless minimal trees. Conversely, the difference Sackin minus Colless forms a balance index. Finally, we compare the building blocks of which the Sackin and the Colless indices consist to these indices as well as to the stairs2 index, which is another index from the literature. Our results suggest that the elementary building blocks we identify are not only foundational to established indices but also valuable tools for analyzing disagreement among indices when comparing the balance of different trees. Along the way, we investigate the so-called echelon tree, which plays an important role for several (im)balance indices, and present the first non-recursive algorithm to construct it.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that the Sackin and Colless indices can be decomposed into elementary components (local contributions) that each independently satisfy the defining properties of a tree imbalance index. It further shows that Colless minus Sackin yields a valid imbalance index minimized by Colless-minimal trees (including the echelon tree), while Sackin minus Colless forms a balance index; these building blocks are compared to the stairs2 index, and a non-recursive algorithm is given for constructing the echelon tree.
Significance. If the central decompositions hold with independent satisfaction of the axioms, the work supplies a finer-grained view of two foundational indices, potentially enabling new compound indices and clarifying sources of disagreement among balance measures. The non-recursive echelon-tree algorithm is a concrete computational contribution that strengthens the manuscript.
major comments (1)
- [Decomposition of Sackin and Colless (main technical sections following the definitions)] The claim that each elementary component independently satisfies the full set of defining properties (non-negativity on all trees, zero exclusively on balanced trees, and appropriate monotonicity under imbalance-increasing moves) is load-bearing for the central result but is not yet shown to hold without possible cancellations. The decomposition into sums of local terms does not automatically guarantee independence if the axioms include global constraints; explicit verification or counter-example checks for isolated components are needed (see the decomposition of Sackin/Colless and the subsequent difference-index claims).
minor comments (1)
- [Echelon tree construction] The non-recursive algorithm for the echelon tree would be clearer with pseudocode or a small worked example on a 5- or 6-taxon tree.
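A worked example along these lines might look as follows. It is a hedged sketch, not the paper's algorithm: it assumes that the echelon tree on n leaves consists of fully balanced blocks whose sizes are the powers of two in the binary expansion of n, joined along a backbone with the largest block nearest the root. That reading is inferred from the short description further down this page ("binary expansion and fully-balanced blocks of size 2^k") and may differ from the authors' definition. The function names are hypothetical.

```python
def fully_balanced(h):
    """Fully balanced tree of height h (2**h leaves); leaf = (), inner vertex = (left, right)."""
    t = ()
    for _ in range(h):
        t = (t, t)
    return t


def echelon_sketch(n):
    """Iteratively join fully balanced blocks sized by the set bits of n.

    Assumed layout: the smallest block sits deepest, the largest next to the root.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    tree = None
    for k in range(n.bit_length()):         # scan bits from the lowest upward
        if (n >> k) & 1:
            block = fully_balanced(k)
            # `is None` matters here: a single leaf is the empty tuple, which is falsy.
            tree = block if tree is None else (block, tree)
    return tree


def leaves(t):
    """Leaf count of a nested-tuple tree."""
    return 1 if t == () else leaves(t[0]) + leaves(t[1])


for n in (5, 6):
    print(n, bin(n), echelon_sketch(n))
```

For n = 5 = 4 + 1 the sketch joins a fully balanced four-leaf block with a single leaf; for n = 6 = 4 + 2 it joins a four-leaf block with a cherry. The loop uses no recursion, only one pass over the bits of n.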
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable comments. We appreciate the positive assessment of the significance of our work and the recommendation for major revision. We will revise the manuscript to address the concerns raised regarding the independent satisfaction of the axioms by the elementary components.
Point-by-point responses
- Referee: [Decomposition of Sackin and Colless (main technical sections following the definitions)] The claim that each elementary component independently satisfies the full set of defining properties (non-negativity on all trees, zero exclusively on balanced trees, and appropriate monotonicity under imbalance-increasing moves) is load-bearing for the central result but is not yet shown to hold without possible cancellations. The decomposition into sums of local terms does not automatically guarantee independence if the axioms include global constraints; explicit verification or counter-example checks for isolated components are needed (see the decomposition of Sackin/Colless and the subsequent difference-index claims).
  Authors: We agree with the referee that explicit verification is required to establish that each elementary component satisfies the defining properties independently. Although the sums yield valid indices, we recognize that this does not preclude the need for direct checks on the components to ensure no hidden cancellations or global dependencies. In the revised manuscript, we will add explicit proofs and/or counter-example verifications demonstrating that each local building block is non-negative, vanishes only on balanced trees, and is monotonic with respect to imbalance-increasing moves. We will extend this verification to the difference indices Colless minus Sackin and Sackin minus Colless as well. These additions will be placed in the relevant technical sections following the definitions.
  Revision: yes
Circularity Check
Decomposition relies on prior index properties; no reduction to self-referential inputs
full rationale
The manuscript decomposes the Sackin and Colless indices into sums of local elementary contributions and verifies that each component satisfies the standard axioms for an imbalance index (non-negativity, zero only on balanced trees, monotonicity under certain rearrangements). These axioms are drawn from the existing literature on tree balance indices rather than being redefined inside the paper. No equation in the derivation equates a derived quantity to a fitted parameter or renames an input as an output. Self-citations appear when recalling the definitions of the classic indices and the echelon tree, but they serve only as background; the new algorithmic construction of the echelon tree and the explicit component-wise verification are performed directly on the trees. The central result therefore remains independent of any self-citation chain and does not collapse to a tautology or statistical fit.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A quantity qualifies as a tree imbalance index if it satisfies a set of defining properties (such as increasing with imbalance).
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: S(T) = Na(T) + Nb(T) and C(T) = Na(T) − Nb(T); Na is an imbalance index, Nb a balance index (Theorem 1, Theorem 6)
- IndisputableMonolith/Foundation/DimensionForcing.lean: reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: Echelon tree T_be_n defined via binary expansion and fully-balanced blocks of size 2^k
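The two identities quoted in the first entry above already account for the difference indices. As a short worked step, using only algebra on those identities and writing Na and Nb as N_a and N_b:

```latex
\begin{align*}
C(T) - S(T) &= \bigl(N_a(T) - N_b(T)\bigr) - \bigl(N_a(T) + N_b(T)\bigr) = -2\,N_b(T),\\
S(T) - C(T) &= \bigl(N_a(T) + N_b(T)\bigr) - \bigl(N_a(T) - N_b(T)\bigr) = 2\,N_b(T),\\
N_a(T) &= \tfrac{1}{2}\bigl(S(T) + C(T)\bigr), \qquad
N_b(T) = \tfrac{1}{2}\bigl(S(T) - C(T)\bigr).
\end{align*}
```

So S − C is a positive multiple of Nb and C − S a negative multiple of it. This is consistent with the page's statement that the former is a balance index and the latter an imbalance index, provided that negating a balance index yields an imbalance index.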
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.