pith. sign in

arxiv: 2509.04995 · v2 · submitted 2025-09-05 · 🧬 q-bio.PE · math.CO

Revealing the building blocks of tree balance: fundamental units of the Sackin and Colless Indices

Pith reviewed 2026-05-18 19:03 UTC · model grok-4.3

classification 🧬 q-bio.PE math.CO
keywords Sackin indexColless indextree balanceimbalance indicesphylogenetic treesechelon treetree decomposition
0
0 comments X

The pith

Sackin and Colless indices decompose into elementary components that each qualify as independent tree imbalance measures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the Sackin and Colless indices are built from simpler parts rather than being basic measures themselves. Each of those parts meets the complete requirements to serve as a tree imbalance index on its own. This decomposition also produces new indices from their differences, with one minimized on Colless-minimal trees and the reverse acting as a balance measure. The building blocks further allow direct comparison to the stairs2 index and support a non-recursive construction method for the echelon tree. A sympathetic reader would care because the result clarifies why different imbalance scores sometimes disagree and supplies modular units for future index construction.

Core claim

The Sackin and Colless indices are compound in nature and can be decomposed into more elementary components that independently satisfy the defining properties of a tree (im)balance index. The difference Colless minus Sackin results in another imbalance index minimized by all Colless minimal trees, while Sackin minus Colless forms a balance index. The building blocks are compared to these indices and to the stairs2 index, and the echelon tree is investigated with a first non-recursive algorithm for its construction.

What carries the argument

The elementary components obtained by decomposing the Sackin and Colless indices, each functioning as a standalone tree (im)balance index.

If this is right

  • The difference Colless minus Sackin yields a valid imbalance index minimized on Colless-minimal trees.
  • The difference Sackin minus Colless yields a valid balance index.
  • The building blocks provide a basis for comparing and explaining disagreements between established indices such as stairs2.
  • A non-recursive algorithm constructs the echelon tree, a shape central to multiple imbalance indices.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These components could be recombined to generate new indices that isolate particular aspects of tree shape.
  • The decomposition approach may extend to other established indices to reveal similar modular structure.
  • Using the building blocks separately could reduce computational cost when scoring large sets of phylogenetic trees.

Load-bearing premise

The derived elementary components each satisfy every required property of a tree imbalance index without additional constraints or interactions that would break their independence.

What would settle it

Direct computation of the proposed elementary components on a collection of small trees to verify that each component alone produces scores satisfying all mathematical properties of an imbalance index.

Figures

Figures reproduced from arXiv: 2509.04995 by Linda Kn\"uver, Mareike Fischer.

Figure 1
Figure 1. Figure 1: Trees T gf b n , T be n and T ′ as used in the proof of Proposition 4. Subtrees are depicted as triangles. Note that the diamond nodes highlight the nodes which differ between T be n and T ′ . Our proof strategy now is as follows. We want to show that the assumption ∆CS(T be n ) > ∆CS(T gf b n ) leads to a contradiction, and we do this in the following steps: We first show that T ′ a is an echelon 11 [PIT… view at source ↗
Figure 2
Figure 2. Figure 2: Trees Te and T ′ as used in the proof of Lemma 4. Note that both trees employ the same pending subtrees and the only vertices in which they differ are highlighted with diamonds. By Proposition 4, T be na is, just like Ta, ∆CS minimal, so we must have ∆CS(Ta) = ∆CS(T be na ), and – as both trees share the same subtree Tb – we in summary have (using Proposition 1) that ∆CS(T) = ∆CS(Te). However, note that 2 … view at source ↗
Figure 3
Figure 3. Figure 3: A depiction of all ∆CS minimal (and ∆SC as well as Nb maximal) trees for n = 9: The set consists of T gf b 9 , T be 9 and T = (Ta, Tb). Note that at the root, T has a Colless partition as na = n gf b a = 5 and nb = n gf b b = 4. However, T is not Colless minimal as it employs a non-Colless minimal subtree Ta = T be 5 . As this subtree is ∆CS minimal, too, so is T. Note that T can be regarded as a “combinat… view at source ↗
Figure 4
Figure 4. Figure 4: Trees Te and T ′ as needed in the proof of Theorem 5. Note that both trees employ the same pending subtrees and the only vertices in which they differ are highlighted with diamonds. We now consider tree T ′ from [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The direct, non-recursive construction of T be n for n = 7 and n = 8. The first step is to calculate the binary weight w(n) of n, i.e., to count the numbers of 1’s in the binary representation of n. In our example we have n = 7 = 1112 and n = 8 = 10002. Hence, w(7) = 3 and w(8) = 1. Then, a caterpillar of size w(n) (depicted in gray) is iteratively filled up with fully balanced trees of sizes 2 i , where t… view at source ↗
Figure 6
Figure 6. Figure 6: Ranking comparisons of the indices Na, S, C, ∆CS (whose induced rankings are the same as those of −∆SC and −Nb) and stairs2. Note that the depicted plots were generated with the computer algebra system Mathematica [13]. The y-axis contains all trees ordered according to the so-called Furnas rank [10, 11, 18], starting with the maximally balanced tree and proceeding to the caterpillar. The blue line corresp… view at source ↗
read the original abstract

(Im)balance indices can be used to quantify the (im)balance of trees by assigning numerical scores to them. An easy way to generate a new index is to construct a compound index, e.g., a linear combination of established indices. Two of the most prominent and widely used imbalance indices are the Sackin index and the Colless index. In this study, we show that these classic indices are themselves compound in nature: they can be decomposed into more elementary components that independently satisfy the defining properties of a tree (im)balance index. We further show that the difference Colless minus Sackin results in another imbalance index that is minimized (amongst others) by all Colless minimal trees. Conversely, the difference Sackin minus Colless forms a balance index. Finally, we compare the building blocks of which the Sackin and the Colless indices consist to these indices as well as to the stairs2 index, which is another index from the literature. Our results suggest that the elementary building blocks we identify are not only foundational to established indices but also valuable tools for analyzing disagreement among indices when comparing the balance of different trees. Along the way, we investigate the so-called echelon tree, which plays an important role for several (im)balance indices, and present the first non-recursive algorithm to construct it.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that the Sackin and Colless indices can be decomposed into elementary components (local contributions) that each independently satisfy the defining properties of a tree imbalance index. It further shows that Colless minus Sackin yields a valid imbalance index minimized by Colless-minimal trees (including the echelon tree), while Sackin minus Colless forms a balance index; these building blocks are compared to the stairs2 index, and a non-recursive algorithm is given for constructing the echelon tree.

Significance. If the central decompositions hold with independent satisfaction of the axioms, the work supplies a finer-grained view of two foundational indices, potentially enabling new compound indices and clarifying sources of disagreement among balance measures. The non-recursive echelon-tree algorithm is a concrete computational contribution that strengthens the manuscript.

major comments (1)
  1. [Decomposition of Sackin and Colless (main technical sections following the definitions)] The claim that each elementary component independently satisfies the full set of defining properties (non-negativity on all trees, zero exclusively on balanced trees, and appropriate monotonicity under imbalance-increasing moves) is load-bearing for the central result but is not yet shown to hold without possible cancellations. The decomposition into sums of local terms does not automatically guarantee independence if the axioms include global constraints; explicit verification or counter-example checks for isolated components are needed (see the decomposition of Sackin/Colless and the subsequent difference-index claims).
minor comments (1)
  1. [Echelon tree construction] The non-recursive algorithm for the echelon tree would be clearer with pseudocode or a small worked example on a 5- or 6-taxon tree.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thorough review and valuable comments. We appreciate the positive assessment of the significance of our work and the recommendation for major revision. We will revise the manuscript to address the concerns raised regarding the independent satisfaction of the axioms by the elementary components.

read point-by-point responses
  1. Referee: [Decomposition of Sackin and Colless (main technical sections following the definitions)] The claim that each elementary component independently satisfies the full set of defining properties (non-negativity on all trees, zero exclusively on balanced trees, and appropriate monotonicity under imbalance-increasing moves) is load-bearing for the central result but is not yet shown to hold without possible cancellations. The decomposition into sums of local terms does not automatically guarantee independence if the axioms include global constraints; explicit verification or counter-example checks for isolated components are needed (see the decomposition of Sackin/Colless and the subsequent difference-index claims).

    Authors: We agree with the referee that explicit verification is required to establish that each elementary component satisfies the defining properties independently. Although the sums yield valid indices, we recognize that this does not preclude the need for direct checks on the components to ensure no hidden cancellations or global dependencies. In the revised manuscript, we will add explicit proofs and/or counter-example verifications demonstrating that each local building block is non-negative, vanishes only on balanced trees, and is monotonic with respect to imbalance-increasing moves. We will extend this verification to the difference indices Colless minus Sackin and Sackin minus Colless as well. These additions will be placed in the relevant technical sections following the definitions. revision: yes

Circularity Check

0 steps flagged

Decomposition relies on prior index properties; no reduction to self-referential inputs

full rationale

The manuscript decomposes the Sackin and Colless indices into sums of local elementary contributions and verifies that each component satisfies the standard axioms for an imbalance index (non-negativity, zero only on balanced trees, monotonicity under certain rearrangements). These axioms are drawn from the existing literature on tree balance indices rather than being redefined inside the paper. No equation in the derivation equates a derived quantity to a fitted parameter or renames an input as an output. Self-citations appear when recalling the definitions of the classic indices and the echelon tree, but they serve only as background; the new algorithmic construction of the echelon tree and the explicit component-wise verification are performed directly on the trees. The central result therefore remains independent of any self-citation chain and does not collapse to a tautology or statistical fit.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard combinatorial definitions of tree imbalance indices and the echelon tree from prior literature; no new free parameters or invented entities are introduced beyond the derived components.

axioms (1)
  • domain assumption A quantity qualifies as a tree imbalance index if it satisfies a set of defining properties (such as increasing with imbalance).
    Invoked when stating that the elementary components independently satisfy the properties.

pith-pipeline@v0.9.0 · 5771 in / 1230 out tokens · 32335 ms · 2026-05-18T19:03:02.418263+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

  1. [1]

    Blum, M. G. B. ; François, O.: On statistical tests of phylogenetic tree imbalance: the Sackin and other indices revisited. In: Mathematical Biosciences 195 (2005), Nr. 2, S. 141–153. http: //dx.doi.org/10.1016/j.mbs.2005.03.003. – DOI 10.1016/j.mbs.2005.03.003

  2. [2]

    Blum, M. G. B. ; François, O. ; Janson, S.: The mean, variance and limiting distribu- tion of two statistics sensitive to phylogenetic tree balance. In: Annals of Applied Probabil- ity 16 (2006), Nr. 4, S. 2195–2214. http://dx.doi.org/10.1214/105051606000000547. – DOI 10.1214/105051606000000547

  3. [3]

    Cardona, Gabriel ; Mir, Arnau ; Rosselló, Francesc: Exact formulas for the variance of several balance indices under the Yule model. In: J. Math. Biol. 67 (2013), Dezember, Nr. 6-7, S. 1833–1846

  4. [4]

    https://arxiv.org/abs/2502.12854

    Cleary, Sean ; Fischer, Mareike ; St.John, Katherine: The GFB tree and tree imbalance indices . https://arxiv.org/abs/2502.12854. Version: 2025

  5. [5]

    In: Evol

    Colijn, Caroline ; Gardy, Jennifer: Phylogenetic tree shapes resolve disease transmission patterns. In: Evol. Med. Public Health 2014 (2014), Juni, Nr. 1, S. 96–108

  6. [6]

    Phylogenetics: The theory and practice of phylogenetic systematics

    Colless, D.: Review of “Phylogenetics: The theory and practice of phylogenetic systematics” . In: Systematic Zoology 31 (1982), Nr. 1, S. 100–104

  7. [7]

    Fischer, M

    Coronado, T.M. ; Fischer, M. ; Herbst, L. et a.: On the minimum value of the Colless index and the bifurcating trees that achieve it. In: Journal of Mathematical Biology 80 (2020). http: //dx.doi.org/10.1007/s00285-020-01488-9 . – DOI 10.1007/s00285–020–01488–9

  8. [8]

    Currie, Bryan ; Wicke, Kristina: On the maximum value of the stairs2 index. In: Adv. Appl. Math. 159 (2024), August, Nr. 102732, S. 102732 27

  9. [9]

    In: Annals of Combina- torics 25 (2021), Nr

    Fischer, Mareike: Extremal values of the Sackin tree balance index. In: Annals of Combina- torics 25 (2021), Nr. 2, S. 515–541. http://dx.doi.org/10.1007/s00026-021-00539-2 . – DOI 10.1007/s00026–021–00539–2

  10. [10]

    Wicke, Kristina: Tree balance indices

    Fischer, Mareike ; Herbst, Lina ; Kersting, Sophie ; Kühn, Annemarie L. ; Wicke, Kristina: Tree balance indices. 2023. Cham, Switzerland : Springer International Publishing, 2023

  11. [11]

    Furnas, George W.: The generation of random, binary unordered trees. In: J. Classif. 1 (1984), Dezember, Nr. 1, S. 187–233

  12. [12]

    Hamoudi, Yassine ; Laplante, Sophie ; Mantaci, Roberto: Balanced mobiles with applications to phylogenetic trees and Huffman-like problems . 2017. – Preprint on webpage https://www.irif. fr/~hamoudi/files/publications/BalancedMobiles.pdf

  13. [13]

    Inc., Wolfram R.: Mathematica, Version 13.3 . 2023. – Champaign, IL, 2023

  14. [14]

    Ssekagiri, Alfred ; Nabakooza, Grace ; Bbosa, Nicholas ; Ssemwanga, Deogratius ; Kaleebu, Pontiano ; Mwalili, Samuel ; Mango, John M

    Kayondo, Hassan W. ; Ssekagiri, Alfred ; Nabakooza, Grace ; Bbosa, Nicholas ; Ssemwanga, Deogratius ; Kaleebu, Pontiano ; Mwalili, Samuel ; Mango, John M. ; Leigh Brown, Andrew J. ; Saenz, Roberto A. ; Galiwango, Ronald ; Kitayimbwa, John M.: Employing phylogenetic tree shape statistics to resolve the underlying host population structure. In: BMC Bioinfor...

  15. [15]

    – Master thesis

    Kersting, S.: Genetic programming as a means for generating improved tree balance indices , Universität Greifswald, Master’s thesis, 2020. – Master thesis

  16. [16]

    Kersting, S. J. ; Wicke, K. ; Fischer, M.: Tree balance in phylogenetic models. In: Philosophical Transactions of the Royal Society B: Biological Sciences 380 (2025), Februar, Nr. 1919. http: //dx.doi.org/10.1098/rstb.2023.0303. – DOI 10.1098/rstb.2023.0303

  17. [17]

    Fischer, Mareike: Measuring tree balance using symmetry nodes - A new balance index and its extremal properties

    Kersting, Sophie J. ; Fischer, Mareike: Measuring tree balance using symmetry nodes - A new balance index and its extremal properties. In: Math. Biosci. 341 (2021), November, Nr. 108690, S. 108690

  18. [18]

    In: Evolution 47 (1993), August, Nr

    Kirkpatrick, Mark ; Slatkin, Montgomery: Searching for evolutionary patterns in the shape of a phylogenetic tree. In: Evolution 47 (1993), August, Nr. 4, S. 1171–1181

  19. [19]

    In: Syst

    Matsen, Frederick A.: A geometric approach to tree shape statistics. In: Syst. Biol. 55 (2006), August, Nr. 4, S. 652–661

  20. [20]

    In: Math

    Mir, Arnau ; Rosselló, Francesc ; Rotger, Lucía: A new balance index for phylogenetic trees. In: Math. Biosci. 241 (2013), Januar, Nr. 1, S. 125–136

  21. [21]

    Good” and “bad

    Sackin, M. J.: “Good” and “bad” phenograms. In: Systematic Biology 21 (1972), Nr. 2, S. 225–226. http://dx.doi.org/10.1093/sysbio/21.2.225. – DOI 10.1093/sysbio/21.2.225

  22. [22]

    In: Systematic Biology 39 (1990), 09, Nr

    Shao, Kwang-Tsao ; Sokal, Robert R.: Tree Balance. In: Systematic Biology 39 (1990), 09, Nr. 3, 266-276. http://dx.doi.org/10.2307/2992186. – DOI 10.2307/2992186

  23. [23]

    : The On-Line Encyclopedia of Integer Sequences

    The OEIS Foundation Inc. : The On-Line Encyclopedia of Integer Sequences. https://oeis.org, 2024

  24. [24]

    Wicke, Kristina ; Fischer, Mareike: Combinatorial views on persistent characters in phylogenetics. In: Adv. Appl. Math. 119 (2020), August, Nr. 102046, S. 102046 Appendix We here state the second part of the proof of Proposition 3. 28 Proof of Proposition 3. It remains to consider d2n+1. Note that if j is even, then 2n + 1 − j is odd and vice versa. Hence...