Nonextensive Statistical Signatures of the Bilaterian Transition in Proteome Length Distributions

Sertac Eroglu

arxiv: 2606.27985 · v1 · pith:NJWM34AAnew · submitted 2026-06-26 · cond-mat.stat-mech · physics.bio-ph

Pith reviewed 2026-06-29 02:31 UTC · model grok-4.3

keywords: proteome length distributions, Tsallis entropy, nonextensive statistical mechanics, bilaterian transition, q-exponential distribution, protein lengths, evolutionary complexity, statistical signatures

The pith

The Tsallis entropic index q marks a shift in proteome length distributions at the bilaterian transition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper fits truncated discrete q-exponential distributions by maximum likelihood to the complementary cumulative distribution functions (CCDFs) of protein lengths from 22 reference proteomes. It reports that the resulting entropic index q falls into three regimes: values below 1 for prokaryotes, unicellular and non-animal multicellular eukaryotes, and basal animals; bootstrap 95% confidence intervals spanning 1 for the two cnidarians and the basal bilaterian C. teleta; and values above 1 for higher bilaterians, rising monotonically from 1.033 (S. purpuratus) to 1.147 (H. sapiens) across the four deuterostomes sampled. The q-exponential outperforms the ordinary exponential in all 22 proteomes and becomes more competitive against other two-parameter forms as complexity rises. A sympathetic reader would care because the index supplies a single, physically motivated number that tracks the rise in hierarchical proteome organization during the origin of bilaterian animals.
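
For readers who have not met the functional form, the standard Tsallis q-exponential is written out below, together with one plausible truncated discrete normalization. The scale parameter \lambda, the bounds x_{\min} and x_{\max}, and the choice to place the q-exponential form on the probability mass function are assumptions made for illustration; this page does not state the paper's exact parametrization.

```latex
% Standard q-exponential; [z]_+ = max(z, 0)
e_q(u) = \bigl[\,1 + (1-q)\,u\,\bigr]_+^{\frac{1}{1-q}},
\qquad \lim_{q \to 1} e_q(u) = e^{u}

% Assumed truncated discrete form on x_min <= x <= x_max
p(x \mid q, \lambda) =
  \frac{e_q(-x/\lambda)}{\sum_{k = x_{\min}}^{x_{\max}} e_q(-k/\lambda)}

% Corresponding complementary cumulative distribution
\bar{F}(x) = \sum_{k = x}^{x_{\max}} p(k \mid q, \lambda)
```

Under this form, q > 1 gives a heavy tail with e_q(-x/\lambda) decaying as x^{-1/(q-1)} for large x, q < 1 gives a weight that vanishes at x = \lambda/(1-q), and q = 1 recovers the ordinary exponential.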

Core claim

Maximum likelihood fitting of truncated discrete q-exponential distributions to the complementary cumulative distribution functions of protein lengths in reference proteomes identifies three distinct regimes for the Tsallis entropic index q: values below 1 for prokaryotes, unicellular and non-animal multicellular eukaryotes, and basal animals; confidence intervals spanning 1 for the cnidarians and basal bilaterian Capitella teleta; and values above 1 for higher bilaterians, monotonically increasing from 1.033 in Strongylocentrotus purpuratus to 1.147 in Homo sapiens.

What carries the argument

The Tsallis entropic index q from maximum-likelihood fits of the truncated discrete q-exponential to proteome-length CCDFs, which quantifies nonextensivity and tracks organizational complexity.
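
A minimal sketch of what such a maximum-likelihood fit could look like, assuming a probability mass function proportional to e_q(-x/lam) on a fixed range. The function names, the scale parameter `lam`, the bounds `x_min` and `x_max`, and the coarse grid search are all illustrative assumptions, not the author's implementation.

```python
import math

def q_exp(u, q):
    """Tsallis q-exponential e_q(u); reduces to exp(u) as q -> 1."""
    if abs(q - 1.0) < 1e-6:
        return math.exp(u)
    base = 1.0 + (1.0 - q) * u
    # [.]_+ convention: the function is zero where the base is not positive.
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0

def truncated_pmf(q, lam, x_min, x_max):
    """Assumed model: pmf proportional to e_q(-x/lam) on x_min..x_max."""
    weights = {x: q_exp(-x / lam, q) for x in range(x_min, x_max + 1)}
    norm = sum(weights.values())
    if norm <= 0.0:
        raise ValueError("model has no support on the requested range")
    return {x: w / norm for x, w in weights.items()}

def log_likelihood(counts, q, lam, x_min, x_max):
    """Log-likelihood of {length: count} data under the truncated model."""
    norm = sum(q_exp(-x / lam, q) for x in range(x_min, x_max + 1))
    if norm <= 0.0:
        return float("-inf")
    total = 0.0
    for x, n in counts.items():
        w = q_exp(-x / lam, q)
        if w <= 0.0:  # an observed length falls outside the model support
            return float("-inf")
        total += n * (math.log(w) - math.log(norm))
    return total

def fit_grid(counts, x_min, x_max, q_grid, lam_grid):
    """Return (q, lam, log-likelihood) maximizing the likelihood on a grid."""
    best = (None, None, float("-inf"))
    for q in q_grid:
        for lam in lam_grid:
            ll = log_likelihood(counts, q, lam, x_min, x_max)
            if ll > best[2]:
                best = (q, lam, ll)
    return best

# Self-consistency check: use the model's own probabilities as weights.
target = truncated_pmf(1.1, 50.0, 1, 300)
q_hat, lam_hat, ll = fit_grid(target, 1, 300,
                              [0.9, 1.0, 1.1, 1.2], [25.0, 50.0, 100.0])
print(q_hat, lam_hat)
```

With real data, `counts` would be something like `collections.Counter(lengths)`, and a numerical optimizer would normally replace the grid. The grid keeps the sketch dependency-free and makes the likelihood surface easy to inspect.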

If this is right

The q-exponential outperforms the ordinary exponential distribution across all 22 proteomes.
The relative performance of the q-exponential against other two-parameter distributions improves as proteome complexity increases.
q increases monotonically across the four sampled deuterostomes from sea urchin to human.
The boundary regime in q coincides with the phylogenetic position of cnidarians and the basal bilaterian C. teleta.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

A broader survey of lophotrochozoan and ecdysozoan proteomes could test whether the boundary regime is a general feature of all basal bilaterians.
If q indexes long-range correlations in protein lengths, it may correlate with independent measures such as cell-type diversity or regulatory network depth.
The same fitting procedure applied to metagenomic protein-length data could place unsequenced lineages on the same q scale.

Load-bearing premise

The 22 reference proteomes supply an unbiased sample of the bilaterian transition zone and the truncated discrete q-exponential is the correct functional form whose q values can be compared directly across domains.

What would settle it

Sequencing additional proteomes from more cnidarians, basal bilaterians, and early deuterostomes and finding q values that fall outside the reported regime boundaries or reverse the monotonic increase would falsify the claimed transition pattern.

Original abstract

Protein length distributions across the tree of life carry a quantitative signature of organismal complexity. Nonextensive statistical mechanics, through the Tsallis generalized entropy formalism, provides a natural framework for describing complex systems characterized by long-range correlations, scale invariance, and hierarchical organization -- features that classical Boltzmann-Gibbs statistics cannot accommodate. In this work, the complementary cumulative distribution function (CCDF) of protein lengths is analyzed within this framework for the reference proteomes of 22 fully sequenced organisms spanning the domains Archaea, Bacteria, and Eukarya, with deliberate sampling across the animal transition zone from sponges and cnidarians to higher bilaterians. Maximum likelihood estimation (MLE) fitting of truncated discrete q-exponential distributions, with bootstrap 95% confidence intervals (CIs), reveals that the entropic index q resolves into three statistically distinct regimes: superextensive (q < 1) for prokaryotes, unicellular and non-animal multicellular eukaryotes, and basal animals; a boundary regime (CI on q spanning unity) for the two cnidarians studied and the basal bilaterian C. teleta; and subextensive (q > 1) for all higher bilaterians, with q increasing monotonically across the four deuterostomes sampled from S. purpuratus (1.033) to H. sapiens (1.147). The q-exponential outperforms the ordinary exponential distribution across all 22 proteomes and becomes progressively more competitive against alternative two-parameter distributions as proteome complexity increases. These results identify the Tsallis entropic index as a continuous, physically interpretable indicator of proteome organizational complexity and extend the applicability of nonextensive statistical mechanics to proteomic systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper fits Tsallis q to protein length CCDFs across 22 proteomes and reports three regimes with a shift at the bilaterian transition, but the boundary regime depends on only three organisms.

The core observation is that maximum-likelihood fits of the truncated discrete q-exponential to protein length distributions yield q below 1 for prokaryotes, non-animal eukaryotes, and basal animals, values straddling 1 for the two cnidarians and one basal bilaterian, and q above 1 for higher bilaterians, with a monotonic rise from 1.033 to 1.147 across the four deuterostomes sampled. The author also shows that the q-exponential beats the ordinary exponential in every case and grows more competitive against other two-parameter forms as complexity increases.

What stands out is the deliberate sampling across the animal transition and the use of bootstrap confidence intervals to argue for statistical separation of the regimes. That gives a concrete, continuous parameter tied to a major evolutionary step rather than just another descriptive fit.

The main limitation is the small number of proteomes in the boundary zone. With only three organisms whose intervals cross or touch q=1, the tripartite structure could shift if one more species were added or if truncation rules changed even modestly. The sampling is targeted rather than exhaustive, and the abstract gives no sensitivity table on cutoff choice, normalization, or substitution of alternative heavy-tailed models. Those checks matter because the headline claim is defined by where the intervals sit relative to 1.

No internal contradiction appears in the reported procedure, and the fitting approach itself looks standard. The work is observational, so the patterns are what they are once the fits are accepted.

This is worth a look for anyone tracking quantitative markers of proteome complexity or testing nonextensive statistics on biological data. A reader already working with Tsallis distributions or evolutionary proteomics could extract the q values and test them against their own sets. It is not yet a finished story on robustness, but the question it raises is clear enough that a serious editor should send it out for refereeing so the fitting details and sampling effects can be examined directly.

Referee Report

3 major / 1 minor

Summary. The manuscript fits truncated discrete q-exponential distributions via MLE (with bootstrap 95% CIs) to the complementary cumulative distribution functions of protein lengths in 22 reference proteomes. It reports that the fitted Tsallis index q partitions the organisms into three regimes—superextensive (q < 1) for prokaryotes plus unicellular/non-animal multicellular eukaryotes and basal animals, a boundary regime straddling q = 1 for two cnidarians and the basal bilaterian C. teleta, and subextensive (q > 1) for higher bilaterians with monotonic increase from S. purpuratus (1.033) to H. sapiens (1.147)—and claims that the q-exponential outperforms the ordinary exponential while becoming more competitive with other two-parameter heavy-tailed forms as complexity increases.

Significance. If the reported regime separation and monotonic trend prove robust, the work would supply a continuous, physically interpretable parameter linking nonextensive statistical mechanics to proteome organizational complexity across the bilaterian transition. The explicit use of bootstrap confidence intervals and direct model comparison against the exponential distribution are positive features; however, the deliberate rather than exhaustive sampling and absence of sensitivity checks on truncation and functional form limit the strength of the central claim.

major comments (3)

[Abstract] Abstract: the tripartite regime structure and statistical separation rest on bootstrap 95% CIs from only three boundary-zone proteomes (two cnidarians + C. teleta). Because the headline partition is defined solely by whether these CIs lie below, straddle, or lie above q = 1, even modest changes in truncation cutoff or normalization could move one or more intervals across unity and erase the claimed structure.
[Abstract] Abstract and implied Methods: no explicit statement is given of the precise truncation rules, length-exclusion criteria, or normalization convention used for the discrete q-exponential; these choices directly affect the MLE value of q and therefore the placement of the three boundary CIs that define the central claim.
[Abstract] Abstract: the reported monotonic increase in q across the four deuterostomes is presented without any accompanying sensitivity table showing that the ordering or the CI placements survive substitution of an alternative two-parameter heavy-tailed model or variation of the lower truncation threshold.
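
The dependence of the three-regime partition on where each interval sits relative to q = 1 can be made concrete with a short sketch of a percentile bootstrap followed by a classification rule. The resampling scheme, the function names, and the treatment of an interval that merely touches 1 as "boundary" are assumptions; the paper's own bootstrap procedure is not described on this page.

```python
import random

def percentile_ci(data, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for estimator(data).

    `estimator` is any function mapping a list of observations to a number;
    for this use case it would be a routine returning the fitted q.
    """
    rng = random.Random(seed)
    n = len(data)
    stats = []
    for _ in range(n_boot):
        # Resample the observations with replacement.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(resample))
    stats.sort()
    lo_idx = max(0, int((alpha / 2.0) * n_boot))
    hi_idx = max(0, min(n_boot - 1, int((1.0 - alpha / 2.0) * n_boot) - 1))
    return stats[lo_idx], stats[hi_idx]

def classify_regime(ci_low, ci_high):
    """Map a confidence interval on q to the regime labels used in the abstract."""
    if ci_high < 1.0:
        return "superextensive"   # whole interval below 1
    if ci_low > 1.0:
        return "subextensive"     # whole interval above 1
    return "boundary"             # interval contains or touches 1

# Hypothetical intervals, chosen only to exercise each branch.
for low, high in [(0.90, 0.95), (0.98, 1.02), (1.03, 1.06)]:
    print((low, high), classify_regime(low, high))
```

Because the label changes as soon as one endpoint crosses 1, a small shift in an interval (for example from a different truncation cutoff) is enough to move a proteome between regimes, which is the sensitivity the referee is pointing at.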

minor comments (1)

[Abstract] The abstract states that the q-exponential 'becomes progressively more competitive against alternative two-parameter distributions' but supplies no quantitative comparison (e.g., likelihood ratios or AIC differences) that would allow the reader to assess the strength of that statement.
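
The kind of quantitative comparison asked for here can be sketched with AIC differences and Akaike weights. The log-likelihood values below are placeholders and not results from the paper, and the parameter counts (one for the exponential, two for the q-exponential) are an assumption based on the abstract's reference to "two-parameter distributions".

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_lik

def delta_aic(aic_by_model):
    """AIC of each model minus the smallest AIC in the set."""
    best = min(aic_by_model.values())
    return {name: value - best for name, value in aic_by_model.items()}

def akaike_weights(aic_by_model):
    """Relative support for each model, normalized to sum to 1."""
    deltas = delta_aic(aic_by_model)
    rel = {name: math.exp(-0.5 * d) for name, d in deltas.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}

# Placeholder log-likelihoods, NOT values from the paper.
scores = {
    "exponential (k=1)": aic(-1000.0, 1),
    "q-exponential (k=2)": aic(-990.0, 2),
}
print(delta_aic(scores))
print(akaike_weights(scores))
```

Reporting a table of such differences per proteome would let a reader see how the q-exponential's standing against each alternative changes across the 22 organisms.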

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of robustness and methodological transparency. We address each major comment below.

Referee: [Abstract] Abstract: the tripartite regime structure and statistical separation rest on bootstrap 95% CIs from only three boundary-zone proteomes (two cnidarians + C. teleta). Because the headline partition is defined solely by whether these CIs lie below, straddle, or lie above q = 1, even modest changes in truncation cutoff or normalization could move one or more intervals across unity and erase the claimed structure.

Authors: The tripartite partition is indeed anchored by the three boundary proteomes whose bootstrap CIs straddle q=1, as these were deliberately sampled to probe the transition zone. The remaining 19 proteomes show CIs lying unambiguously below or above unity, and the monotonic rise in q among higher bilaterians provides supporting internal consistency. While we agree that the boundary placement is sensitive to modeling choices, the observed separation aligns with the biological sampling strategy across the bilaterian transition. We will add a brief discussion of this reliance and the rationale for focused sampling.
Revision: partial
Referee: [Abstract] Abstract and implied Methods: no explicit statement is given of the precise truncation rules, length-exclusion criteria, or normalization convention used for the discrete q-exponential; these choices directly affect the MLE value of q and therefore the placement of the three boundary CIs that define the central claim.

Authors: We accept that the Methods section lacks an explicit, self-contained description of the truncation rules, length-exclusion criteria, and normalization convention. These were implemented following standard practice for truncated discrete distributions, but the manuscript will be revised to include a dedicated paragraph detailing the exact procedures, including the lower cutoff selection and normalization.
Revision: yes
Referee: [Abstract] Abstract: the reported monotonic increase in q across the four deuterostomes is presented without any accompanying sensitivity table showing that the ordering or the CI placements survive substitution of an alternative two-parameter heavy-tailed model or variation of the lower truncation threshold.

Authors: The monotonic ordering is reported from the primary q-exponential fits. We will add a supplementary table that varies the lower truncation threshold for the four deuterostomes and recomputes the q values and CIs, confirming that the ordering is preserved. Full substitution of every alternative two-parameter model for the trend alone was not performed, as the primary model comparison already showed the q-exponential outperforming the exponential and becoming competitive with other heavy-tailed forms; however, the added table will address truncation sensitivity directly.
Revision: yes

Circularity Check

0 steps flagged

No circularity: the q regimes are a direct observational classification of the MLE fits.

The paper performs MLE fits of a truncated discrete q-exponential to each proteome's length CCDF and then classifies the resulting q values (with bootstrap CIs) into superextensive, boundary, and subextensive regimes. No equation in the provided text derives q from any other quantity, renames a fitted parameter as a prediction, or reduces the tripartite structure to a self-citation or ansatz; the reported monotonic trend among deuterostomes is likewise a direct reporting of the fitted sequence. The derivation chain is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that protein length CCDFs are well-described by truncated discrete q-exponentials and that the 22 sampled proteomes represent the evolutionary transition without selection bias.

free parameters (1)

q (Tsallis entropic index)
Fitted by maximum likelihood to each proteome's CCDF; the reported regime distinctions and monotonic trend are defined by the values of this fitted parameter.

axioms (1)

domain assumption: The complementary cumulative distribution function of protein lengths follows a truncated discrete q-exponential distribution.
Invoked as the model form for all MLE fits; the statistical separation of regimes depends on this functional choice.

pith-pipeline@v0.9.1-grok · 5840 in / 1430 out tokens · 80887 ms · 2026-06-29T02:31:50.613592+00:00 · methodology

