pith. machine review for the scientific record.

arxiv: 2604.15599 · v1 · submitted 2026-04-17 · 🧮 math.CO

Making ends meet or just meeting at the ends? Assessing end-to-end distance in folded RNA sequences and other branched structures

Pith reviewed 2026-05-10 08:36 UTC · model grok-4.3

classification 🧮 math.CO
keywords RNA folding · branched structures · end-to-end distance · combinatorial models · analytic combinatorics · multivariate generating functions · sequence analysis

The pith

The ends of branched structures like folded RNA are almost certainly close together.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that the observed closeness between the ends of folded RNA sequences follows directly from their branched topology rather than requiring special biological features. The authors develop combinatorial models of branching with increasing detail and apply multivariate analytic combinatorics to prove that end-to-end distances remain small with high probability. They derive exact expressions for the mean and variance of these distances and test the resulting distributions against both real RNA datasets and randomized shuffles of the same sequences. Readers should care because the result reframes an empirical puzzle in molecular biology as a generic consequence of branching, shifting attention from why ends meet to how real structures deviate from the generic case.

Core claim

Using combinatorial branching models of increasing complexity and multivariate analytic combinatorics, we prove that the ends of branched structures are almost certainly close. We completely characterize parameters tracking end-to-end distance, including means and variances. Comparisons to existing datasets of known RNA structures and minimum free-energy structures of randomized shuffles show that shuffled structures resemble the theoretical distributions while known RNA structures have similar parameter values but are more concentrated.

What carries the argument

Combinatorial branching models of increasing complexity, analyzed with multivariate analytic combinatorics, that generate and track the distribution of end-to-end distances.

If this is right

  • End-to-end distance remains bounded on average even as the number of branches or sequence length increases.
  • The variance of end-to-end distance is finite and can be computed explicitly from the model parameters.
  • Randomized shuffles of RNA sequences produce minimum free-energy structures whose distance statistics closely match the theoretical predictions.
  • Known biological RNA structures show means and variances similar to the models but with tighter concentration around small distances.
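The bounded-distance consequence above can be explored numerically in a toy setting. The sketch below is not the paper's model. It assumes a uniform distribution over all balanced dot-bracket strings of a given length (including unrealistic empty hairpins such as `()`), and it assumes that end-to-end distance means the shortest path between the first and last position when backbone bonds and base pairs both count as unit-length edges. The function names are hypothetical.

```python
from collections import deque
from statistics import mean


def structures(n, prefix="", depth=0):
    """Yield every balanced dot-bracket string of length n (toy model)."""
    remaining = n - len(prefix)
    if depth > remaining:  # not enough positions left to close open pairs
        return
    if remaining == 0:
        yield prefix
        return
    yield from structures(n, prefix + ".", depth)
    yield from structures(n, prefix + "(", depth + 1)
    if depth > 0:
        yield from structures(n, prefix + ")", depth - 1)


def end_to_end(structure):
    """Shortest path from first to last position (assumed distance notion).

    Backbone neighbours and base-pair partners are joined by unit edges.
    """
    n = len(structure)
    partner, stack = {}, []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    dist = {0: 0}
    queue = deque([0])
    while queue:  # breadth-first search over the structure graph
        i = queue.popleft()
        if i == n - 1:
            return dist[i]
        neighbours = [i - 1, i + 1]
        if i in partner:
            neighbours.append(partner[i])
        for j in neighbours:
            if 0 <= j < n and j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)


def mean_distance(n):
    """Average end-to-end distance over all toy structures of length n."""
    return mean(end_to_end(s) for s in structures(n))


if __name__ == "__main__":
    for n in (6, 8, 10, 12):
        print(n, round(mean_distance(n), 3), "vs unpaired chain:", n - 1)
```

Comparing the printed means with `n - 1`, the distance of a fully unpaired chain, gives a rough feel for how strongly pairing shortens the path between the ends in this toy model. Exhaustive enumeration only reaches small lengths, which is why the paper relies on generating functions rather than brute force.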

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Branching geometry by itself may suffice to explain end proximity in many other folded or tree-like molecules without invoking additional selection pressures.
  • The same analytic approach could be applied to non-RNA branched polymers or to different branching rules to test how sensitive the closeness result is to model details.
  • Discrepancies between real RNA data and the models could be used to identify which structural features beyond simple branching are biologically significant.

Load-bearing premise

The combinatorial branching models of increasing complexity accurately represent the geometry and connectivity of real folded RNA sequences.

What would settle it

Observing that end-to-end distances in large ensembles of real or simulated branched RNA structures grow linearly with sequence length, as they do for a random linear chain, would refute the claim.

Figures

Figures reproduced from arXiv: 2604.15599 by Christine Heitsch, Torin Greenwood.

Figure 1: An RNA secondary structure, as in the radial diagram (center), consists of runs of stacked base pairs, called helices, separated by single-stranded regions called loops. Since no pairings cross, the arrangement of helices can be abstracted to a plane tree, which is equivalent to a Dyck path under a pre-order walk. A secondary structure is written as a dot-bracket sequence (equivalently a Motzkin path) by …
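The correspondence between dot-bracket sequences and lattice paths mentioned in the Figure 1 caption can be sketched in a few lines. This uses the standard encoding in which `(` is an up step, `)` a down step, and `.` a flat step. The function names are hypothetical, and the Dyck word produced here is at the level of individual base pairs: collapsing stacked pairs into helices, as the caption's plane tree does, would need an extra step that is not shown.

```python
def motzkin_heights(structure):
    """Return the height profile of the Motzkin path for a dot-bracket string."""
    step = {"(": 1, ")": -1, ".": 0}
    heights = [0]
    for ch in structure:
        heights.append(heights[-1] + step[ch])
        if heights[-1] < 0:
            raise ValueError("closing bracket without a matching opening one")
    if heights[-1] != 0:
        raise ValueError("unclosed opening bracket")
    return heights


def dyck_word(structure):
    """Drop unpaired positions, keeping one U/D step per paired position."""
    return "".join("U" if ch == "(" else "D" for ch in structure if ch in "()")


if __name__ == "__main__":
    example = "((..)).(.)"
    print(motzkin_heights(example))
    print(dyck_word(example))
```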
Figure 2: The length of the first helix in an RNA secondary structure is equal to the depth of the corresponding Motzkin path or Dyck path, and the length of the first stem corresponds to the lowest valley or plateau excluding the starting and ending sides of the mountain.
Figure 3: A decomposition of Dyck paths according to the first return to the line
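For readers unfamiliar with the first-return decomposition named in the Figure 3 caption, the textbook version is worked out below. It assumes that $z$ marks each up-down pair of steps, which may differ from the paper's notation, and it shows only the classical counting result, not the paper's multivariate analysis.

```latex
% A Dyck path is either empty, or an up step, a Dyck path, the matching
% down step (the first return to the line), and then another Dyck path.
\begin{align*}
  D(z) &= 1 + z\,D(z)^2, \\
  D(z) &= \frac{1 - \sqrt{1 - 4z}}{2z}, \\
  [z^n]\,D(z) &= \frac{1}{n+1}\binom{2n}{n}
    \;\sim\; \frac{4^n}{\sqrt{\pi}\,n^{3/2}} \qquad (n \to \infty).
\end{align*}
```

The square-root singularity at $z = 1/4$ is what produces the $4^n n^{-3/2}$ growth, and it is the kind of singular behavior that singularity analysis reads asymptotics from.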
Figure 4: The distribution of DEG versus UNP across the ArchiveII dataset. The intensity of each color represents the percentage of structures with the corresponding DEG and UNP values. Colors represent approximate ETE distances: red, 1.5-2.5; orange, 2.5-3.5; yellow, 3.5-4.5; green, 4.5-5.5; blue, 5.5-6.5; purple, 6.5-7.5.
Figure 5: The distribution of LEN among sequences in each database. On the left, the original sequences. On the right, shuffled sequences. For the ArchiveII data and the ribosomal RNA, each sequence was shuffled 5 times and the MFE structure was calculated, while for the mRNA and lncRNA, each sequence was shuffled 1000 times. Graphs for the other parameters are in the appendix, Section B.
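The shuffling step described in the Figure 5 caption might look roughly like the sketch below. This is an assumption-laden illustration: it preserves only single-nucleotide composition, the caption does not say whether the paper's shuffles preserve higher-order statistics, and the minimum free-energy folding of each shuffled copy (which needs an external RNA folding package) is left out. The function names and the example sequence are hypothetical.

```python
import random
from collections import Counter


def shuffle_sequence(seq, rng):
    """Return a random permutation of seq (same nucleotide counts)."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def shuffled_copies(seq, copies, seed=0):
    """Produce several shuffled versions of seq, reproducibly via a seed."""
    rng = random.Random(seed)
    return [shuffle_sequence(seq, rng) for _ in range(copies)]


if __name__ == "__main__":
    original = "GGGAAACGCUUCGGCGUUUCCC"  # made-up example sequence
    for copy in shuffled_copies(original, copies=5):
        # Each copy would next be folded to its MFE structure (not shown).
        print(copy, Counter(copy) == Counter(original))
```

Seeding the generator keeps the shuffled ensemble reproducible, which matters when the shuffled distributions are later compared with theoretical ones.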
Original abstract

Researchers have repeatedly found that the ends of an RNA sequence are significantly closer than expected for a random linear chain. However, we prove that the ends of a branched structure are almost certainly close. Our results are obtained via combinatorial branching models of increasing complexity using tools from multivariate analytic combinatorics. We completely characterize parameters tracking end-to-end distance, including means and variances. Then, we compare to existing datasets of known RNA structures, as well as the minimum free-energy structures of randomized shuffles. We find that the shuffled structures resemble our theoretical distributions while the known RNA structures have similar parameter values but are more concentrated.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper derives means, variances, and concentration results for end-to-end distance parameters directly from combinatorial branching models of increasing complexity via multivariate analytic combinatorics. These are first-principles generating-function characterizations, not obtained by fitting to the target RNA data. Empirical comparisons to known structures and independent shuffled MFE structures are presented only after the theoretical results and serve as external validation rather than inputs. No load-bearing step reduces to self-definition, a fitted parameter renamed as a prediction, or a self-citation chain; the central claim remains independent of the datasets.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard tools of multivariate analytic combinatorics and combinatorial enumeration of branched structures; no free parameters or new entities are introduced in the abstract description.

axioms (1)
  • standard math: Generating functions and singularity analysis can be applied to count end-to-end distances in branching models
    Invoked to derive means and variances for parameters tracking end-to-end distance
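As a toy instance of the kind of generating-function computation this axiom refers to, the sketch below computes the exact mean number of returns to the axis in Dyck paths with n up-down pairs. It is not one of the paper's parameters. It assumes the classical facts that D(z) = 1 + z·D(z)², that marking each return with a variable u gives D(z, u) = 1 / (1 − u·z·D(z)), and hence that the total number of returns over all paths with n pairs is the coefficient of zⁿ in z·D(z)³. All function names are hypothetical.

```python
from fractions import Fraction


def catalan_series(n_max):
    """Coefficients of D(z) up to z^n_max, from D = 1 + z * D^2."""
    d = [1]
    for n in range(1, n_max + 1):
        d.append(sum(d[k] * d[n - 1 - k] for k in range(n)))
    return d


def multiply(a, b):
    """Product of two power series, truncated to the shorter length."""
    size = min(len(a), len(b))
    return [sum(a[k] * b[m - k] for k in range(m + 1)) for m in range(size)]


def mean_returns(n_max):
    """Exact mean number of returns for each n from 1 to n_max."""
    d = catalan_series(n_max)
    d_cubed = multiply(multiply(d, d), d)
    # [z^n] z * D(z)^3 equals [z^(n-1)] D(z)^3: total returns over all paths.
    return {n: Fraction(d_cubed[n - 1], d[n]) for n in range(1, n_max + 1)}


if __name__ == "__main__":
    for n, value in mean_returns(10).items():
        print(n, value, float(value))
```

In this toy case the exact means agree with the closed form 3n / (n + 2), which stays below 3 for every n: a small example of a structural parameter whose average remains bounded as the size grows.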

pith-pipeline@v0.9.0 · 5399 in / 1053 out tokens · 28351 ms · 2026-05-10T08:36:50.016997+00:00 · methodology

