arxiv: 2604.15599 · v1 · submitted 2026-04-17 · 🧮 math.CO

Making ends meet or just meeting at the ends? Assessing end-to-end distance in folded RNA sequences and other branched structures

Torin Greenwood , Christine Heitsch This is my paper

Pith reviewed 2026-05-10 08:36 UTC · model grok-4.3

classification 🧮 math.CO

keywords RNA foldingbranched structuresend-to-end distancecombinatorial modelsanalytic combinatoricsmultivariate generating functionssequence analysis

0 comments

The pith

The ends of branched structures like folded RNA are almost certainly close together.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that the observed closeness between the ends of folded RNA sequences follows directly from their branched topology rather than requiring special biological features. The authors develop combinatorial models of branching with increasing detail and apply multivariate analytic combinatorics to prove that end-to-end distances remain small with high probability. They derive exact expressions for the mean and variance of these distances and test the resulting distributions against both real RNA datasets and randomized shuffles of the same sequences. Readers should care because the result reframes an empirical puzzle in molecular biology as a generic consequence of branching, shifting attention from why ends meet to how real structures deviate from the generic case.

Core claim

Using combinatorial branching models of increasing complexity and multivariate analytic combinatorics, we prove that the ends of branched structures are almost certainly close. We completely characterize parameters tracking end-to-end distance, including means and variances. Comparisons to existing datasets of known RNA structures and minimum free-energy structures of randomized shuffles show that shuffled structures resemble the theoretical distributions while known RNA structures have similar parameter values but are more concentrated.

What carries the argument

Combinatorial branching models of increasing complexity, analyzed with multivariate analytic combinatorics, that generate and track the distribution of end-to-end distances.

If this is right

End-to-end distance remains bounded on average even as the number of branches or sequence length increases.
The variance of end-to-end distance is finite and can be computed explicitly from the model parameters.
Randomized shuffles of RNA sequences produce minimum free-energy structures whose distance statistics closely match the theoretical predictions.
Known biological RNA structures show means and variances similar to the models but with tighter concentration around small distances.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Branching geometry by itself may suffice to explain end proximity in many other folded or tree-like molecules without invoking additional selection pressures.
The same analytic approach could be applied to non-RNA branched polymers or to different branching rules to test how sensitive the closeness result is to model details.
Discrepancies between real RNA data and the models could be used to identify which structural features beyond simple branching are biologically significant.

Load-bearing premise

The combinatorial branching models of increasing complexity accurately represent the geometry and connectivity of real folded RNA sequences.

What would settle it

Measuring end-to-end distances in large ensembles of real or simulated branched RNA structures that grow linearly with sequence length, as they do for a random linear chain, would refute the claim.

Figures

Figures reproduced from arXiv: 2604.15599 by Christine Heitsch, Torin Greenwood.

**Figure 1.** Figure 1: An rna secondary structure, as in the radial diagram (center), consists of runs of stacked base pairs, called helices, separated by singled-stranded regions called loops. Since no pairings cross, the arrangement of helices can be abstracted to a plane tree, which is equivalent to a Dyck path under a pre-order walk. A secondary structure is written as a dot-bracket sequence (equivalently a Motzkin path) by … view at source ↗

**Figure 2.** Figure 2: The length of the first helix in an rna secondary structure is equal to the depth of the corresponding Motzkin path or Dyck path, and the length of the first stem corresponds to the lowest valley or plateau excluding the starting and ending sides of the mountain. 4.1 Closing helix length in Dyck and Motzkin paths In terms of Dyck and Motzkin paths, HEL corresponds to the height of its lowest valley or plat… view at source ↗

**Figure 3.** Figure 3: A decomposition of Dyck paths according to the first return to the line [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗

**Figure 4.** Figure 4: The distribution of DEG versus UNP across the ArchiveII dataset. The intensity of each color represents the percent of structures with the corresponding DEG and UNP values. Colors represent approximate ETE distances, color-coded as red: 1.5 - 2.5, orange: 2.5-3.5, yellow; 3.5-4.5, green: 4.5-5.5, blue: 5.5-6.5, purple: 6.5-7.5. 16 [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗

**Figure 5.** Figure 5: The distribution of LEN among sequences in each database. On the left, the original sequences. On the right, shuffled sequences. For the ArchiveII data and the ribosomal rna, each sequence was shuffled 5 times and the MFE structure was calculated, while for the mrna and lncrna, each sequence was shuffled 1000 times. Graphs for the other parameters are in the appendix, Section B. 18 [PITH_FULL_IMAGE:figure… view at source ↗

read the original abstract

Researchers have repeatedly found that the ends of an RNA sequence are significantly closer than expected for a random linear chain. However, we prove that the ends of a branched structure are almost certainly close. Our results are obtained via combinatorial branching models of increasing complexity using tools from multivariate analytic combinatorics. We completely characterize parameters tracking end-to-end distance, including means and variances. Then, we compare to existing datasets of known RNA structures, as well as the minimum free-energy structures of randomized shuffles. We find that the shuffled structures resemble our theoretical distributions while the known RNA structures have similar parameter values but are more concentrated.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Branched combinatorial models for RNA put the ends close with high probability, and the paper gives exact characterizations plus data checks to explain the observation.

read the letter

The main thing to know is that this work proves the ends of branched structures are close almost surely under their models, which directly accounts for the repeated finding in real RNA without needing extra assumptions about folding. They do this by extending analytic combinatorics tools to branching models of rising complexity, fully pinning down the mean and variance of end-to-end distance parameters. That step is new relative to earlier linear-chain treatments. The comparisons to known RNA structures and to minimum free-energy folds of shuffled sequences then show the shuffles track the theoretical distributions while the real ones sit tighter, which is a useful empirical anchor. The combinatorial derivations look clean and first-principles, with no evident circularity in the setup. The main soft spot is that the data side stays high-level in the abstract, so the exact degree of match and any statistical measures are not visible yet; if the full paper has only summary plots, that section would benefit from more concrete numbers or tests on revision. The models themselves are abstractions, but the shuffle baseline mitigates the risk that they are just fitting noise. This is worth a reading group for anyone working on enumeration of branched polymers or on RNA structural statistics. A serious editor should send it to referees because the central claim is sharp, the math is grounded, and the result supplies a baseline that biologists and designers can actually use.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper derives means, variances, and concentration results for end-to-end distance parameters directly from combinatorial branching models of increasing complexity via multivariate analytic combinatorics. These are first-principles generating-function characterizations, not obtained by fitting to the target RNA data. Empirical comparisons to known structures and independent shuffled MFE structures are presented only after the theoretical results and serve as external validation rather than inputs. No load-bearing step reduces to self-definition, a fitted parameter renamed as a prediction, or a self-citation chain; the central claim remains independent of the datasets.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard tools of multivariate analytic combinatorics and combinatorial enumeration of branched structures; no free parameters or new entities are introduced in the abstract description.

axioms (1)

standard math Generating functions and singularity analysis can be applied to count end-to-end distances in branching models
Invoked to derive means and variances for parameters tracking end-to-end distance

pith-pipeline@v0.9.0 · 5399 in / 1053 out tokens · 28351 ms · 2026-05-10T08:36:50.016997+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 24 canonical work pages

[1]

A two-length-scale polymer theory for RNA loop free energies and helix stacking

Daniel P. Aalberts and Nagarajan Nandagopal. “A two-length-scale polymer theory for RNA loop free energies and helix stacking”. In:RNA16.7 (May 2010), pp. 1350–1355.issn: 1469-9001.doi: 10.1261/rna.1831710.url:http://dx.doi.org/10.1261/rna.1831710

work page doi:10.1261/rna.1831710.url:http://dx.doi.org/10.1261/rna.1831710 2010
[2]

Block decomposition of permutations and Schur- positivity

Ron M. Adin, Eli Bagno, and Yuval Roichman. “Block decomposition of permutations and Schur- positivity”. In:Journal of Algebraic Combinatorics47.4 (June 2018), pp. 603–622.issn: 1572-9192.doi: 10.1007/s10801-017-0788-9.url:https://link.springer.com/content/pdf/10.1007/s10801- 017-0788-9.pdf

work page doi:10.1007/s10801-017-0788-9.url:https://link.springer.com/content/pdf/10.1007/s10801- 2018
[3]

Local time for lattice paths and the associated limit laws

Cyril Banderier and Michael Wallner. “Local time for lattice paths and the associated limit laws”. In: GASCOM 2018, Athens, Greece. June 2018, pp. 69–78

2018
[4]

The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs

Jamie J Cannone et al. “The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs”. In:BMC Bioinformatics 3.1 (Jan. 2002).issn: 1471-2105.doi:10.1186/1471-2105-3-2.url:http://dx.doi.org/10.1186/ 1471-2105-3-2

work page doi:10.1186/1471-2105-3-2.url:http://dx.doi.org/10.1186/ 2002
[5]

The Comparative RNA Web (CRW) Site – version 2: a database of covariation-based secondary structures

C. X. Chan et al. “The Comparative RNA Web (CRW) Site – version 2: a database of covariation-based secondary structures”. Manuscript in preparation. 2023

2023
[6]

Expected distance between terminal nucleotides of RNA secondary structures

Peter Clote, Yann Ponty, and Jean-Marc Steyaert. “Expected distance between terminal nucleotides of RNA secondary structures”. In:Journal of Mathematical Biology65.3 (2011), pp. 581–599.issn: 1432- 1416.doi:10.1007/s00285-011-0467-8.url:http://dx.doi.org/10.1007/s00285-011-0467-8

work page doi:10.1007/s00285-011-0467-8.url:http://dx.doi.org/10.1007/s00285-011-0467-8 2011
[7]

Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction

Robin D Dowell and Sean R Eddy. “Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction”. In:BMC Bioinformatics5.1 (2004).issn: 1471-2105.doi: 10.1186/1471-2105-5-71.url:http://dx.doi.org/10.1186/1471-2105-5-71

work page doi:10.1186/1471-2105-5-71.url:http://dx.doi.org/10.1186/1471-2105-5-71 2004
[8]

Restricted Motzkin permutations, Motzkin paths, continued frac- tions, and Chebyshev polynomials

Sergi Elizalde and Toufik Mansour. “Restricted Motzkin permutations, Motzkin paths, continued frac- tions, and Chebyshev polynomials”. In:Discrete Mathematics305.1 (2005), pp. 170–189.issn: 0012- 365X.doi:https://doi.org/10.1016/j.disc.2005.10.010.url:https://www.sciencedirect. com/science/article/pii/S0012365X05005224

work page doi:10.1016/j.disc.2005.10.010.url:https://www.sciencedirect 2005
[9]

Making ends meet: New functions of mRNA secondary structure

Dmitri N. Ermolenko and David H. Mathews. “Making ends meet: New functions of mRNA secondary structure”. In:WIREs RNA12.2 (2020).issn: 1757-7012.doi:10 . 1002 / wrna . 1611.url:http : //dx.doi.org/10.1002/wrna.1611

work page doi:10.1002/wrna.1611 2020
[10]

The end-to-end distance of RNA as a randomly self-paired polymer

Li Tai Fang. “The end-to-end distance of RNA as a randomly self-paired polymer”. In:Journal of The- oretical Biology280.1 (2011), pp. 101–107.issn: 0022-5193.doi:https://doi.org/10.1016/j.jtbi. 2011.04.010.url:https://www.sciencedirect.com/science/article/pii/S0022519311002001

work page doi:10.1016/j.jtbi 2011
[11]

An analysis of large rRNA sequences folded by a thermodynamic method

Dana S. Fields and Robin R. Gutell. “An analysis of large rRNA sequences folded by a thermodynamic method”. In:Folding and Design1.6 (1996), pp. 419–430.issn: 1359-0278.doi:https://doi.org/ 10.1016/S1359-0278(96)00058-2.url:https://www.sciencedirect.com/science/article/pii/ S1359027896000582

work page doi:10.1016/s1359-0278(96)00058-2.url:https://www.sciencedirect.com/science/article/pii/ 1996
[12]

Cambridge: Cambridge University Press, 2009.isbn: 0521898064

Phillipe Flajolet and Robert Sedgewick.Analytic combinatorics. Cambridge: Cambridge University Press, 2009.isbn: 0521898064

2009
[13]

Force-Induced Denaturation of RNA

Ulrich Gerland, Ralf Bundschuh, and Terence Hwa. “Force-Induced Denaturation of RNA”. In:Bio- physical Journal81.3 (2001), pp. 1324–1332

2001
[14]

Rfam: an RNA family database

S. Griffiths-Jones. “Rfam: an RNA family database”. In:Nucleic Acids Reseaarch31.1 (Jan. 2003), pp. 439–441.issn: 1362-4962.doi:10.1093/nar/gkg006.url:http://dx.doi.org/10.1093/nar/ gkg006

work page doi:10.1093/nar/gkg006.url:http://dx.doi.org/10.1093/nar/ 2003
[15]

The 5’-3’ Distance of RNA Secondary Structures

Hillary S.W. Han and Christian M. Reidys. “The 5’-3’ Distance of RNA Secondary Structures”. In: Journal of Computational Biology19.7 (2012). PMID: 22731624, pp. 867–878.doi:10.1089/cmb. 2011.0301. eprint:https://doi.org/10.1089/cmb.2011.0301.url:https://doi.org/10.1089/ cmb.2011.0301. 29

work page doi:10.1089/cmb 2012
[16]

Combinatorics of RNA secondary structures

Ivo L. Hofacker, Peter Schuster, and Peter F. Stadler. “Combinatorics of RNA secondary structures”. In:Discrete Applied Mathematics88.1 (1998). Computational Molecular Biology DAM - CMB Series, pp. 207–237.issn: 0166-218X.doi:https : / / doi . org / 10 . 1016 / S0166 - 218X(98 ) 00073 - 0.url: https://www.sciencedirect.com/science/article/pii/S0166218X98000730

1998
[17]

uShuffle: A useful tool for shuffling biological sequences while preserving the k-let counts

Minghui Jiang et al. “uShuffle: A useful tool for shuffling biological sequences while preserving the k-let counts”. In:BMC Bioinformatics9.1 (Apr. 2008).issn: 1471-2105.doi:10.1186/1471-2105-9-192. url:http://dx.doi.org/10.1186/1471-2105-9-192

work page doi:10.1186/1471-2105-9-192 2008
[18]

Pfold: RNA secondary structure prediction using stochastic context- free grammars

Bjarne Knudsen and Jotun Hein. “Pfold: RNA secondary structure prediction using stochastic context- free grammars”. In:Nucleic Acids Research31.13 (2003), pp. 3423–3428.issn: 0305-1048.doi:10. 1093/nar/gkg614.url:http://dx.doi.org/10.1093/nar/gkg614

work page doi:10.1093/nar/gkg614 2003
[19]

A comparison of thermodynamic foldings with comparatively derived structures of 16S and 16S-like rRNAs

D. A. M. Konings and Robin R. Gutell. “A comparison of thermodynamic foldings with comparatively derived structures of 16S and 16S-like rRNAs”. In:RNA1.6 (1995), pp. 559–574

1995
[20]

J., Holmes, S

C. Krattenthaler. “Permutations with Restricted Patterns and Dyck Paths”. In:Advances in Applied Mathematics27.2 (2001), pp. 510–530.issn: 0196-8858.doi:https://doi.org/10.1006/aama.2001. 0747.url:https://www.sciencedirect.com/science/article/pii/S019688580190747X

work page doi:10.1006/aama.2001 2001
[21]

Singer, Xuguang Ai, Po-Ting Lai, Zhizheng Wang, Vip- ina K

Wan-Jung C. Lai et al. “mRNAs and lncRNAs intrinsically form secondary structures with short end- to-end distances”. In:Nature Communications9.1 (Oct. 2018).issn: 2041-1723.doi:10.1038/s41467- 018-06792-z.url:http://dx.doi.org/10.1038/s41467-018-06792-z

work page doi:10.1038/s41467- 2018
[22]

The separation between the 5’-3’ ends in long RNA molecules is short and nearly constant

Nehem´ ıas Leija-Mart´ ınez et al. “The separation between the 5’-3’ ends in long RNA molecules is short and nearly constant”. In:Nucleic Acids Research42.22 (Nov. 2014), pp. 13963–13968.issn: 0305-1048. doi:10.1093/nar/gku1249. eprint:https://academic.oup.com/nar/article-pdf/42/22/13963/ 17424887/gku1249.pdf.url:https://doi.org/10.1093/nar/gku1249

work page doi:10.1093/nar/gku1249 2014
[23]

ViennaRNA Package 2.0

Ronny Lorenz et al. “ViennaRNA Package 2.0”. In:Algorithms for Molecular Biology6.1 (Nov. 2011). issn: 1748-7188.doi:10.1186/1748-7188-6-26.url:http://dx.doi.org/10.1186/1748-7188-6- 26

work page doi:10.1186/1748-7188-6-26.url:http://dx.doi.org/10.1186/1748-7188-6- 2011
[24]

Efficient calculation of exact probability distri- butions of integer features on RNA secondary structures

Ryota Mori, Michiaki Hamada, and Kiyoshi Asai. “Efficient calculation of exact probability distri- butions of integer features on RNA secondary structures”. In:BMC Genomics15.10 (Dec. 2014), S6. issn: 1471-2164.doi:10.1186/1471-2164-15-S10-S6.url:https://link.springer.com/content/ pdf/10.1186/1471-2164-15-S10-S6.pdf

work page doi:10.1186/1471-2164-15-s10-s6.url:https://link.springer.com/content/ 2014
[25]

Asymptotic distribution of motifs in a stochastic context-free grammar model of RNA folding

Svetlana Poznanovic and Christine E. Heitsch. “Asymptotic distribution of motifs in a stochastic context-free grammar model of RNA folding”. In:Journal of Mathematical Biology69.6-7 (2014). issn: 1432-1416.doi:10.1007/s00285-013-0750-y.url:http://dx.doi.org/10.1007/s00285- 013-0750-y

work page doi:10.1007/s00285-013-0750-y.url:http://dx.doi.org/10.1007/s00285- 2014
[26]

RNAstructure: software for RNA secondary structure prediction and analysis

Jessica S. Reuter and David H. Mathews. “RNAstructure: software for RNA secondary structure prediction and analysis”. In:BMC Bioinformatics11 (2010), p. 129.doi:10.1186/1471-2105-11- 129

work page doi:10.1186/1471-2105-11- 2010
[27]

Exact calculation of loop formation probability identifies folding motifs in RNA secondary structures

Michael F. Sloma and David H. Mathews. “Exact calculation of loop formation probability identifies folding motifs in RNA secondary structures”. In:RNA22.12 (Oct. 2016), pp. 1808–1818.issn: 1469- 9001.doi:10.1261/rna.053694.115.url:http://dx.doi.org/10.1261/rna.053694.115

work page doi:10.1261/rna.053694.115.url:http://dx.doi.org/10.1261/rna.053694.115 2016
[28]

From knotted to nested RNA structures: A variety of computational methods for pseudoknot removal

Sandra Smit et al. “From knotted to nested RNA structures: A variety of computational methods for pseudoknot removal”. In:RNA14.3 (Jan. 2008), pp. 410–416.issn: 1469-9001.doi:10.1261/rna. 881308.url:http://dx.doi.org/10.1261/rna.881308

work page doi:10.1261/rna 2008
[29]

NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure

Douglas H. Turner and David H. Mathews. “NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure”. In:Nucleic Acids Research38.Database issue (Oct. 2009), pp. D280–D282.issn: 0305-1048.doi:10.1093/nar/gkp892.url:https://doi.org/ 10.1093/nar/gkp892

work page doi:10.1093/nar/gkp892.url:https://doi.org/ 2009
[30]

The ends of a large RNA molecule are necessarily close

Aron M. Yoffe et al. “The ends of a large RNA molecule are necessarily close”. In:Nucleic Acids Research39.1 (2010), pp. 292–299.issn: 0305-1048.doi:10.1093/nar/gkq642.url:http://dx.doi. org/10.1093/nar/gkq642. 30

work page doi:10.1093/nar/gkq642.url:http://dx.doi 2010