Asymptotics of Protein Number Distribution in Stochastic Gene Expression Models under Burst Approximation

Yuntao Lu; Yunxin Zhang

arxiv: 2511.14913 · v2 · submitted 2025-11-18 · ⚛️ physics.bio-ph · math.PR

Asymptotics of Protein Number Distribution in Stochastic Gene Expression Models under Burst Approximation

Yuntao Lu , Yunxin Zhang This is my paper

Pith reviewed 2026-05-17 20:20 UTC · model grok-4.3

classification ⚛️ physics.bio-ph math.PR

keywords stochastic gene expressionburst approximationprotein number distributionchemical master equationnegative binomial distributiongene statesasymptoticsfunctional analysis

0 comments

The pith

Surrogate models with multiple gene states admit exact time-dependent solutions for protein number distributions under burst approximation

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives an analytical time-dependent solution to the chemical master equation for surrogate models of gene expression that incorporate the burst approximation, multiple gene states, and arbitrary burst size distributions. Using functional analysis on this solution, it establishes several properties of the resulting protein number distribution. For geometrically distributed burst sizes, the distribution is dominated by a scaled negative binomial form and becomes light-tailed under certain parameter choices. Efficient algorithms are constructed to compute the distribution in multiple computational settings. The error incurred by the burst approximation relative to full models is bounded using low-order moments of the distribution.

Core claim

We propose surrogate models with multiple gene states and arbitrary burst size distributions under the burst approximation. An analytical time-dependent solution to the chemical master equation is derived and exploited to establish fine properties of the protein number distribution via functional analysis. For geometrically distributed burst sizes the distribution is dominated by a scaled negative binomial distribution and is light-tailed in certain parameter regimes. Efficient algorithms enable fast calculation of the distribution, and the approximation error relative to full gene expression models is estimated in terms of low-order moments.

What carries the argument

The analytical time-dependent solution to the chemical master equation for the multi-state surrogate models with arbitrary burst sizes, which supports both functional analysis of distribution tails and moment-based error bounds.

If this is right

The protein number distribution can be computed efficiently in three distinct algorithmic settings.
For geometric burst sizes the distribution is dominated by a scaled negative binomial form.
The distribution is light-tailed in identifiable parameter regimes.
Approximation error to full models is bounded explicitly by low-order moment discrepancies.
Several fine properties of the distribution follow from functional analysis of the closed-form solution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The explicit solution could be used to derive closed-form expressions for higher moments or steady-state statistics without simulation.
The light-tailed regimes suggest that protein noise in certain gene circuits may be less extreme than heavy-tailed models predict, affecting downstream regulatory predictions.
The moment-based error estimate provides a practical diagnostic for when the burst approximation remains valid in experimental data fitting.
Similar surrogate constructions might apply to other bursty processes such as mRNA translation or signaling cascades.

Load-bearing premise

The burst approximation combined with the chosen surrogate models of multiple gene states and arbitrary burst size distributions sufficiently captures the essential dynamics of the original gene expression process.

What would settle it

A side-by-side comparison of the analytical protein number distribution against direct stochastic simulations of the full gene expression model, for parameter values where the low-order moment error is predicted to be small, would confirm or refute the claimed accuracy of the surrogate.

Figures

Figures reproduced from arXiv: 2511.14913 by Yuntao Lu, Yunxin Zhang.

**Figure 1.** Figure 1: An illustration of the notation P l1+···+lk=m l1,··· ,lk≥1 • in the case of m = 3. a computational perspective, a fast solver can be developed by looking into the recurrence relation among subsequent binomial moments. Note that, in general, direct computation based on (6) quickly becomes extremely expensive as m grows, because of the combinatorial enumeration needed to determine the integer partition of m… view at source ↗

**Figure 2.** Figure 2: Upper Bound for Binomial Moments and Probability Mass Function of Protein Copy Number  : In this illustration, parameters in (1) are set as follows: D0 =  −2.02 0.01 0.01 0.1 −7.2 0.1 0 0.01 −6.01  , D1 =   1 0 1 1 5 1 0 1 5  , δ = 1, and {Dr}r≥1 follows the geometric distribution with λ = 0.1. In the left panel, first 41 binomial moments (including B0 = 1) and the corresponding upper bound given i… view at source ↗

**Figure 3.** Figure 3: Probability Distribution of Protein Copy Number verified by Stochastic Simulation: In this example, D0, D1, and δ = 1 are the same as those in [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4 [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

read the original abstract

The burst approximation is a widely used technique to simplify stochastic gene expression models. However, the dynamics and analytical properties of the protein number distribution in gene expression models under the burst approximation are barely studied. In this study, we propose and systematically analyze surrogate models with multiple gene states and arbitrary burst size distributions. An analytical time-dependent solution to the chemical master equation is derived and then exploited in two directions. Theoretically, several fine properties of the protein number distribution are established using functional analysis. For geometrically distributed burst sizes, the distribution is dominated by a scaled negative binomial distribution, and is light-tailed in certain parameter regimes. Computationally, we develop efficient algorithms in three settings, enabling fast calculation of the protein number distribution. Furthermore, the approximation error relative to full gene expression models is estimated in terms of low-order moments of the distribution, thereby clarifying the validity of the burst approximation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Derives explicit time-dependent solutions for protein distributions in multi-state burst models, with functional-analytic properties and moment-based error bounds to the full process.

read the letter

This paper derives analytical time-dependent solutions to the chemical master equation for surrogate multi-state gene expression models under the burst approximation, allowing arbitrary burst size distributions. They then use those solutions to establish distribution properties and build fast algorithms, plus moment-based error estimates against the original models. The main advance is closing the equations to get generating-function or spectral forms that evolve in time without simulation, which is new relative to the usual steady-state focus in this literature. For geometric bursts they show domination by a scaled negative binomial and light-tailed behavior in certain regimes via functional analysis; the error bounds follow directly from comparing low-order moments of the full and surrogate processes without extra uniformity assumptions. That part is clean and practical. The algorithms in three settings are a useful byproduct for computation. The softer spots are modest: everything still sits inside the burst approximation, so the error estimates clarify validity but do not replace checking against more detailed transcription models or data. No benchmarks against standard numerical methods appear, so the speed claims remain qualitative for now. The derivations look internally consistent from the stress-test details. This is for readers already working on stochastic models of gene expression who need analytical handles or faster distribution calculations rather than pure Monte Carlo. A mathematical biologist or computational modeler focused on bursting kinetics would get direct value from the explicit solutions and tail results. It deserves a serious referee because the math is self-contained, the error analysis is reproducible from the moment equations, and the claims are checkable. I would send it out for review; the central constructions hold up and fill a narrow but real gap in analytical tools for these models.

Referee Report

0 major / 4 minor

Summary. The paper introduces surrogate models for stochastic gene expression that incorporate the burst approximation, allowing multiple gene states and arbitrary burst size distributions. It derives an exact time-dependent solution to the associated chemical master equation, then applies functional-analytic techniques to establish properties of the protein-number distribution (domination by a scaled negative binomial for geometric bursts and light-tailed behavior in certain regimes). Efficient algorithms are constructed for three computational settings, and the approximation error relative to the full model is bounded using low-order moments.

Significance. If the derivations hold, the work supplies a mathematically rigorous justification for the burst approximation that is currently used heuristically in systems biology. The combination of closed-form generating-function solutions, functional-analysis tail bounds, and moment-based error estimates offers both theoretical insight into distribution asymptotics and practical tools for fast evaluation, strengthening the foundation for stochastic modeling of gene expression.

minor comments (4)

[§3] §3 (or the section presenting the surrogate models): the precise definition of the multi-state Markov chain and the transition rates under the burst approximation should be stated explicitly before the generating-function derivation, to allow readers to verify the closure property without ambiguity.
[Theoretical properties section] The statement that the distribution is 'light-tailed in certain parameter regimes' would benefit from an explicit characterization of those regimes (e.g., a condition on the burst rate or gene-state switching rates) rather than leaving the boundary implicit.
[Computational algorithms section] The three algorithmic settings are mentioned but not enumerated; a short table or numbered list clarifying the input parameters, output quantities, and complexity for each setting would improve reproducibility.
[Figures] Figure captions should indicate whether the plotted distributions are exact solutions of the surrogate model or numerical approximations, and should reference the corresponding theorem or algorithm.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our manuscript, as well as for recommending minor revision. The referee correctly identifies the core contributions: surrogate models under the burst approximation with multiple gene states and arbitrary burst distributions, the exact time-dependent solution to the chemical master equation, functional-analytic properties of the protein-number distribution (including domination by a scaled negative binomial for geometric bursts and light-tailed regimes), efficient algorithms for three computational settings, and moment-based bounds on the approximation error relative to the full model. We are pleased that these elements are recognized as providing both theoretical insight and practical tools. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained from CME under burst approximation

full rationale

The paper derives an analytical time-dependent solution to the chemical master equation for surrogate models with multiple gene states and arbitrary burst size distributions. This is then used for functional analysis of distribution properties (e.g., domination by scaled negative binomial for geometric bursts) and error estimation via low-order moments. All steps rest on explicit generating-function or spectral representations that close under the stated approximations, with no reduction to fitted parameters, self-definitions, or load-bearing self-citations. The central claims are internally consistent and do not rely on unverified external uniqueness theorems or ansatzes smuggled via citation. The derivation starts from the chemical master equation and remains self-contained against external benchmarks without circular reductions by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, specific free parameters, axioms, or invented entities are not detailed; the central claims rest on the validity of the burst approximation and the surrogate model construction, which are standard in the field but not independently verified here.

pith-pipeline@v0.9.0 · 5445 in / 1063 out tokens · 27118 ms · 2026-05-17T20:20:32.837505+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We conclude that the steady-state distribution of protein copy number is bounded from above by a constant multiple of some negative binomial distribution if the burst size is geometrically distributed.
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat recovery unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Analytical solution to the chemical master equation is provided... recurrence relation among binomial moments

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

2 extracted references · 2 canonical work pages

[1]

,Stochastic analysis of complex reaction networks using binomial moment equations, Phys. Rev. E, 86 (2012), p. 031126. [3]L. Cai, N. Friedman, and X. S. Xie,Stochastic protein expression in individual cells at the single molecule level, Nature, 440 (2006), pp. 358–362. [4]M. Chen, S. Luo, M. Cao, C. Guo, T. Zhou, and J. Zhang,Exact distributions for stoch...

work page arXiv 2012
[2]

,Stochastic kinetics of mRNA molecules in a general transcription model, Biophys. J. (In press), (2025). [15]S. Luo, Z. Zhang, Z. Wang, X. Yang, X. Chen, T. Zhou, and J. Zhang,Inferring transcrip- tional bursting kinetics from single-cell snapshot data using a generalized telegraph model, R. Soc. Open Sci., 10 (2023), p. 221057. [16]H. Masuyama and T. Tak...

work page 2025

[1] [1]

,Stochastic analysis of complex reaction networks using binomial moment equations, Phys. Rev. E, 86 (2012), p. 031126. [3]L. Cai, N. Friedman, and X. S. Xie,Stochastic protein expression in individual cells at the single molecule level, Nature, 440 (2006), pp. 358–362. [4]M. Chen, S. Luo, M. Cao, C. Guo, T. Zhou, and J. Zhang,Exact distributions for stoch...

work page arXiv 2012

[2] [2]

,Stochastic kinetics of mRNA molecules in a general transcription model, Biophys. J. (In press), (2025). [15]S. Luo, Z. Zhang, Z. Wang, X. Yang, X. Chen, T. Zhou, and J. Zhang,Inferring transcrip- tional bursting kinetics from single-cell snapshot data using a generalized telegraph model, R. Soc. Open Sci., 10 (2023), p. 221057. [16]H. Masuyama and T. Tak...

work page 2025