pith. sign in

arxiv: 2511.14913 · v2 · submitted 2025-11-18 · ⚛️ physics.bio-ph · math.PR

Asymptotics of Protein Number Distribution in Stochastic Gene Expression Models under Burst Approximation

Pith reviewed 2026-05-17 20:20 UTC · model grok-4.3

classification ⚛️ physics.bio-ph math.PR
keywords stochastic gene expressionburst approximationprotein number distributionchemical master equationnegative binomial distributiongene statesasymptoticsfunctional analysis
0
0 comments X

The pith

Surrogate models with multiple gene states admit exact time-dependent solutions for protein number distributions under burst approximation

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives an analytical time-dependent solution to the chemical master equation for surrogate models of gene expression that incorporate the burst approximation, multiple gene states, and arbitrary burst size distributions. Using functional analysis on this solution, it establishes several properties of the resulting protein number distribution. For geometrically distributed burst sizes, the distribution is dominated by a scaled negative binomial form and becomes light-tailed under certain parameter choices. Efficient algorithms are constructed to compute the distribution in multiple computational settings. The error incurred by the burst approximation relative to full models is bounded using low-order moments of the distribution.

Core claim

We propose surrogate models with multiple gene states and arbitrary burst size distributions under the burst approximation. An analytical time-dependent solution to the chemical master equation is derived and exploited to establish fine properties of the protein number distribution via functional analysis. For geometrically distributed burst sizes the distribution is dominated by a scaled negative binomial distribution and is light-tailed in certain parameter regimes. Efficient algorithms enable fast calculation of the distribution, and the approximation error relative to full gene expression models is estimated in terms of low-order moments.

What carries the argument

The analytical time-dependent solution to the chemical master equation for the multi-state surrogate models with arbitrary burst sizes, which supports both functional analysis of distribution tails and moment-based error bounds.

If this is right

  • The protein number distribution can be computed efficiently in three distinct algorithmic settings.
  • For geometric burst sizes the distribution is dominated by a scaled negative binomial form.
  • The distribution is light-tailed in identifiable parameter regimes.
  • Approximation error to full models is bounded explicitly by low-order moment discrepancies.
  • Several fine properties of the distribution follow from functional analysis of the closed-form solution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The explicit solution could be used to derive closed-form expressions for higher moments or steady-state statistics without simulation.
  • The light-tailed regimes suggest that protein noise in certain gene circuits may be less extreme than heavy-tailed models predict, affecting downstream regulatory predictions.
  • The moment-based error estimate provides a practical diagnostic for when the burst approximation remains valid in experimental data fitting.
  • Similar surrogate constructions might apply to other bursty processes such as mRNA translation or signaling cascades.

Load-bearing premise

The burst approximation combined with the chosen surrogate models of multiple gene states and arbitrary burst size distributions sufficiently captures the essential dynamics of the original gene expression process.

What would settle it

A side-by-side comparison of the analytical protein number distribution against direct stochastic simulations of the full gene expression model, for parameter values where the low-order moment error is predicted to be small, would confirm or refute the claimed accuracy of the surrogate.

Figures

Figures reproduced from arXiv: 2511.14913 by Yuntao Lu, Yunxin Zhang.

Figure 1
Figure 1. Figure 1: An illustration of the notation P l1+···+lk=m l1,··· ,lk≥1 • in the case of m = 3. a computational perspective, a fast solver can be developed by looking into the recurrence rela￾tion among subsequent binomial moments. Note that, in general, direct computation based on (6) quickly becomes extremely expensive as m grows, because of the combinatorial enumeration needed to determine the integer partition of m… view at source ↗
Figure 2
Figure 2. Figure 2: Upper Bound for Binomial Moments and Probability Mass Function of Protein Copy Number  : In this illustration, parameters in (1) are set as follows: D0 =  −2.02 0.01 0.01 0.1 −7.2 0.1 0 0.01 −6.01  , D1 =   1 0 1 1 5 1 0 1 5  , δ = 1, and {Dr}r≥1 follows the geometric distribution with λ = 0.1. In the left panel, first 41 binomial moments (including B0 = 1) and the corresponding upper bound given i… view at source ↗
Figure 3
Figure 3. Figure 3: Probability Distribution of Protein Copy Number verified by Stochastic Simulation: In this example, D0, D1, and δ = 1 are the same as those in [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
Figure 4
Figure 4. Figure 4 [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
read the original abstract

The burst approximation is a widely used technique to simplify stochastic gene expression models. However, the dynamics and analytical properties of the protein number distribution in gene expression models under the burst approximation are barely studied. In this study, we propose and systematically analyze surrogate models with multiple gene states and arbitrary burst size distributions. An analytical time-dependent solution to the chemical master equation is derived and then exploited in two directions. Theoretically, several fine properties of the protein number distribution are established using functional analysis. For geometrically distributed burst sizes, the distribution is dominated by a scaled negative binomial distribution, and is light-tailed in certain parameter regimes. Computationally, we develop efficient algorithms in three settings, enabling fast calculation of the protein number distribution. Furthermore, the approximation error relative to full gene expression models is estimated in terms of low-order moments of the distribution, thereby clarifying the validity of the burst approximation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 4 minor

Summary. The paper introduces surrogate models for stochastic gene expression that incorporate the burst approximation, allowing multiple gene states and arbitrary burst size distributions. It derives an exact time-dependent solution to the associated chemical master equation, then applies functional-analytic techniques to establish properties of the protein-number distribution (domination by a scaled negative binomial for geometric bursts and light-tailed behavior in certain regimes). Efficient algorithms are constructed for three computational settings, and the approximation error relative to the full model is bounded using low-order moments.

Significance. If the derivations hold, the work supplies a mathematically rigorous justification for the burst approximation that is currently used heuristically in systems biology. The combination of closed-form generating-function solutions, functional-analysis tail bounds, and moment-based error estimates offers both theoretical insight into distribution asymptotics and practical tools for fast evaluation, strengthening the foundation for stochastic modeling of gene expression.

minor comments (4)
  1. [§3] §3 (or the section presenting the surrogate models): the precise definition of the multi-state Markov chain and the transition rates under the burst approximation should be stated explicitly before the generating-function derivation, to allow readers to verify the closure property without ambiguity.
  2. [Theoretical properties section] The statement that the distribution is 'light-tailed in certain parameter regimes' would benefit from an explicit characterization of those regimes (e.g., a condition on the burst rate or gene-state switching rates) rather than leaving the boundary implicit.
  3. [Computational algorithms section] The three algorithmic settings are mentioned but not enumerated; a short table or numbered list clarifying the input parameters, output quantities, and complexity for each setting would improve reproducibility.
  4. [Figures] Figure captions should indicate whether the plotted distributions are exact solutions of the surrogate model or numerical approximations, and should reference the corresponding theorem or algorithm.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our manuscript, as well as for recommending minor revision. The referee correctly identifies the core contributions: surrogate models under the burst approximation with multiple gene states and arbitrary burst distributions, the exact time-dependent solution to the chemical master equation, functional-analytic properties of the protein-number distribution (including domination by a scaled negative binomial for geometric bursts and light-tailed regimes), efficient algorithms for three computational settings, and moment-based bounds on the approximation error relative to the full model. We are pleased that these elements are recognized as providing both theoretical insight and practical tools. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained from CME under burst approximation

full rationale

The paper derives an analytical time-dependent solution to the chemical master equation for surrogate models with multiple gene states and arbitrary burst size distributions. This is then used for functional analysis of distribution properties (e.g., domination by scaled negative binomial for geometric bursts) and error estimation via low-order moments. All steps rest on explicit generating-function or spectral representations that close under the stated approximations, with no reduction to fitted parameters, self-definitions, or load-bearing self-citations. The central claims are internally consistent and do not rely on unverified external uniqueness theorems or ansatzes smuggled via citation. The derivation starts from the chemical master equation and remains self-contained against external benchmarks without circular reductions by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, specific free parameters, axioms, or invented entities are not detailed; the central claims rest on the validity of the burst approximation and the surrogate model construction, which are standard in the field but not independently verified here.

pith-pipeline@v0.9.0 · 5445 in / 1063 out tokens · 27118 ms · 2026-05-17T20:20:32.837505+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

2 extracted references · 2 canonical work pages

  1. [1]

    ,Stochastic analysis of complex reaction networks using binomial moment equations, Phys. Rev. E, 86 (2012), p. 031126. [3]L. Cai, N. Friedman, and X. S. Xie,Stochastic protein expression in individual cells at the single molecule level, Nature, 440 (2006), pp. 358–362. [4]M. Chen, S. Luo, M. Cao, C. Guo, T. Zhou, and J. Zhang,Exact distributions for stoch...

  2. [2]

    ,Stochastic kinetics of mRNA molecules in a general transcription model, Biophys. J. (In press), (2025). [15]S. Luo, Z. Zhang, Z. Wang, X. Yang, X. Chen, T. Zhou, and J. Zhang,Inferring transcrip- tional bursting kinetics from single-cell snapshot data using a generalized telegraph model, R. Soc. Open Sci., 10 (2023), p. 221057. [16]H. Masuyama and T. Tak...