
arXiv: 1907.05138 · v1 · pith:BW2A7XVL · submitted 2019-07-11 · q-bio.PE · math.CO · physics.soc-ph · q-bio.QM

Distribution of outbreak sizes for SIR disease in finite populations

Pith reviewed 2026-05-24 22:43 UTC · model grok-4.3

classification: q-bio.PE · math.CO · physics.soc-ph · q-bio.QM
keywords: SIR epidemic · final size distribution · finite populations · outbreak size · parameter inference · reproductive number · transmission distribution

The pith

An exact expression for the final size distribution of SIR epidemics holds in finite populations for arbitrary transmission distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a formula giving the probability that an SIR epidemic ends with any given number of cases in a closed population of fixed size. The formula applies when each infected individual causes a number of transmissions drawn independently from any fixed distribution. A sympathetic reader cares because the same expression can be used to estimate the parameters of the transmission distribution from final-size data collected across many independent small populations. The derivation also shows that inference faces identifiability problems: combinations of parameters that share the same reproductive number produce similar distributions and therefore require large numbers of observed outbreaks to separate.

Core claim

We derive an expression for the final size distribution of an SIR epidemic in a finite population. Our derivation allows arbitrary distributions of the number of transmissions caused by an infected individual. We show how this calculation can be used to infer parameters of the infectious disease through observations in multiple small populations. The inference suffers from some identifiability difficulties, and it requires many observations to distinguish between parameter combinations that correspond to the same reproductive number.

What carries the argument

The exact probability mass function for epidemic final size, computed by accounting for depletion of susceptibles while allowing arbitrary offspring distributions.
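To make this concrete, here is a minimal sketch of one way to compute such a distribution exactly. It is not the paper's derivation. It assumes, beyond what is stated here, that an infective who makes k transmissions sends them to k distinct individuals chosen uniformly from the other N - 1, that transmissions reaching someone already infected are wasted, and that k never exceeds N - 1 (so a Poisson tail is lumped at N - 1). Because the final size depends only on whom each individual would infect, infectives can be processed one at a time, and the number of new infections each produces is hypergeometric in the remaining susceptibles. The names `final_size_pmf` and `poisson_offspring` are hypothetical.

```python
from math import comb, exp, factorial


def poisson_offspring(R0, N):
    """Poisson(R0) transmission counts truncated at N - 1.

    Assumption of this sketch: with distinct targets nobody can reach more
    than the N - 1 others, so the Poisson tail mass is lumped at k = N - 1.
    """
    p = [exp(-R0) * R0**k / factorial(k) for k in range(N - 1)]
    p.append(max(0.0, 1.0 - sum(p)))
    return p


def final_size_pmf(N, offspring, initial=1):
    """Exact P(final size = m), m = 0..N, in a closed population of size N.

    offspring[k] is the probability that an infective makes k transmissions
    (to k distinct, uniformly chosen others: an assumption of this sketch);
    `initial` is the number of index cases.
    """
    assert len(offspring) <= N, "offspring support must stop at N - 1"
    assert 1 <= initial <= N
    pmf = [0.0] * (N + 1)
    layer = {initial: 1.0}  # after r infectives processed: {active i: probability}
    for r in range(N):
        nxt = {}
        for i, prob in layer.items():
            s = N - r - i  # susceptibles not yet reached
            for k, pk in enumerate(offspring):
                if pk == 0.0:
                    continue
                ways = comb(N - 1, k)  # ways to pick k distinct targets
                for j in range(min(k, s) + 1):
                    # hypergeometric: j of the k targets are still susceptible
                    w = prob * pk * comb(s, j) * comb(N - 1 - s, k - j) / ways
                    if w == 0.0:
                        continue
                    active = i - 1 + j
                    if active == 0:
                        pmf[r + 1] += w  # no infectives left: r + 1 ever infected
                    else:
                        nxt[active] = nxt.get(active, 0.0) + w
        layer = nxt
        if not layer:
            break
    return pmf


# Usage: final-size distribution for a population of 5 with Poisson(1.5) offspring
for m, p in enumerate(final_size_pmf(5, poisson_offspring(1.5, 5))):
    if p > 0.0:
        print(m, round(p, 4))
```

Setting `initial` above 1 covers several simultaneous introductions. The cost grows roughly like N^4, which is manageable for household- or school-sized populations.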

If this is right

  • The final size probabilities can be calculated exactly without simulation for any chosen transmission distribution.
  • Maximum-likelihood estimates of transmission parameters become available from collections of independent small-population outbreaks.
  • Different parameter sets that produce the same reproductive number remain distinguishable only when the number of observed outbreaks is large.
  • The method applies directly to household or school outbreak data where population size is known and small.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same expression could be used to test whether a given offspring distribution is consistent with data before fitting more complex models.
  • In the limit of many small populations the approach may yield tighter bounds on variance of transmission than aggregate data from one large population.
  • The identifiability warning implies that reproductive number alone is an insufficient summary statistic when only final sizes are observed.
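The identifiability point can be probed numerically. The sketch below, again assuming distinct, uniformly chosen transmission targets and offspring distributions truncated at N - 1, compares the final-size distribution under a Poisson offspring distribution with negative binomial ones that share the same mean R0 (the mean-and-dispersion parameterization is our choice, not necessarily the paper's). Total variation distance measures how far apart the two final-size distributions are. The Kullback-Leibler divergence is the expected log-likelihood ratio contributed by each observed outbreak, so something on the order of 1/KL outbreaks is a rough, heuristic guide to how many observations are needed to tell the two apart. Function names are hypothetical.

```python
from math import comb, exp, factorial, lgamma, log, log1p


def truncate(pmf_at, N):
    # pmf for k = 0..N-2, remaining tail mass lumped at k = N-1 (assumption)
    p = [pmf_at(k) for k in range(N - 1)]
    p.append(max(0.0, 1.0 - sum(p)))
    return p


def poisson_offspring(R0, N):
    return truncate(lambda k: exp(-R0) * R0**k / factorial(k), N)


def negbin_offspring(R0, disp, N):
    # negative binomial with mean R0 and variance R0 + R0**2 / disp
    def pmf_at(k):
        return exp(lgamma(k + disp) - lgamma(disp) - lgamma(k + 1)
                   - disp * log1p(R0 / disp) + k * log(R0 / (R0 + disp)))
    return truncate(pmf_at, N)


def final_size_pmf(N, offspring, initial=1):
    # Exact recursion processing infectives one at a time (assumed distinct targets)
    pmf, layer = [0.0] * (N + 1), {initial: 1.0}
    for r in range(N):
        nxt = {}
        for i, prob in layer.items():
            s = N - r - i  # remaining susceptibles
            for k, pk in enumerate(offspring):
                for j in range(min(k, s) + 1):
                    w = prob * pk * comb(s, j) * comb(N - 1 - s, k - j) / comb(N - 1, k)
                    if i - 1 + j == 0:
                        pmf[r + 1] += w
                    elif w > 0.0:
                        nxt[i - 1 + j] = nxt.get(i - 1 + j, 0.0) + w
        layer = nxt
    return pmf


def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p, q):
    # expected log-likelihood ratio per observed outbreak when p is the truth
    total = 0.0
    for a, b in zip(p, q):
        if a > 0.0:
            total += float("inf") if b == 0.0 else a * log(a / b)
    return total


# Usage: same R0 = 2, population of 6, increasingly Poisson-like negative binomials
N, R0 = 6, 2.0
base = final_size_pmf(N, poisson_offspring(R0, N))
for disp in (0.5, 2.0, 10.0):
    alt = final_size_pmf(N, negbin_offspring(R0, disp, N))
    kl = kl_divergence(alt, base)
    print(f"dispersion {disp}: TV = {total_variation(alt, base):.3f}, "
          f"1/KL = {1 / kl:.0f} outbreaks")
```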

Load-bearing premise

The process is a standard SIR epidemic in a closed finite population where each infected individual draws its number of transmissions independently from the same fixed distribution.

What would settle it

If the empirical distribution of final outbreak sizes across many small populations deviates from the probabilities predicted by the derived expression for any choice of transmission parameters, the claimed formula would be falsified.

Figures

Figures reproduced from arXiv: 1907.05138 by Joel C Miller.

Figure 1: Algorithm for generating a directed network from a population and known offspring …
Figure 2: A sample directed network generated by the algorithm of Fig. 1 (c.f., Fig. 6.14 …
Figure 3: An illustration of the observation of Section 3.1: Because the out-component of 1 …
Figure 4: Illustration of the outcome for multiple introductions (in this case through two …
Figure 5: Scaled posterior probabilities for Table 1. The star denotes the true parameter …
Figure 6: Scaled posterior probabilities for ten times as many observations in each population …
Figure 7: Scaled posterior probabilities for the same simulations used in Figure 5 (the …
Figure 8: The inferred values of R0 as calculated assuming a Poisson distribution (solid) or Negative Binomial distribution (dashed) for the simulations of Figure 5. The '*' denotes the actual location of R0. The difference in prediction of the two distributions is not very large.
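Figures 1-3 frame the outbreak in terms of a directed network built from the population and the offspring distribution. A Monte Carlo sketch of that idea follows. It assumes that each individual draws its number of transmissions and points to that many distinct, uniformly chosen others (the exact construction in Fig. 1 may differ); the final size is then the number of individuals reachable from the index case(s). It is mainly useful as a simulation check against an exact calculation, and the function names are hypothetical.

```python
import random


def sample_directed_network(N, offspring, rng):
    """Hypothetical version of the Fig. 1 construction (assumption: each node
    draws k from `offspring` and points to k distinct, uniformly chosen others)."""
    edges = {}
    for v in range(N):
        k = rng.choices(range(len(offspring)), weights=offspring)[0]
        others = [u for u in range(N) if u != v]
        edges[v] = rng.sample(others, k)  # requires k <= N - 1
    return edges


def out_component_size(edges, sources):
    # everyone reachable from the index case(s) is eventually infected
    seen = set(sources)
    stack = list(sources)
    while stack:
        v = stack.pop()
        for u in edges[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen)


def simulate_final_sizes(N, offspring, runs, seed=0, sources=(0,)):
    # empirical frequency of each final size 0..N over independent networks
    rng = random.Random(seed)
    counts = [0] * (N + 1)
    for _ in range(runs):
        net = sample_directed_network(N, offspring, rng)
        counts[out_component_size(net, sources)] += 1
    return [c / runs for c in counts]


# Usage: hypothetical offspring distribution P(0)=0.3, P(1)=0.4, P(2)=0.3, N = 5
freqs = simulate_final_sizes(5, [0.3, 0.4, 0.3], runs=10000, seed=42)
print([round(f, 3) for f in freqs])
```

Passing several indices in `sources` mimics multiple introductions, where the outbreak is the union of the index cases' out-components.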
Original abstract

We consider the spread of a Susceptible-Infected-Recovered (SIR) disease through finite populations and derive an expression for the final size distribution. Our derivation allows arbitrary distributions of the number of transmissions caused by an infected individual. We show how this calculation can be used to infer parameters of the infectious disease through observations in multiple small populations. The inference suffers from some identifiability difficulties, and it requires many observations to distinguish between parameter combinations that correspond to the same reproductive number.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript derives an exact expression for the final size distribution of an SIR epidemic in a closed finite population, where each infected individual independently draws its number of secondary infections from an arbitrary but fixed distribution. It further shows how this distribution can be used to perform parameter inference from final size observations across multiple small populations, while noting identifiability challenges when distinguishing parameter sets with the same basic reproductive number.

Significance. If the derivation is correct, the result supplies a general, non-Poisson framework for exact final-size probabilities in finite populations that directly supports inference from small-population data. The explicit allowance for arbitrary offspring distributions and the clear statement of the reproductive-number identifiability limitation are both strengths; the work therefore supplies a usable computational tool rather than an approximation.

minor comments (3)
  1. [Abstract] The abstract states that the derivation 'allows arbitrary distributions' but does not indicate whether the final expression is given in closed form, as a recursion, or via generating functions; a single sentence clarifying the computational representation would help readers.
  2. The inference section would benefit from an explicit statement of the likelihood function or the numerical procedure used to obtain posterior distributions over parameters, even if only in a short paragraph or appendix.
  3. Notation for the offspring distribution (e.g., p_k versus the probability generating function) should be introduced once and used consistently in all subsequent equations and figures.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our manuscript and for recommending minor revision. The referee accurately captures the main contributions, including the exact final-size distribution for arbitrary offspring distributions and the identifiability issues for inference when reproductive numbers coincide. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity; the derivation is an independent, first-principles result

full rationale

The paper derives the final-size distribution for a standard SIR process in closed finite populations where each infected individual draws transmissions independently from an arbitrary fixed distribution. This is a direct combinatorial/probabilistic calculation from the model definition, with no reduction of the claimed distribution to a fitted quantity, no self-citation load-bearing the central result, and no ansatz or uniqueness theorem imported from prior author work. The inference application is presented with explicit identifiability caveats rather than as a prediction forced by construction. The derivation is therefore self-contained against the stated modeling assumptions.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract-only review prevents exhaustive enumeration; the SIR finite-population framework and the existence of an arbitrary but fixed transmission distribution are the primary background assumptions invoked.

free parameters (1)
  • reproductive number
    Abstract states that multiple parameter combinations can share the same reproductive number, making it a fitted or target quantity for inference.
axioms (1)
  • domain assumption: SIR process in closed finite populations with independent transmissions drawn from a fixed arbitrary distribution
    The entire derivation and inference application presuppose this standard compartmental model.

pith-pipeline@v0.9.0 · 5602 in / 1263 out tokens · 25276 ms · 2026-05-24T22:43:15.453104+00:00 · methodology


Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 1 internal anchor

  1. [1]

    Frank Ball. A unified approach to the distribution of total size and total area under the trajectory of infectives in epidemic models. Advances in Applied Probability, 18(2):289–310, 1986

  2. [2]

    Anna Bershteyn, Jaline Gerardin, Daniel Bridenbecker, Christopher W Lorton, Jonathan Bloedow, Robert S Baker, Guillaume Chabot-Couture, Ye Chen, Thomas Fischle, Kurt Frey, et al. Implementation and applications of EMOD, an individual-based multi-disease modeling platform. Pathogens and Disease, 76(5):fty059, 2018

  3. [3]

    G. Chowell, M.A. Miller, and C. Viboud. Seasonal influenza in the United States, France, and Australia: transmission and prospects for control. Epidemiology & Infection, 136(6):852–864, 2008

  4. [4]

    R. Durrett. Random graph dynamics. Cambridge University Press, 2007

  5. [5]

    Rashad Eletreby and Osman Yağan. Connectivity of inhomogeneous random K-out graphs. arXiv preprint arXiv:1810.09921, 2018

  6. [6]

    Alan Frieze and Michał Karoński. Introduction to random graphs. Cambridge University Press, 2016

  7. [7]

    Timothy C Germann, Kai Kadau, Ira M Longini, and Catherine A Macken. Mitigation strategies for pandemic influenza in the United States. Proceedings of the National Academy of Sciences, 103(15):5935–5940, 2006

  8. [8]

    Bryan T Grenfell, Ottar N Bjørnstad, and Jens Kappey. Travelling waves and spatial hierarchies in measles epidemics. Nature, 414(6865):716, 2001

  9. [9]

    M. B. Hastings. Systematic series expansions for processes on networks. Physical Review Letters, 96(14):148701, 2006

  10. [10]

    Peter D Hoff. A first course in Bayesian statistical methods. Springer Science & Business Media, 2009

  11. [11]

    ImportanceOfBeingErnest (https://stackoverflow.com/users/4124317/importanceofbeingernest). Plotting a set of functions using a ‘violin-plot’ style plot in python. Stack Overflow. URL:https://stackoverflow.com/a/55886832/2966723 (version: 2019-04-28)

  12. [12]

    Eben Kenah and Joel C. Miller. Epidemic percolation networks, epidemic outcomes, and interventions. Interdisciplinary Perspectives on Infectious Diseases, 2011

  13. [13]

    Eben Kenah and James M. Robins. Second look at the spread of epidemics on networks. Physical Review E, 76(3):036113, 2007

  14. [14]

    Istvan Z Kiss, Joel C Miller, and Péter L Simon. Mathematics of epidemics on networks: from exact to approximate models. IAM. Springer, 2017

  15. [15]

    Joseph A Lewnard, Martial L Ndeffo Mbah, Jorge A Alfaro-Murillo, Frederick L Altice, Luke Bawo, Tolbert G Nyenswah, and Alison P Galvani. Dynamics and control of Ebola virus transmission in Montserrado, Liberia: a mathematical modelling analysis. The Lancet Infectious Diseases, 14(12):1189–1195, 2014

  16. [16]

    James O Lloyd-Smith, Sebastian J Schreiber, P Ekkehard Kopp, and Wayne M Getz. Superspreading and the effect of individual variation on disease emergence. Nature, 438(7066):355, 2005

  17. [17]

    Junling Ma and David JD Earn. Generality of the final size formula for an epidemic of a newly invading infectious disease. Bulletin of Mathematical Biology, 68(3):679–702, 2006

  18. [18]

    Lauren Ancel Meyers. Contact network epidemiology: Bond percolation applied to infectious disease prediction and control. Bulletin of the American Mathematical Society, 44(1):63–86, 2007

  19. [19]

    Lauren Ancel Meyers, Mark Newman, and B. Pourbohloul. Predicting epidemics on directed contact networks. Journal of Theoretical Biology, 240(3):400–418, June 2006

  20. [20]

    Lauren Ancel Meyers, Babak Pourbohloul, Mark E. J. Newman, Danuta M. Skowronski, and Robert C. Brunham. Network theory and SARS: predicting outbreak diversity. Journal of Theoretical Biology, 232(1):71–81, January 2005

  21. [21]

    Joel C. Miller. Epidemic size and probability in populations with heterogeneous infectivity and susceptibility. Physical Review E, 76(1):010101(R), 2007

  22. [22]

    Joel C. Miller. Bounding the size and probability of epidemics on networks. Journal of Applied Probability, 45:498–512, 2008

  23. [23]

    Joel C. Miller. A note on the derivation of epidemic final sizes. Bulletin of Mathematical Biology, 74(9):2125–2141, 2012

  24. [24]

    Joel C. Miller. A primer on the use of probability generating functions in infectious disease modeling. Infectious Disease Modelling, 3:192–248, 2018

  25. [25]

    Mark A Miller, Cecile Viboud, Marta Balinska, and Lone Simonsen. The signature features of influenza pandemics—implications for policy. New England Journal of Medicine, 360(25):2595–2598, 2009

  26. [26]

    Susan M Mniszewski, Sara Y Del Valle, Phillip D Stroud, Jane M Riese, and Stephen J Sydoriak. Episims simulation of a multi-component strategy for pandemic influenza. In Proceedings of the 2008 Spring simulation multiconference, pages 556–563. Society for Computer Simulation International, 2008

  27. [27]

    M. E. J. Newman. Spread of epidemic disease on networks. Physical Review E, 66(1):016128, 2002

  28. [28]

    Balázs Ráth. A moment-generating formula for Erdős-Rényi component sizes. Electronic Communications in Probability, 23, 2018

  29. [29]

    Zhuang Shen, Fang Ning, Weigong Zhou, Xiong He, Changying Lin, Daniel P Chin, Zonghan Zhu, and Anne Schuchat. Superspreading SARS events, Beijing, 2003. Emerging Infectious Diseases, 10(2):256, 2004