Distribution of outbreak sizes for SIR disease in finite populations

Joel C Miller

arxiv: 1907.05138 · v1 · pith:BW2A7XVLnew · submitted 2019-07-11 · 🧬 q-bio.PE · math.CO · physics.soc-ph · q-bio.QM

Pith reviewed 2026-05-24 22:43 UTC · model grok-4.3

classification 🧬 q-bio.PE · math.CO · physics.soc-ph · q-bio.QM

keywords SIR epidemic · final size distribution · finite populations · outbreak size · parameter inference · reproductive number · transmission distribution

The pith

An exact expression for the final size distribution of SIR epidemics holds in finite populations for arbitrary transmission distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a formula giving the probability that an SIR epidemic ends with any given number of cases in a closed population of fixed size. The formula applies when each infected individual causes a number of transmissions drawn independently from any fixed distribution. The result matters for inference: the same expression can serve as a likelihood for estimating the parameters of the transmission distribution from final-size data collected across many independent small populations. The paper also reports an identifiability problem: parameter combinations that share the same reproductive number produce similar final-size distributions, so many observed outbreaks are needed to separate them.

Core claim

We derive an expression for the final size distribution of an SIR epidemic in a finite population. Our derivation allows arbitrary distributions of the number of transmissions caused by an infected individual. We show how this calculation can be used to infer parameters of the infectious disease through observations in multiple small populations. The inference suffers from some identifiability difficulties, and it requires many observations to distinguish between parameter combinations that correspond to the same reproductive number.

What carries the argument

The exact probability mass function for epidemic final size, computed by accounting for depletion of susceptibles while allowing arbitrary offspring distributions.

If this is right

The final size probabilities can be calculated exactly without simulation for any chosen transmission distribution.
Maximum-likelihood estimates of transmission parameters become available from collections of independent small-population outbreaks.
Different parameter sets that produce the same reproductive number can be told apart only when the number of observed outbreaks is large.
The method applies directly to household or school outbreak data where population size is known and small.
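
The maximum-likelihood idea in the list above can be sketched with a stand-in for the paper's expression, which is not reproduced on this page. The sketch below assumes the Reed-Frost special case, in which every infective independently infects each other person with probability 1 - q (so the number of transmissions is binomial), and computes exact final-size probabilities from the classical triangular system for that model. The function names, the grid search, and the household data are hypothetical and serve only as illustration.

```python
import math

def reed_frost_final_size(n, q, a=1):
    """Return probs, where probs[k] is the probability that k of the n
    initial susceptibles are ever infected, given a initial infectives.

    q is the probability that one susceptible escapes infection from one
    infective (Reed-Frost assumption, not the paper's general model).
    Solves the classical triangular system
        sum_{k=0..l} C(n-k, l-k) * P[k] / q**((n-l)*(a+k)) = C(n, l)
    for l = 0..n by forward substitution.
    """
    probs = []
    for l in range(n + 1):
        partial = sum(
            math.comb(n - k, l - k) * probs[k] / q ** ((n - l) * (a + k))
            for k in range(l)
        )
        probs.append((math.comb(n, l) - partial) * q ** ((n - l) * (a + l)))
    return probs

def log_likelihood(observed, n, q):
    """Log-likelihood of the later-case counts seen in independent groups."""
    probs = reed_frost_final_size(n, q)
    return sum(math.log(probs[k]) for k in observed)

def grid_mle(observed, n, grid_size=99):
    """Return the grid value of q in (0, 1) that maximises the likelihood."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=lambda q: log_likelihood(observed, n, q))

# Made-up data: households of 4 (one index case plus 3 susceptibles);
# each entry is the number of susceptibles eventually infected.
households = [0, 0, 1, 3, 3, 2, 0, 3]
q_hat = grid_mle(households, n=3)
print("estimated escape probability:", q_hat)
```

A grid search is used here only because it is easy to read; with more than one parameter, as for a negative binomial transmission distribution, a numerical optimiser or a posterior over a parameter grid would be the more natural choice.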

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same expression could be used to test whether a given offspring distribution is consistent with data before fitting more complex models.
In the limit of many small populations the approach may yield tighter bounds on variance of transmission than aggregate data from one large population.
The identifiability warning implies that reproductive number alone is an insufficient summary statistic when only final sizes are observed.
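
A small calculation illustrates why the reproductive number alone may not summarise the data. Assuming a single index case in an otherwise susceptible population of at least two people, and assuming every transmission reaches someone other than the transmitter, the outbreak has size 1 exactly when the index case causes no transmissions. The sketch below compares that probability for a Poisson and a negative binomial transmission distribution with the same mean; the function names and the parameter values are illustrative choices, not taken from the paper.

```python
import math

def prob_no_transmission_poisson(r0):
    """P(0 transmissions) for a Poisson distribution with mean r0."""
    return math.exp(-r0)

def prob_no_transmission_negbin(r0, k):
    """P(0 transmissions) for a negative binomial with mean r0 and
    dispersion parameter k (variance r0 + r0**2 / k)."""
    return (1.0 + r0 / k) ** (-k)

r0 = 2.0
print("Poisson:            ", prob_no_transmission_poisson(r0))
print("Neg. binomial, k=1: ", prob_no_transmission_negbin(r0, 1.0))
```

Both distributions have the same mean, yet the more dispersed one assigns a visibly larger probability to an outbreak that stops at the index case. How much of that difference survives in the rest of the final-size distribution is the question the paper's inference experiments address.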

Load-bearing premise

The process is a standard SIR epidemic in a closed finite population where each infected individual draws its number of transmissions independently from the same fixed distribution.
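
The premise can be made concrete with a Monte Carlo sketch. This page does not say how the recipients of transmissions are chosen, so the code assumes that an individual who makes k transmissions sends them to k distinct people drawn uniformly from the rest of the population, and that a transmission to someone already infected or recovered has no effect. All names are hypothetical.

```python
import math
import random

def simulate_final_size(n, draw_transmissions, rng):
    """Simulate one outbreak among n people and return the number ever infected.

    draw_transmissions(rng) returns the number of transmissions one infected
    individual causes. Recipient selection is an assumption (see lead-in).
    """
    infected = {0}   # individual 0 is the index case
    pending = [0]    # infected individuals who have not yet transmitted
    while pending:
        u = pending.pop()
        k = min(draw_transmissions(rng), n - 1)   # at most n - 1 other people
        others = [v for v in range(n) if v != u]
        for v in rng.sample(others, k):
            if v not in infected:   # depletion of susceptibles
                infected.add(v)
                pending.append(v)
    return len(infected)

def poisson_sampler(mean):
    """Return a Poisson(mean) sampler based on Knuth's multiplication method."""
    threshold = math.exp(-mean)
    def draw(rng):
        k, prod = 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        return k
    return draw

rng = random.Random(1)
sizes = [simulate_final_size(5, poisson_sampler(1.5), rng) for _ in range(10000)]
for s in range(1, 6):
    print(s, sizes.count(s) / len(sizes))
```

Because each individual's draw is independent of everyone else's, the order in which infected individuals are processed does not change the final size, which is why a simple stack suffices.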

What would settle it

If the empirical distribution of final outbreak sizes across many small populations deviates from the probabilities predicted by the derived expression for any choice of transmission parameters, the claimed formula would be falsified.
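
One way to operationalise such a comparison is a goodness-of-fit statistic. The sketch below computes Pearson's chi-square statistic between observed final sizes and a set of predicted probabilities; the predicted values in the usage lines are placeholders, not output of the paper's formula, and the function name is hypothetical.

```python
import math
from collections import Counter

def pearson_chi_square(observed_sizes, predicted_probs):
    """Pearson chi-square statistic for observed final sizes.

    predicted_probs maps each possible final size to its model probability.
    Returns infinity if a size with zero predicted probability was observed.
    """
    n_obs = len(observed_sizes)
    counts = Counter(observed_sizes)
    if any(predicted_probs.get(s, 0.0) <= 0.0 for s in counts):
        return math.inf
    stat = 0.0
    for size, p in predicted_probs.items():
        expected = n_obs * p
        if expected > 0:
            stat += (counts.get(size, 0) - expected) ** 2 / expected
    return stat

# Placeholder prediction for populations of size 3 (illustrative numbers only).
predicted = {1: 0.5, 2: 0.25, 3: 0.25}
print(pearson_chi_square([1, 1, 2, 3], predicted))
print(pearson_chi_square([1, 1, 1, 1], predicted))
```

Since the criterion above is stated for any choice of transmission parameters, a real test would minimise the statistic over the parameters and adjust the degrees of freedom accordingly; sizes with small expected counts would also need to be pooled before a chi-square reference distribution is trusted.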

Figures

Figures reproduced from arXiv: 1907.05138 by Joel C Miller.

**Figure 2.** A sample directed network generated by the algorithm of Fig. 1 (cf. Fig. 6.14 …

**Figure 3.** An illustration of the observation of Section 3.1: Because the out-component of 1 …

**Figure 4.** Illustration of the outcome for multiple introductions (in this case through two …

**Figure 5.** Scaled posterior probabilities for Table 1. The star denotes the true parameter …

**Figure 6.** Scaled posterior probabilities for ten times as many observations in each population …

**Figure 7.** Scaled posterior probabilities for the same simulations used in Figure 5 (the …

**Figure 8.** The inferred values of R0 as calculated assuming a Poisson distribution (solid) or negative binomial distribution (dashed) for the simulations of Figure 5. The '*' denotes the actual location of R0. The difference in prediction of the two distributions is not very large.

Original abstract

We consider the spread of a Susceptible-Infected-Recovered (SIR) disease through finite populations and derive an expression for the final size distribution. Our derivation allows arbitrary distributions of the number of transmissions caused by an infected individual. We show how this calculation can be used to infer parameters of the infectious disease through observations in multiple small populations. The inference suffers from some identifiability difficulties, and it requires many observations to distinguish between parameter combinations that correspond to the same reproductive number.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Miller gives an exact final-size distribution for SIR in finite populations under arbitrary offspring distributions and shows its use for inference from multiple small outbreaks, with explicit identifiability limits.

The main takeaway is that this paper derives the exact distribution of final outbreak sizes for SIR epidemics in closed finite populations when each infected individual draws its number of transmissions from any fixed distribution. It then shows how to use that distribution to estimate parameters from final-size data across several independent small populations. The identifiability note is already in the abstract: combinations that share the same reproductive number are hard to separate without many observations.

Referee Report

0 major / 3 minor

Summary. The manuscript derives an exact expression for the final size distribution of an SIR epidemic in a closed finite population, where each infected individual independently draws its number of secondary infections from an arbitrary but fixed distribution. It further shows how this distribution can be used to perform parameter inference from final size observations across multiple small populations, while noting identifiability challenges when distinguishing parameter sets with the same basic reproductive number.

Significance. If the derivation is correct, the result supplies a general, non-Poisson framework for exact final-size probabilities in finite populations that directly supports inference from small-population data. The explicit allowance for arbitrary offspring distributions and the clear statement of the reproductive-number identifiability limitation are both strengths; the work therefore supplies a usable computational tool rather than an approximation.

minor comments (3)

[Abstract] The abstract states that the derivation 'allows arbitrary distributions' but does not indicate whether the final expression is given in closed form, as a recursion, or via generating functions; a single sentence clarifying the computational representation would help readers.
The inference section would benefit from an explicit statement of the likelihood function or the numerical procedure used to obtain posterior distributions over parameters, even if only in a short paragraph or appendix.
Notation for the offspring distribution (e.g., p_k versus the probability generating function) should be introduced once and used consistently in all subsequent equations and figures.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our manuscript and for recommending minor revision. The referee accurately captures the main contributions, including the exact final-size distribution for arbitrary offspring distributions and the identifiability issues for inference when reproductive numbers coincide. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity; derivation is independent first-principles result

The paper derives the final-size distribution for a standard SIR process in closed finite populations where each infected individual draws transmissions independently from an arbitrary fixed distribution. This is a direct combinatorial/probabilistic calculation from the model definition, with no reduction of the claimed distribution to a fitted quantity, no self-citation load-bearing the central result, and no ansatz or uniqueness theorem imported from prior author work. The inference application is presented with explicit identifiability caveats rather than as a prediction forced by construction. The derivation is therefore self-contained against the stated modeling assumptions.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract-only review prevents exhaustive enumeration; the SIR finite-population framework and the existence of an arbitrary but fixed transmission distribution are the primary background assumptions invoked.

free parameters (1)

reproductive number
Abstract states that multiple parameter combinations can share the same reproductive number, making it a fitted or target quantity for inference.

axioms (1)

Domain assumption: SIR process in closed finite populations with independent transmissions drawn from a fixed arbitrary distribution
The entire derivation and inference application presuppose this standard compartmental model.
