Efficient Sampling in Disease Surveillance through Subpopulations: Sampling Canaries in the Coal Mine
Pith reviewed 2026-05-24 01:10 UTC · model grok-4.3
The pith
Sampling subpopulations with higher baseline disease risk increases detection efficiency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The relative sampling efficiency between two subpopulations is inversely proportional to the ratio of their respective baseline disease risks. This implies one can increase sampling efficiency by sampling from the subpopulation with higher baseline disease risk. The results require careful treatment of the power curves of exact binomial tests as a function of their sample size, which are non-monotonic due to the underlying discreteness.
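The non-monotonicity mentioned above can be seen in a minimal sketch, assuming a one-sided exact binomial test of a baseline risk `p0` against an elevated risk `p1` at level `alpha`. The function names and the numeric values (`P0 = 0.1`, `P1 = 0.2`) are illustrative assumptions, not taken from the paper.

```python
from math import comb


def upper_tail(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(max(c, 0), n + 1))


def critical_value(n, p0, alpha=0.05):
    """Smallest c with P(X >= c | p0) <= alpha; c = n + 1 means 'never reject'."""
    c = 0
    while upper_tail(c, n, p0) > alpha:
        c += 1
    return c


def exact_power(n, p0, p1, alpha=0.05):
    """Power of the one-sided exact test at sample size n."""
    return upper_tail(critical_value(n, p0, alpha), n, p1)


# Illustrative (assumed) risks: baseline 10%, outbreak 20%.
P0, P1 = 0.1, 0.2

# Sample sizes n where adding one more observation LOWERS the power.
drops = [n for n in range(10, 60)
         if exact_power(n + 1, P0, P1) < exact_power(n, P0, P1)]

for n in drops:
    print(n, "->", n + 1, ":",
          round(exact_power(n, P0, P1), 3), "->",
          round(exact_power(n + 1, P0, P1), 3))
```

Each drop occurs where the integer critical value steps up by one: the rejection region shrinks abruptly while the extra observation adds only a little information, so the power falls before climbing again.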
What carries the argument
The inverse proportionality between relative sampling efficiency and the ratio of baseline disease risks.
If this is right
- Choosing the subpopulation with higher baseline risk improves sampling efficiency for outbreak detection.
- The efficiency relationship holds after accounting for non-monotonic power curves in exact binomial tests.
- Stratified sampling can be optimized by focusing on groups like age cohorts or travelers with elevated risks.
- The approach is illustrated in a case study of COVID-19 cases in the Netherlands.
Where Pith is reading between the lines
- Similar logic might apply to other surveillance tasks where detection power depends on baseline rates.
- Resource allocation in public health could prioritize high-risk groups for routine testing to maximize early warning.
- Extensions could explore how this interacts with varying subpopulation sizes or costs of sampling.
Load-bearing premise
The result holds only under assumptions that the abstract leaves unstated ("under some assumptions"), and it depends on the paper's specific treatment of the non-monotonic power curves of exact binomial tests.
What would settle it
A simulation or empirical study showing that the relative efficiency does not follow the inverse ratio when sampling from higher versus lower risk subpopulations under the model's conditions.
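A check of this kind could be sketched as follows. The sketch assumes a one-sided exact binomial test and an outbreak that multiplies each subpopulation's baseline risk by the same factor `rel_risk`; that outbreak model, the function names, and the example risks (10% and 5%) are assumptions made for illustration and are not taken from the paper. Because power is non-monotonic in the sample size, the required sample size is taken as the smallest `n` from which the power stays at or above the target throughout the search window.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(j, n, p):
    """Binomial(n, p) probability mass at j, computed on the log scale."""
    log_pmf = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
               + j * log(p) + (n - j) * log1p(-p))
    return exp(log_pmf)


def critical_value(n, p0, alpha):
    """Smallest c with P(X >= c | p0) <= alpha, accumulating the upper tail."""
    tail, c = 0.0, n + 1
    for j in range(n, -1, -1):
        tail += binom_pmf(j, n, p0)
        if tail > alpha:
            break
        c = j
    return c


def exact_power(n, p0, p1, alpha=0.05):
    """Power of the one-sided exact test of p0 against p1 at sample size n."""
    c = critical_value(n, p0, alpha)
    return sum(binom_pmf(j, n, p1) for j in range(c, n + 1))


def required_sample_size(p0, rel_risk, alpha=0.05, target=0.8, n_max=400):
    """Smallest n such that power >= target for every size from n up to n_max."""
    last_fail = 0
    for n in range(1, n_max + 1):
        if exact_power(n, p0, rel_risk * p0, alpha) < target:
            last_fail = n
    if last_fail == n_max:
        raise ValueError("target power not reached within n_max")
    return last_fail + 1


# Hypothetical subpopulations: A has twice the baseline risk of B.
p_A, p_B, rel_risk = 0.10, 0.05, 2.0
n_A = required_sample_size(p_A, rel_risk)
n_B = required_sample_size(p_B, rel_risk)
print("n_A =", n_A, "n_B =", n_B)
print("sample-size ratio n_B / n_A =", n_B / n_A, "risk ratio p_A / p_B =", p_A / p_B)
```

If the paper's claim holds in this setting, `n_B / n_A` should land near `p_A / p_B`; a large and systematic departure across a grid of risks and targets would count as evidence against it. The window limit `n_max` is a practical cut-off and would need to grow for smaller baseline risks.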
Original abstract
We consider outbreak detection settings of endemic diseases where the population under study consists of various subpopulations available for stratified surveillance. These subpopulations can for example be based on age cohorts, but may also correspond to other subgroups of the population under study such as international travellers. Rather than sampling uniformly across the population, one may elevate the effectiveness of the detection methodology by optimally choosing a sampling subpopulation. We show (under some assumptions) the relative sampling efficiency between two subpopulations is inversely proportional to the ratio of their respective baseline disease risks. This implies one can increase sampling efficiency by sampling from the subpopulation with higher baseline disease risk. Our results require careful treatment of the power curves of exact binomial tests as a function of their sample size, which are non-monotonic due to the underlying discreteness. A case study of COVID-19 cases in the Netherlands illustrates our theoretical findings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript considers stratified sampling for outbreak detection in endemic diseases, where subpopulations (e.g., age cohorts or travelers) have different baseline risks. It claims that, under some assumptions, the relative sampling efficiency between two subpopulations is inversely proportional to the ratio of their baseline disease risks. This is said to imply that sampling from the higher-risk subpopulation increases efficiency. The work emphasizes the need to handle non-monotonic power curves of exact binomial tests arising from discreteness and includes a COVID-19 case study in the Netherlands.
Significance. If the stated relationship holds after correction of the apparent inconsistency, the result would supply a simple rule for choosing subpopulations in surveillance, potentially improving detection power for a given sample size. The explicit treatment of non-monotonic exact-test power curves is a technical strength that distinguishes the work from standard large-sample approximations.
Major comments (1)
- [Abstract] The sentence 'the relative sampling efficiency between two subpopulations is inversely proportional to the ratio of their respective baseline disease risks' implies that if risk_A > risk_B then efficiency_A < efficiency_B (under any standard definition of relative efficiency as a ratio). The immediately following sentence asserts the opposite practical implication. This contradiction is load-bearing because the entire recommendation for subpopulation selection rests on the direction of the claimed relationship.
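The direction of the claim depends on which ratio is called "relative sampling efficiency". The following sketch makes the two readings explicit. The symbols are assumptions introduced here, not the paper's notation: $p_A, p_B$ are the baseline risks, $n_A, n_B$ the sample sizes needed to reach a common target power, and $r$ a common multiplicative outbreak factor.

```latex
% Reading 1: "efficiency" is the ratio of required sample sizes.
% Inverse proportionality then favours the higher-risk subpopulation.
\[
  \frac{n_A}{n_B} \;\approx\; \frac{p_B}{p_A} \;=\; \left(\frac{p_A}{p_B}\right)^{-1}
\]

% Reading 2: "efficiency" is a larger-is-better quantity.
% The same relationship is then a direct proportionality.
\[
  e_{A:B} \;:=\; \frac{n_B}{n_A} \;\approx\; \frac{p_A}{p_B}
\]

% Heuristic only (Poisson approximation, not the paper's derivation):
% counts are roughly Poisson(n p_i) at baseline and Poisson(n r p_i)
% in an outbreak, so the test depends on n and p_i only through n p_i,
% and equal power requires
\[
  n_A \, p_A \;\approx\; n_B \, p_B .
\]
```

Under the first reading the abstract's two sentences agree; under the second they conflict, which is the situation the referee describes.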
Minor comments (1)
- The assumptions required for the proportionality result are described only as 'some assumptions' in the abstract; a clear statement of these assumptions (including how non-monotonicity is handled) should appear in the main text, preferably with a derivation or theorem statement.
Simulated Author's Rebuttal
We thank the referee for their careful reading and for identifying the inconsistency in the abstract. We agree that the current wording creates a contradiction between the stated mathematical relationship and the practical implication, and we will revise the abstract to resolve this.
Point-by-point responses
- Referee: [Abstract] The sentence 'the relative sampling efficiency between two subpopulations is inversely proportional to the ratio of their respective baseline disease risks' implies that if risk_A > risk_B then efficiency_A < efficiency_B (under any standard definition of relative efficiency as a ratio). The immediately following sentence asserts the opposite practical implication. This contradiction is load-bearing because the entire recommendation for subpopulation selection rests on the direction of the claimed relationship.
  Authors: We acknowledge the referee's observation. The abstract's use of 'inversely proportional to the ratio of their respective baseline disease risks' does produce the logical implication described (eff_A < eff_B when risk_A > risk_B), which conflicts with the following sentence recommending sampling from the higher-risk subpopulation. This appears to be an imprecise phrasing in the abstract alone. The manuscript body derives the efficiency relationship under the stated assumptions and uses the 'canaries in the coal mine' framing to indicate that higher baseline risk improves detection efficiency. We will revise the abstract to eliminate the contradiction, for instance by rephrasing the proportionality statement and explicitly defining the ratio so that the mathematical claim aligns with the practical recommendation. This change will appear in the revised manuscript.
  Revision: yes
Circularity Check
No circularity; derivation presented as independent mathematical result under assumptions
Full rationale
The paper derives the claimed proportionality from assumptions on the non-monotonic power curves of exact binomial tests as a function of sample size. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations are present in the provided text. The central claim is positioned as following from those assumptions rather than reducing to them by construction, and the derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: power curves of exact binomial tests are non-monotonic due to the underlying discreteness.