Estimation of relative risk, odds ratio and their logarithms with guaranteed accuracy and controlled sample size ratio
Pith reviewed 2026-05-23 01:06 UTC · model grok-4.3
The pith
Two-stage sequential sampling yields estimators of the relative risk and odds ratio whose relative mean-square error stays below any chosen target for all success probabilities in (0,1).
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The estimators guarantee that the relative mean-square error, or the mean-square error for the logarithmic versions, is less than a target value for any p1, p2 ∈ (0,1), and the ratio of average sample sizes from the two populations is close to a prescribed value.
What carries the argument
Two-stage sequential sampling procedure applied to each population, with second-stage sizes depending on first-stage results to enforce the error bound.
If this is right
- The efficiency is close to the Cramér-Rao bound, particularly for small target errors.
- The estimators can be used with group sampling in batches from both populations simultaneously.
- The guarantees hold for any values of the success probabilities in (0,1).
- Sample sizes adapt dynamically based on initial observations.
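For orientation, the efficiency statement can be read against the standard fixed-sample Cramér-Rao bounds. The sketch below assumes fixed sample sizes $n_1$, $n_2$ and unbiased estimators; these symbols and the fixed-size setting are assumptions introduced here, not quoted from the paper. In a sequential setting the analogous bound is usually stated with expected sample sizes in place of $n_1$, $n_2$. With Fisher information $n_i / (p_i(1-p_i))$ for each population:

$$
\operatorname{Var}\!\left(\widehat{\log(p_1/p_2)}\right) \;\ge\; \frac{1-p_1}{n_1 p_1} + \frac{1-p_2}{n_2 p_2}
$$

$$
\operatorname{Var}\!\left(\widehat{\log \psi}\right) \;\ge\; \frac{1}{n_1 p_1 (1-p_1)} + \frac{1}{n_2 p_2 (1-p_2)},
\qquad \psi = \frac{p_1(1-p_2)}{p_2(1-p_1)}
$$

The same right-hand sides bound the relative variance of the non-logarithmic versions, since the gradient of $p_1/p_2$ (respectively $\psi$) equals the gradient of its logarithm multiplied by the quantity itself. An efficiency figure would then presumably be the ratio of such a bound to the error actually attained.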
Where Pith is reading between the lines
- If implemented, this could reduce the sample sizes needed in comparative studies while still providing accuracy guarantees.
- Similar adaptive sampling might extend to estimating other functions of binomial parameters.
- The controlled sample size ratio could be useful in resource-limited settings with unequal population access.
Load-bearing premise
It is possible to select second-stage sample sizes from the first-stage results such that the error bound holds uniformly over all possible probabilities.
What would settle it
Finding any pair of probabilities p1 and p2 where the relative mean-square error of the estimator exceeds the target value in repeated trials would falsify the guarantee.
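The check described above can be sketched as a small Monte Carlo harness. Everything named here is hypothetical: `relative_mse` is an illustrative helper, and `stand_in_estimator` is a deliberately naive two-stage rule used only to exercise the harness. It is not the paper's sampling rule, which this page does not reproduce.

```python
import math
import random


def draw_successes(n, p, rng):
    """Number of successes in n independent Bernoulli(p) draws."""
    return sum(rng.random() < p for _ in range(n))


def stand_in_estimator(p1, p2, rng, m=50, scale=200):
    """Hypothetical two-stage rule (an assumption, not the paper's method)."""
    estimates = []
    for p in (p1, p2):
        x = draw_successes(m, p, rng)             # first stage
        n2 = math.ceil(scale * m / max(x, 1))     # larger second stage when successes are rare
        y = draw_successes(n2, p, rng)            # second stage
        estimates.append((y + 0.5) / (n2 + 1.0))  # kept away from 0 so the ratio is finite
    return estimates[0] / estimates[1]


def relative_mse(estimator, p1, p2, trials=2000, seed=0):
    """Monte Carlo estimate of E[(theta_hat / theta - 1)^2] with theta = p1 / p2."""
    rng = random.Random(seed)
    theta = p1 / p2
    total = 0.0
    for _ in range(trials):
        total += (estimator(p1, p2, rng) / theta - 1.0) ** 2
    return total / trials


if __name__ == "__main__":
    # Sweep a small grid; each value would be compared against the chosen target.
    for p1, p2 in [(0.3, 0.5), (0.1, 0.2), (0.05, 0.5)]:
        print(p1, p2, relative_mse(stand_in_estimator, p1, p2, trials=500))
```

A simulated value above the target is only evidence against the guarantee once Monte Carlo noise is accounted for, so a real check would also report a standard error for each grid point.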
Original abstract
Given two populations from which independent binary observations are taken with parameters $p_1$ and $p_2$ respectively, estimators are proposed for the relative risk $p_1/p_2$, the odds ratio $p_1(1-p_2)/(p_2(1-p_1))$ and their logarithms. The sampling strategy used by the estimators is based on two-stage sequential sampling applied to each population, where the sample sizes of the second stage depend on the results observed in the first stage. The estimators guarantee that the relative mean-square error, or the mean-square error for the logarithmic versions, is less than a target value for any $p_1, p_2 \in (0,1)$, and the ratio of average sample sizes from the two populations is close to a prescribed value. The estimators can also be used with group sampling, whereby samples are taken in batches of fixed size from the two populations simultaneously, each batch containing samples from the two populations. The efficiency of the estimators with respect to the Cramér-Rao bound is good, and in particular it is close to $1$ for small values of the target error.
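Written out, the guarantees in the abstract take roughly the following form. The symbols $\theta$, $\hat\theta$, $\epsilon$, $N_1$, $N_2$ and $\rho$ are notation assumed here for illustration, and the normalization by $\theta^2$ is the usual definition of relative mean-square error rather than a formula quoted from the paper:

$$
\theta = \frac{p_1}{p_2}
\quad\text{or}\quad
\theta = \frac{p_1(1-p_2)}{p_2(1-p_1)},
\qquad
\frac{\mathbb{E}\!\left[(\hat\theta - \theta)^2\right]}{\theta^2} < \epsilon
\quad\text{for all } p_1, p_2 \in (0,1)
$$

$$
\mathbb{E}\!\left[\left(\widehat{\log\theta} - \log\theta\right)^2\right] < \epsilon
\quad\text{for all } p_1, p_2 \in (0,1),
\qquad
\frac{\mathbb{E}[N_1]}{\mathbb{E}[N_2]} \approx \rho
$$

Here $N_1$ and $N_2$ denote the random total sample sizes drawn from the two populations, $\epsilon$ the target error, and $\rho$ the prescribed ratio.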
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes two-stage sequential sampling estimators for the relative risk p1/p2, the odds ratio, and their logarithms from two independent Bernoulli populations. The central claim is that these estimators guarantee the relative mean-square error (or MSE for the log versions) is strictly less than a user-specified target uniformly for all p1, p2 ∈ (0,1), while keeping the ratio of expected sample sizes from the two populations close to a prescribed value. The procedure also extends to group sampling, and the estimators are asserted to achieve efficiency close to the Cramér-Rao bound, especially for small target errors.
Significance. If the uniform guarantees hold and the sampling rules are fully specified, the work would offer a practical advance for applications requiring bounded error in ratio estimation with binary outcomes (e.g., epidemiology, clinical trials) while controlling sample allocation. The efficiency claims near the CR bound for small targets would add value if verified.
major comments (1)
- [two-stage sequential sampling procedure] The two-stage sequential sampling procedure (described in the abstract and presumably detailed in the methods section) determines second-stage sizes n2,1 and n2,2 from first-stage counts X1 ~ Bin(m, p1) and X2 ~ Bin(m, p2). When X1 = 0 or X2 = 0, the plug-in variance estimate is exactly zero, yet no explicit, non-vacuous rule is supplied for choosing the additional samples in these cases. Because the claimed uniform relative-MSE bound must hold for every p1, p2 ∈ (0,1) (including arbitrarily small values where P(X=0) > 0 for any fixed m), this omission is load-bearing for the central guarantee.
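The size of the zero-count event the referee points to is easy to quantify: for X ~ Bin(m, p), P(X = 0) = (1 - p)^m, which is positive for every fixed m and approaches 1 as p shrinks. A minimal sketch follows; the helper name `prob_zero_successes` is made up for this illustration.

```python
def prob_zero_successes(m, p):
    """P(X = 0) for X ~ Bin(m, p): every one of the m draws is a failure."""
    return (1.0 - p) ** m


if __name__ == "__main__":
    # With m = 100 first-stage samples, rare events are usually not seen at all.
    for p in (0.1, 0.01, 0.001):
        print(p, prob_zero_successes(100, p))
```

With m = 100 and p = 0.001 the probability of seeing no successes is roughly 0.905, so for small probabilities the zero-count case is the typical outcome rather than an edge case.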
Simulated Author's Rebuttal
We thank the referee for their careful review and for identifying this key point about the two-stage procedure. We respond to the major comment below.
- Referee: [two-stage sequential sampling procedure] The two-stage sequential sampling procedure (described in the abstract and presumably detailed in the methods section) determines second-stage sizes n2,1 and n2,2 from first-stage counts X1 ~ Bin(m, p1) and X2 ~ Bin(m, p2). When X1 = 0 or X2 = 0, the plug-in variance estimate is exactly zero, yet no explicit, non-vacuous rule is supplied for choosing the additional samples in these cases. Because the claimed uniform relative-MSE bound must hold for every p1, p2 ∈ (0,1) (including arbitrarily small values where P(X=0) > 0 for any fixed m), this omission is load-bearing for the central guarantee.
Authors: We agree that the current manuscript description does not supply an explicit, non-vacuous rule for the second-stage sizes when X1 = 0 or X2 = 0. This is a valid observation, as the plug-in variance is zero in those cases and the uniform bound must hold for all p1, p2 ∈ (0,1). We will revise the manuscript to add a concrete rule: when either first-stage count is zero, the second-stage sizes n2,1 and n2,2 will be set to the smallest integers satisfying the target accuracy under a conservative bound that replaces the zero count with a small positive value (e.g., 1/(m+1)) while preserving the prescribed sample-size ratio. The revised text will appear in the section defining the two-stage procedure.
revision: yes
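The mechanics of a zero-count floor like the one proposed in the rebuttal can be sketched as follows. This is an illustration under stated assumptions, not the authors' revised rule: the function `second_stage_size` is hypothetical, the sizing criterion (1 - p)/(n p) <= target is the delta-method variance of a log-proportion chosen here for concreteness, and control of the prescribed sample-size ratio is left out.

```python
import math


def second_stage_size(x, m, target, floor=None):
    """Hypothetical second-stage size from a first-stage count x out of m.

    Assumption: choose the smallest n with (1 - p_hat) / (n * p_hat) <= target,
    where a zero count is replaced by a small positive value before plugging in.
    """
    if floor is None:
        floor = 1.0 / (m + 1)              # the example value mentioned in the rebuttal
    p_hat = min(max(x, floor) / m, 1.0)    # floored count, so p_hat is never zero
    return max(1, math.ceil((1.0 - p_hat) / (p_hat * target)))


if __name__ == "__main__":
    for x in (0, 1, 10, 20):
        print(x, second_stage_size(x, m=20, target=0.01))
```

The floor makes the zero-count case produce a finite, large second-stage size instead of an undefined one. Whether any particular floor is conservative enough to keep the error bound uniform over all p1, p2 is exactly the question the referee raises, and this sketch does not settle it.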
Circularity Check
No significant circularity in derivation
The paper constructs estimators via a two-stage sequential sampling rule whose second-stage sizes are chosen from first-stage observations to enforce a uniform relative MSE (or MSE) bound below a target for every p1,p2 in (0,1), while controlling the expected sample-size ratio. No quoted equation, definition, or self-citation reduces this guarantee to a tautology, a fitted parameter renamed as prediction, or a load-bearing prior result by the same author. The central claim is presented as following from the explicit sampling design rather than from any internal redefinition or statistical forcing, making the derivation self-contained against the stated external sampling-theory benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- target relative MSE (or MSE for logs)
- prescribed sample size ratio
axioms (1)
- domain assumption: Observations are independent binary random variables drawn from two populations with unknown parameters p1 and p2 in (0,1).