Estimation of relative risk, odds ratio and their logarithms with guaranteed accuracy and controlled sample size ratio

Luis Mendo

arXiv: 2503.04876 · v4 · submitted 2025-03-06 · stat.ME · math.ST · stat.TH

Pith reviewed 2026-05-23 01:06 UTC · model grok-4.3

classification stat.ME · math.ST · stat.TH

keywords relative risk estimation · odds ratio · sequential sampling · mean square error bound · binary observations · two populations

The pith

Two-stage sequential sampling yields estimators for relative risk and odds ratio that bound relative mean-square error below any target for all probabilities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces estimators for the relative risk p1/p2, the odds ratio, and their logarithms using independent binary observations from two populations. The method relies on two-stage sequential sampling, with the second-stage sample sizes determined by the first-stage outcomes. The estimators keep the relative mean-square error (or the mean-square error, for the logarithmic versions) below a chosen target value for any true probabilities p1 and p2 in (0,1). They also keep the ratio of average sample sizes from the two populations close to a user-specified value. The same approach supports group sampling, where fixed-size batches containing samples from both populations are taken simultaneously.

Core claim

The estimators guarantee that the relative mean-square error, or the mean-square error for the logarithmic versions, is less than a target value for any p1, p2 ∈ (0,1), and the ratio of average sample sizes from the two populations is close to a prescribed value.
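
In symbols, the guarantee can be written roughly as follows. The notation ($\theta$, $\hat\theta$, $\epsilon$) is introduced here for illustration and is not taken from the paper, and normalizing by $\theta^2$ is assumed as the usual meaning of relative mean-square error.

```latex
\theta_{\mathrm{RR}} = \frac{p_1}{p_2}, \qquad
\theta_{\mathrm{OR}} = \frac{p_1(1-p_2)}{p_2(1-p_1)}

% Relative MSE target for the ratio estimators
\frac{\mathbb{E}\big[(\hat\theta - \theta)^2\big]}{\theta^2} < \epsilon
\quad \text{for all } p_1, p_2 \in (0,1)

% Plain MSE target for the logarithmic versions
\mathbb{E}\big[(\widehat{\log\theta} - \log\theta)^2\big] < \epsilon
\quad \text{for all } p_1, p_2 \in (0,1)
```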

What carries the argument

Two-stage sequential sampling procedure applied to each population, with second-stage sizes depending on first-stage results to enforce the error bound.

If this is right

The efficiency is close to the Cramér-Rao bound, particularly for small target errors.
The estimators can be used with group sampling in batches from both populations simultaneously.
The guarantees hold for any values of the success probabilities in (0,1).
Sample sizes adapt dynamically based on initial observations.
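
For orientation, one way to read "efficiency" is as the ratio of the Cramér-Rao bound to the achieved mean-square error. The sketch below computes the standard fixed-sample-size bound for log(p1/p2). It assumes fixed sample sizes n1 and n2 as a stand-in for the average sample sizes of the sequential procedure, and the function names are hypothetical. How the paper defines efficiency exactly is not stated in this summary.

```python
def crb_log_relative_risk(p1: float, p2: float, n1: float, n2: float) -> float:
    """Cramér-Rao lower bound on the variance of an unbiased estimator of
    log(p1/p2), given n1 and n2 independent Bernoulli observations.

    Assumption: n1, n2 are fixed sample sizes (not the paper's random ones).
    """
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("p1 and p2 must lie strictly between 0 and 1")
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    # Fisher information of n Bernoulli(p) draws is n / (p (1 - p));
    # the gradient of log(p1) - log(p2) is (1/p1, -1/p2).
    return (1.0 - p1) / (n1 * p1) + (1.0 - p2) / (n2 * p2)


def efficiency(achieved_mse: float, p1: float, p2: float,
               n1: float, n2: float) -> float:
    """Ratio of the bound to the achieved MSE; values near 1 are efficient."""
    return crb_log_relative_risk(p1, p2, n1, n2) / achieved_mse
```

With p1 = p2 = 0.5 and 100 observations per population, the bound is 0.02, so an estimator with MSE 0.025 would have efficiency 0.8 under this reading.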

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If implemented, this could minimize sample sizes needed in comparative studies while providing accuracy guarantees.
Similar adaptive sampling might extend to estimating other functions of binomial parameters.
The controlled sample size ratio could be useful in resource-limited settings with unequal population access.

Load-bearing premise

It is possible to select second-stage sample sizes from the first-stage results such that the error bound holds uniformly over all possible probabilities.

What would settle it

Finding any pair of probabilities p1 and p2 where the relative mean-square error of the estimator exceeds the target value in repeated trials would falsify the guarantee.
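
Such a check can be sketched as a Monte Carlo harness. The code below is an illustration only: `placeholder_procedure` is a hypothetical fixed-sample-size plug-in estimator, not the paper's two-stage method, and all names are assumptions made for this example. To test the actual claim, the paper's sampling and estimation rule would be passed in as `procedure`.

```python
import random


def placeholder_procedure(p1, p2, rng, n=200):
    """Hypothetical stand-in (NOT the paper's estimator): draws n Bernoulli
    samples per population and returns a smoothed plug-in ratio."""
    x1 = sum(rng.random() < p1 for _ in range(n))
    x2 = sum(rng.random() < p2 for _ in range(n))
    # Add-one smoothing in the denominator avoids division by zero.
    return (x1 / n) / ((x2 + 1) / (n + 1))


def relative_mse(procedure, p1, p2, trials=2000, seed=0):
    """Monte Carlo estimate of E[(est - theta)^2] / theta^2, theta = p1/p2."""
    rng = random.Random(seed)
    theta = p1 / p2
    total = 0.0
    for _ in range(trials):
        estimate = procedure(p1, p2, rng)
        total += (estimate - theta) ** 2
    return total / (trials * theta ** 2)


def find_violations(procedure, grid, target, trials=2000, seed=0):
    """Return (p1, p2, relative MSE) for grid points exceeding the target."""
    violations = []
    for p1, p2 in grid:
        value = relative_mse(procedure, p1, p2, trials, seed)
        if value > target:
            violations.append((p1, p2, value))
    return violations
```

A single exceedance found this way could still be simulation noise, so a candidate counterexample would need enough trials for the Monte Carlo error to be clearly smaller than the gap above the target.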

Original abstract

Given two populations from which independent binary observations are taken with parameters $p_1$ and $p_2$ respectively, estimators are proposed for the relative risk $p_1/p_2$, the odds ratio $p_1(1-p_2)/(p_2(1-p_1))$ and their logarithms. The sampling strategy used by the estimators is based on two-stage sequential sampling applied to each population, where the sample sizes of the second stage depend on the results observed in the first stage. The estimators guarantee that the relative mean-square error, or the mean-square error for the logarithmic versions, is less than a target value for any $p_1, p_2 \in (0,1)$, and the ratio of average sample sizes from the two populations is close to a prescribed value. The estimators can also be used with group sampling, whereby samples are taken in batches of fixed size from the two populations simultaneously, each batch containing samples from the two populations. The efficiency of the estimators with respect to the Cramér-Rao bound is good, and in particular it is close to $1$ for small values of the target error.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a two-stage sampling plan that claims a uniform relative-MSE bound for relative risk and odds ratio estimators over all p1,p2 in (0,1) while controlling the sample-size ratio, but the zero-count case in stage one is not resolved in the abstract and could block the guarantee.

The main takeaway is that this work supplies estimators for relative risk, odds ratio, and their logs that use two-stage sequential sampling on each population. The second-stage sizes are set from the first-stage counts so that the relative mean-square error (or ordinary MSE for the log versions) stays below a preset target no matter what the true p1 and p2 are, and the ratio of average sample sizes stays near a chosen value. Group sampling in simultaneous batches is also covered. The efficiency figures relative to the Cramér-Rao bound are reported as close to one when the target error is small. That combination of uniform non-asymptotic control plus sample-ratio control is the concrete contribution. It targets a practical need in settings where you must deliver a stated accuracy level without knowing the probabilities ahead of time. The abstract is straightforward about the goal and the sampling structure.

The zero-event issue in the first stage is the clearest soft spot. When the initial sample from one or both groups yields zero successes, any plug-in variance is exactly zero, yet the paper must still pick additional samples that keep the final estimator inside the error bound for every possible p1 and p2 while preserving the expected sample-size ratio. The abstract supplies no explicit rule for those realizations, and if the full text does not either, the uniform guarantee over (0,1) does not hold. That is not a minor gap because the probability of zero counts is positive for any fixed first-stage size when p is small. The rest of the construction looks internally consistent on the points that are shown.

This paper is for methodologists who design studies with binary outcomes and want explicit accuracy guarantees rather than asymptotic statements. A reader already working on sequential or adaptive sampling for proportions would get the most from it. It deserves a serious referee to check whether the second-stage rule is fully specified and actually works at the zero boundary; the central claim is strong enough that the details matter.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes two-stage sequential sampling estimators for the relative risk p1/p2, the odds ratio, and their logarithms from two independent Bernoulli populations. The central claim is that these estimators guarantee the relative mean-square error (or MSE for the log versions) is strictly less than a user-specified target uniformly for all p1, p2 ∈ (0,1), while keeping the ratio of expected sample sizes from the two populations close to a prescribed value. The procedure also extends to group sampling, and the estimators are asserted to achieve efficiency close to the Cramér-Rao bound, especially for small target errors.

Significance. If the uniform guarantees hold and the sampling rules are fully specified, the work would offer a practical advance for applications requiring bounded error in ratio estimation with binary outcomes (e.g., epidemiology, clinical trials) while controlling sample allocation. The efficiency claims near the CR bound for small targets would add value if verified.

major comments (1)

[two-stage sequential sampling procedure] The two-stage sequential sampling procedure (described in the abstract and presumably detailed in the methods section) determines second-stage sizes n2,1 and n2,2 from first-stage counts X1 ~ Bin(m, p1) and X2 ~ Bin(m, p2). When X1 = 0 or X2 = 0, the plug-in variance estimate is exactly zero, yet no explicit, non-vacuous rule is supplied for choosing the additional samples in these cases. Because the claimed uniform relative-MSE bound must hold for every p1, p2 ∈ (0,1) (including arbitrarily small values where P(X=0) > 0 for any fixed m), this omission is load-bearing for the central guarantee.
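
The size of the zero-count event is easy to quantify. The sketch below assumes, as the referee's notation does, a common first-stage size m and independent first-stage counts; the function names are hypothetical.

```python
def prob_zero_count(p: float, m: int) -> float:
    """P(X = 0) for X ~ Bin(m, p): every one of the m draws is a failure."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return (1.0 - p) ** m


def prob_any_zero(p1: float, p2: float, m: int) -> float:
    """P(X1 = 0 or X2 = 0), assuming the two populations are independent."""
    neither_zero = (1.0 - prob_zero_count(p1, m)) * (1.0 - prob_zero_count(p2, m))
    return 1.0 - neither_zero
```

For instance, with m = 100 and p = 0.01 the probability of a zero first-stage count is 0.99^100, roughly 0.37, so under these assumptions the case is far from negligible when p is small relative to 1/m.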

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful review and for identifying this key point about the two-stage procedure. We respond to the major comment below.

Referee: [two-stage sequential sampling procedure] The two-stage sequential sampling procedure (described in the abstract and presumably detailed in the methods section) determines second-stage sizes n2,1 and n2,2 from first-stage counts X1 ~ Bin(m, p1) and X2 ~ Bin(m, p2). When X1 = 0 or X2 = 0, the plug-in variance estimate is exactly zero, yet no explicit, non-vacuous rule is supplied for choosing the additional samples in these cases. Because the claimed uniform relative-MSE bound must hold for every p1, p2 ∈ (0,1) (including arbitrarily small values where P(X=0) > 0 for any fixed m), this omission is load-bearing for the central guarantee.

Authors: We agree that the current manuscript description does not supply an explicit, non-vacuous rule for the second-stage sizes when X1 = 0 or X2 = 0. This is a valid observation, as the plug-in variance is zero in those cases and the uniform bound must hold for all p1, p2 ∈ (0,1). We will revise the manuscript to add a concrete rule: when either first-stage count is zero, the second-stage sizes n2,1 and n2,2 will be set to the smallest integers satisfying the target accuracy under a conservative bound that replaces the zero count with a small positive value (e.g., 1/(m+1)) while preserving the prescribed sample-size ratio. The revised text will appear in the section defining the two-stage procedure.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation

The paper constructs estimators via a two-stage sequential sampling rule whose second-stage sizes are chosen from first-stage observations to enforce a uniform relative MSE (or MSE) bound below a target for every p1,p2 in (0,1), while controlling the expected sample-size ratio. No quoted equation, definition, or self-citation reduces this guarantee to a tautology, a fitted parameter renamed as prediction, or a load-bearing prior result by the same author. The central claim is presented as following from the explicit sampling design rather than from any internal redefinition or statistical forcing, making the derivation self-contained against the stated external sampling-theory benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The approach rests on standard assumptions of independent Bernoulli observations and the feasibility of sequential sampling; no new entities are introduced and the target error and sample-size ratio are user-specified parameters rather than fitted quantities.

free parameters (2)

target relative MSE (or MSE for logs)
User-chosen bound that the estimator is designed to respect for all p1, p2.
prescribed sample size ratio
User-specified target for the long-run ratio of average sample sizes from the two populations.

axioms (1)

domain assumption: Observations are independent binary random variables drawn from two populations with unknown parameters p1 and p2 in (0,1).
Explicitly stated as the data-generating model in the abstract.

pith-pipeline@v0.9.0 · 5740 in / 1226 out tokens · 48334 ms · 2026-05-23T01:06:10.763059+00:00 · methodology
