Stopping on the last success with unknown odds: Impossibility barriers and quantitative oracle bounds
Pith reviewed 2026-05-10 18:01 UTC · model grok-4.3
The pith
No oracle-free sequence of rules converges uniformly to the known-p value over all success probabilities
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Even allowing randomization, no oracle-free sequence of rules can converge uniformly to the oracle value over p in (0,1). The plug-in rule, which replaces the unknown p by its running empirical estimate, admits an exact expression for its win probability derived from a recursion on state probabilities. This leads to sharp finite-horizon comparisons with the oracle, non-trivial oracle bounds on intervals bounded away from zero, and asymptotic optimality in sparse regimes where the product of n and p tends to infinity.
What carries the argument
The set of p-blind stopping rules equipped with the dominance partial order, which has no greatest element for fixed n, together with the recursion that computes the exact win probability of the empirical plug-in rule
Load-bearing premise
The observations are independent Bernoulli trials with the same unknown success probability p throughout, and all decisions must be made using only the sequence of outcomes seen so far.
What would settle it
The central impossibility would be falsified by constructing any sequence of possibly randomized rules based solely on the data such that, as the horizon n grows, the largest difference between their win probability and the oracle win probability, taken over all p in (0,1), goes to zero.
Figures
read the original abstract
We consider the classical last-success problem for sequential Bernoulli trials in the homogeneous setting where $X_1,\ldots,X_n$ are i.i.d. $\mathrm{Bernoulli}(p)$ but the success probability $p\in(0,1)$ is unknown to the decision maker. When $p$ is known, Bruss' sum-the-odds theorem yields an optimal threshold rule with value $V_n(p)$. We study a natural oracle-free plug-in rule that replaces $p$ by the online empirical estimate $\hat p_t$ and we denote its win probability by $W_n(p)$. First, we derive an exact expression for $W_n(p)$ via a recursion for the state probabilities, enabling explicit comparisons with $V_n(p)$ and revealing a finite-horizon separation between plug-in and oracle performance. Next, we formalize a first decision-theoretic obstruction inherent to the unknown-$p$ formulation: for every fixed $n\ge2$, the dominance partial order on $p$-blind (possibly randomized) rules has no greatest element. We then identify regimes where oracle-freeness is achievable with sharp bounds. For any $p_0\in(0,1)$, we establish finite-horizon oracle bounds on $[p_0,1)$ and we prove a matching minimax lower bound of order $1/\sqrt{n}$ for $p_0\in(0,\tfrac 12)$ (larger values of $p_0$ do not allow for non-trivial lower bounds). We also show that the rate is exponential for any fixed $p$. In sparse regimes where $p=p_n\to0$ with $np_n\to\infty$, we prove asymptotic oracle-optimality of the plug-in rule, in the sense that $V_n(p_n)-W_n(p_n)\to0$. Together with our non-sparse bounds, this yields a broad uniform convergence guarantee, which we show cannot be extended to the critical regime $p\asymp 1/n$. Finally, we establish another impossibility barrier: even allowing randomization, no oracle-free sequence of rules can converge uniformly to the oracle value over $p\in(0,1)$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies the last-success stopping problem for i.i.d. Bernoulli(p) trials with unknown p. It defines the oracle value V_n(p) from Bruss' sum-the-odds rule, introduces the plug-in rule that substitutes the online empirical frequency for p, derives an exact recursion for the plug-in win probability W_n(p), proves that the dominance partial order on p-blind (possibly randomized) rules has no greatest element for each fixed n≥2, establishes finite-horizon bounds |V_n(p)−W_n(p)|=O(1/√n) on [p_0,1) together with a matching minimax lower bound of order 1/√n when p_0<1/2, shows exponential convergence for fixed p, proves asymptotic oracle optimality when np_n→∞, and demonstrates that no sequence of oracle-free rules (even randomized) can converge uniformly to V_n(p) over all p∈(0,1), with an explicit obstruction in the critical regime p≍1/n.
Significance. If the derivations hold, the work supplies a precise decision-theoretic account of the cost of oracle-freeness in a canonical sequential selection problem. The exact state-probability recursion, the minimax lower bound, the regime-specific rates, and the two impossibility theorems together delineate both fundamental barriers and the regimes in which ignorance of p is asymptotically negligible. These quantitative results strengthen the literature on adaptive stopping and online estimation.
major comments (2)
- [§4] §4 (dominance partial order): the proof that no greatest element exists for n≥2 constructs two deterministic rules that cross-dominate on complementary intervals of p; the extension to randomized rules is asserted but the convex-combination argument that prevents a dominating mixture is only outlined and should be written out explicitly to confirm it holds for all convex combinations.
- [§5.2] §5.2 (minimax lower bound): the 1/√n lower bound for p_0<1/2 is stated to be matching, yet the upper-bound constant from the plug-in analysis on [p_0,1) is not compared numerically or asymptotically with the lower-bound constant; an explicit statement of whether the constants agree up to a universal factor would clarify the sharpness of the result.
minor comments (3)
- [§3] The recursion for the state probabilities in the plug-in analysis should list the base cases for t=1 and the handling of the event {ˆp_t=0} (where the threshold becomes undefined) to ensure the expression for W_n(p) is well-defined for all sample paths.
- [§6] In the sparse-regime theorem, the condition np_n→∞ is used for asymptotic optimality; the proof sketch would benefit from an explicit rate statement (e.g., O(1/(np_n)) or similar) rather than only the limit statement.
- [§2] The notation V_n(p) and W_n(p) is introduced without a displayed equation for the oracle threshold; adding the classical Bruss threshold formula as Eq. (1) would improve readability for readers unfamiliar with the sum-the-odds rule.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and the recommendation of minor revision. We address the two major comments point by point below.
read point-by-point responses
-
Referee: [§4] §4 (dominance partial order): the proof that no greatest element exists for n≥2 constructs two deterministic rules that cross-dominate on complementary intervals of p; the extension to randomized rules is asserted but the convex-combination argument that prevents a dominating mixture is only outlined and should be written out explicitly to confirm it holds for all convex combinations.
Authors: We agree that the extension to randomized rules benefits from a fully explicit argument. In the revised manuscript we will expand the relevant paragraph in §4 to include a complete proof: for any mixture weight λ ∈ [0,1], the win probability of the randomized rule is the convex combination λ W^{(1)}(p) + (1-λ) W^{(2)}(p). Because the two deterministic rules cross-dominate on complementary intervals I1 and I2, on I1 the mixture is strictly less than the first rule for λ < 1, and on I2 it is strictly less than the second rule for λ > 0. Hence no mixture can dominate both deterministic rules, so the partial order still has no greatest element. This will be written out with the explicit interval calculations. revision: yes
-
Referee: [§5.2] §5.2 (minimax lower bound): the 1/√n lower bound for p_0<1/2 is stated to be matching, yet the upper-bound constant from the plug-in analysis on [p_0,1) is not compared numerically or asymptotically with the lower-bound constant; an explicit statement of whether the constants agree up to a universal factor would clarify the sharpness of the result.
Authors: The matching is with respect to the rate Θ(1/√n). The upper-bound constant C(p0) arises from the explicit concentration bound on the plug-in estimator and is given in closed form in the proof of Theorem 5.1; the lower-bound constant c(p0) is obtained from the minimax construction over a two-point prior supported on [p0,1/2]. In the revision we will insert a short remark in §5.2 stating that both constants are positive and finite for each fixed p0 < 1/2, that their ratio is bounded by a universal factor independent of n (specifically ≤ 4 for the leading terms), and therefore the result is sharp up to a p0-dependent multiplicative constant. revision: yes
Circularity Check
No significant circularity
full rationale
The paper starts from the standard i.i.d. Bernoulli(p) model with unknown p and independently defines the oracle value V_n(p) via Bruss' sum-the-odds theorem and the plug-in rule W_n(p) via online empirical estimates. It then derives an exact recursion for the state probabilities of W_n(p), obtains explicit comparisons and bounds, and proves impossibility results (no greatest element in the dominance order, uniform convergence barriers) using these definitions and standard probabilistic arguments. No load-bearing step reduces by construction to a fitted parameter, self-citation chain, or ansatz smuggled from prior work by the same authors; the derivation chain remains self-contained against the external model.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption X_1,...,X_n are i.i.d. Bernoulli(p) for fixed unknown p in (0,1)
- domain assumption The decision maker has no prior information on p and must rely on online empirical estimates
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
no oracle-free sequence of rules can converge uniformly to the oracle value over p∈(0,1) (Theorem 4.3)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Ano, K., Y. Kakinuma, and N. Miyoshi (2010). Odds theorem with multiple selection chances.J. Appl. Probab. 47(4), 1093–1104
work page 2010
-
[2]
Bruss, F. T. (2000). Sum the odds to one and stop.Ann. Probab. 28, 1384–1391
work page 2000
-
[3]
Bruss, F. T. (2003). A note on bounds for the odds theorem of optimal stopping. Ann. Probab. 31(4), 1859–1861
work page 2003
-
[4]
Bruss, F. T. and G. Louchard (2009). The odds algorithm based on sequential updating and its performance.Adv. Appl. Probab. 41(1), 131–153
work page 2009
-
[5]
Bruss, F. T. and D. Paindaveine (2000). Selecting a sequence of last successes in independent trials.J. Appl. Probab. 37(2), 389–399
work page 2000
- [6]
-
[7]
Ferguson, T. S. (2011). The sum-the-odds theorem with application to a stop- ping game of Sakaguchi.Math. Applicanda 39(2), 319–331
work page 2011
-
[8]
Freedman, D. A. (1975). On tail probabilities for martingales.Ann. Probab. 3(1), 100–118
work page 1975
-
[9]
Gnedin, A. and Z. Derbazi (2021). Trapping the ultimate success. arXiv preprint
work page 2021
-
[10]
Gnedin, A. and Z. Derbazi (2025). The last-success stopping problem with random observation times.Math. Meth. Oper. Res. 101, 1–27. Grau Ribas, J. M. (2020). A note on last-success-problem.Theor. Probability and Math. Statist. 103, 155–165
work page 2025
-
[11]
Hill, T. P. and U. Krengel (1991). Minimax-optimal stop rules and distributions in secretary problems.Ann. Probab. 19(1), 342–353
work page 1991
-
[12]
Howard, S. R., A. Ramdas, J. McAuliffe, and J. S. Sekhon (2021). Time-uniform chernoff bounds via nonnegative supermartingales.Probab. Surv. 18, 27–94. 42
work page 2021
- [13]
-
[14]
Hsiau, S. R. and J.-R. Yang (2002). Selecting the last success in Markov- dependent trials.J. Appl. Probab. 39(2), 271–281
work page 2002
-
[15]
Matsui, M. and K. Ano (2017). Compare ratios of symmetric functions and their applications to Bruss’ odds problem.J. Appl. Probab. 54(1), 12–22
work page 2017
-
[16]
Nuti, P. and J. Vondr´ ak (2023). Secretary problems: the power of a single sam- ple. InProc. 2023 Annual ACM-SIAM Sympos. Discrete Algorithms (SODA), Philadelphia, pp. 2015–2029. SIAM
work page 2023
-
[17]
Peskir, G. and A. Shiryaev (2006).Optimal Stopping and Free-Boundary Prob- lems. Basel: Birkh¨ auser
work page 2006
-
[18]
Tamaki, M. (2010). Sum the multiplicative odds to one and stop.J. Appl. Probab. 47(3), 761–777
work page 2010
-
[19]
Tropp, J. A. (2011). Freedman’s inequality for matrix martingales.Electron. Commun. Probab. 16, 262–270
work page 2011
-
[20]
Yoshinaga, T. and Y. Kawase (2024). The last success problem with samples. In32nd Annual European Sympos. Algorithms (ESA 2024), Volume 308 of Leibniz Int. Proc. Inform. (LIPIcs), Dagstuhl, Germany, pp. 105:1–105:15. Schloss Dagstuhl – Leibniz-Zentrum f¨ ur Informatik. 43
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.