Stopping on the last success with unknown odds: Impossibility barriers and quantitative oracle bounds

Davy Paindaveine

arxiv: 2604.07183 · v1 · submitted 2026-04-08 · 🧮 math.PR

Stopping on the last success with unknown odds: Impossibility barriers and quantitative oracle bounds

Davy Paindaveine This is my paper

Pith reviewed 2026-05-10 18:01 UTC · model grok-4.3

classification 🧮 math.PR

keywords last success problemstopping rulesunknown probabilityoracle boundsBernoulli trialsminimax lower boundsplug-in rules

0 comments

The pith

No oracle-free sequence of rules converges uniformly to the known-p value over all success probabilities

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines the last-success stopping problem when the success probability p is unknown to the decision maker. It proves that no sequence of rules relying only on the observed data can make the probability of stopping on the last success approach the best possible value that would be achievable if p were known, and this holds uniformly over all possible values of p even when randomization is allowed. A simple rule that plugs in the current estimate of p based on past trials is analyzed in detail, showing good performance when p is not too small but revealing gaps when p is around 1/n. The work also establishes that for any fixed number of trials, among rules that do not know p, there is no single rule that dominates all others.

Core claim

Even allowing randomization, no oracle-free sequence of rules can converge uniformly to the oracle value over p in (0,1). The plug-in rule, which replaces the unknown p by its running empirical estimate, admits an exact expression for its win probability derived from a recursion on state probabilities. This leads to sharp finite-horizon comparisons with the oracle, non-trivial oracle bounds on intervals bounded away from zero, and asymptotic optimality in sparse regimes where the product of n and p tends to infinity.

What carries the argument

The set of p-blind stopping rules equipped with the dominance partial order, which has no greatest element for fixed n, together with the recursion that computes the exact win probability of the empirical plug-in rule

Load-bearing premise

The observations are independent Bernoulli trials with the same unknown success probability p throughout, and all decisions must be made using only the sequence of outcomes seen so far.

What would settle it

The central impossibility would be falsified by constructing any sequence of possibly randomized rules based solely on the data such that, as the horizon n grows, the largest difference between their win probability and the oracle win probability, taken over all p in (0,1), goes to zero.

Figures

Figures reproduced from arXiv: 2604.07183 by Davy Paindaveine.

**Figure 1.** Figure 1: (Left panel:) plug-in win probability Wn(p) and oracle win probability Vn(p) as functions of p ∈ (0, 1), for n ∈ {5, 10, 30} (for each n, the upper curve is Vn(p) since Vn(p) ≥ Wn(p)). (Right panel:) the deficit Vn(p)−Wn(p) for n ∈ {5, 10, 30}. Vertical gray lines in both panels mark the non-differentiability points of V5(p), i.e., p = 1/k, with k = 2, 3, 4, 5. Proof of Proposition 2.1(iii). Let t⋆ := ⌈ n… view at source ↗

**Figure 2.** Figure 2: (Left panel:) deficit Vn(p)−Wn(p) as a function of n, evaluated at the horizons n = 10, 20, . . . , 800, for several fixed values of p between the boundary points p = 1/3 and p = 1/4 (logarithmic y-axis). (Right panel:) √ n-scaled worst-case deficit as a function of n, for p0 ∈ {0.2, 0.3, 0.4}. faster for any fixed p, the worst-case performance on [p0, 1) is governed by a genuinely hardest region that enf… view at source ↗

**Figure 3.** Figure 3: n=30 n=100 n=250 0.02 0.04 0.06 0.08 0.10 p 0.02 0.04 0.06 0.08 0.10 0.12 Vn(p) - Wn(p) [PITH_FULL_IMAGE:figures/full_fig_p033_3.png] view at source ↗

**Figure 4.** Figure 4: (Left panel:) Wn(p) as a function of p ∈ (0, 1) for n = 59 and n = 60 (the curves are visually indistinguishable at this scale); the vertical lines indicate φ and ψ, the algebraic endpoints of the interval I on which W60(p) is monotone decreasing. (Middle panel:) zoom of W60(p) on a region containing I. (Right panel:) the same zoom for W59(p). Proof of Proposition 2.1(i)–(ii). Note that the oracle value in… view at source ↗

read the original abstract

We consider the classical last-success problem for sequential Bernoulli trials in the homogeneous setting where $X_1,\ldots,X_n$ are i.i.d. $\mathrm{Bernoulli}(p)$ but the success probability $p\in(0,1)$ is unknown to the decision maker. When $p$ is known, Bruss' sum-the-odds theorem yields an optimal threshold rule with value $V_n(p)$. We study a natural oracle-free plug-in rule that replaces $p$ by the online empirical estimate $\hat p_t$ and we denote its win probability by $W_n(p)$. First, we derive an exact expression for $W_n(p)$ via a recursion for the state probabilities, enabling explicit comparisons with $V_n(p)$ and revealing a finite-horizon separation between plug-in and oracle performance. Next, we formalize a first decision-theoretic obstruction inherent to the unknown-$p$ formulation: for every fixed $n\ge2$, the dominance partial order on $p$-blind (possibly randomized) rules has no greatest element. We then identify regimes where oracle-freeness is achievable with sharp bounds. For any $p_0\in(0,1)$, we establish finite-horizon oracle bounds on $[p_0,1)$ and we prove a matching minimax lower bound of order $1/\sqrt{n}$ for $p_0\in(0,\tfrac 12)$ (larger values of $p_0$ do not allow for non-trivial lower bounds). We also show that the rate is exponential for any fixed $p$. In sparse regimes where $p=p_n\to0$ with $np_n\to\infty$, we prove asymptotic oracle-optimality of the plug-in rule, in the sense that $V_n(p_n)-W_n(p_n)\to0$. Together with our non-sparse bounds, this yields a broad uniform convergence guarantee, which we show cannot be extended to the critical regime $p\asymp 1/n$. Finally, we establish another impossibility barrier: even allowing randomization, no oracle-free sequence of rules can converge uniformly to the oracle value over $p\in(0,1)$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper pins down sharp impossibility results for uniform oracle approximation in the unknown-p last-success problem, plus matching positive bounds in non-critical regimes.

read the letter

This paper's main contribution is a set of impossibility results and matching quantitative bounds for the last-success problem when the success probability p is unknown. Specifically, it shows that even randomized oracle-free rules cannot converge uniformly to the oracle value over all p in (0,1), while providing exact expressions and good performance guarantees away from the critical regime around p = 1/n. The work does several things well. It derives an exact recursion for the win probability W_n(p) of the natural plug-in rule that uses the running empirical frequency. This recursion allows direct comparison to the known-p oracle value V_n(p) from Bruss' theorem and reveals the finite-horizon performance gap. The dominance-order argument that no p-blind rule is best for fixed n is a nice decision-theoretic observation. The 1/sqrt(n) minimax lower bound for small p0, the exponential rate for fixed p, and the asymptotic optimality when np_n tends to infinity are all sharp and well-supported by the analysis. The uniform convergence guarantee on [p0,1) for any p0 >0 is also useful. The arguments appear sound, relying on recursive state probabilities and minimax techniques without circularity or extra assumptions beyond the standard i.i.d. Bernoulli setup. The stress test found no internal gaps in the central claims. One soft spot is that the paper remains entirely theoretical. There are no simulations to illustrate how large the gaps are for practical n, or how the plug-in compares to other simple adaptive rules. This is not a major issue for the theoretical claims but might limit immediate applicability. The paper is for specialists in optimal stopping, sequential analysis, and related areas of probability. A reader working on adaptive stopping rules or prophet-type problems would get concrete value from the regime-specific results and the impossibility barriers. It has enough novelty and technical substance to deserve serious peer review. I recommend sending it out for refereeing.

Referee Report

2 major / 3 minor

Summary. The paper studies the last-success stopping problem for i.i.d. Bernoulli(p) trials with unknown p. It defines the oracle value V_n(p) from Bruss' sum-the-odds rule, introduces the plug-in rule that substitutes the online empirical frequency for p, derives an exact recursion for the plug-in win probability W_n(p), proves that the dominance partial order on p-blind (possibly randomized) rules has no greatest element for each fixed n≥2, establishes finite-horizon bounds |V_n(p)−W_n(p)|=O(1/√n) on [p_0,1) together with a matching minimax lower bound of order 1/√n when p_0<1/2, shows exponential convergence for fixed p, proves asymptotic oracle optimality when np_n→∞, and demonstrates that no sequence of oracle-free rules (even randomized) can converge uniformly to V_n(p) over all p∈(0,1), with an explicit obstruction in the critical regime p≍1/n.

Significance. If the derivations hold, the work supplies a precise decision-theoretic account of the cost of oracle-freeness in a canonical sequential selection problem. The exact state-probability recursion, the minimax lower bound, the regime-specific rates, and the two impossibility theorems together delineate both fundamental barriers and the regimes in which ignorance of p is asymptotically negligible. These quantitative results strengthen the literature on adaptive stopping and online estimation.

major comments (2)

[§4] §4 (dominance partial order): the proof that no greatest element exists for n≥2 constructs two deterministic rules that cross-dominate on complementary intervals of p; the extension to randomized rules is asserted but the convex-combination argument that prevents a dominating mixture is only outlined and should be written out explicitly to confirm it holds for all convex combinations.
[§5.2] §5.2 (minimax lower bound): the 1/√n lower bound for p_0<1/2 is stated to be matching, yet the upper-bound constant from the plug-in analysis on [p_0,1) is not compared numerically or asymptotically with the lower-bound constant; an explicit statement of whether the constants agree up to a universal factor would clarify the sharpness of the result.

minor comments (3)

[§3] The recursion for the state probabilities in the plug-in analysis should list the base cases for t=1 and the handling of the event {ˆp_t=0} (where the threshold becomes undefined) to ensure the expression for W_n(p) is well-defined for all sample paths.
[§6] In the sparse-regime theorem, the condition np_n→∞ is used for asymptotic optimality; the proof sketch would benefit from an explicit rate statement (e.g., O(1/(np_n)) or similar) rather than only the limit statement.
[§2] The notation V_n(p) and W_n(p) is introduced without a displayed equation for the oracle threshold; adding the classical Bruss threshold formula as Eq. (1) would improve readability for readers unfamiliar with the sum-the-odds rule.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the recommendation of minor revision. We address the two major comments point by point below.

read point-by-point responses

Referee: [§4] §4 (dominance partial order): the proof that no greatest element exists for n≥2 constructs two deterministic rules that cross-dominate on complementary intervals of p; the extension to randomized rules is asserted but the convex-combination argument that prevents a dominating mixture is only outlined and should be written out explicitly to confirm it holds for all convex combinations.

Authors: We agree that the extension to randomized rules benefits from a fully explicit argument. In the revised manuscript we will expand the relevant paragraph in §4 to include a complete proof: for any mixture weight λ ∈ [0,1], the win probability of the randomized rule is the convex combination λ W^{(1)}(p) + (1-λ) W^{(2)}(p). Because the two deterministic rules cross-dominate on complementary intervals I1 and I2, on I1 the mixture is strictly less than the first rule for λ < 1, and on I2 it is strictly less than the second rule for λ > 0. Hence no mixture can dominate both deterministic rules, so the partial order still has no greatest element. This will be written out with the explicit interval calculations. revision: yes
Referee: [§5.2] §5.2 (minimax lower bound): the 1/√n lower bound for p_0<1/2 is stated to be matching, yet the upper-bound constant from the plug-in analysis on [p_0,1) is not compared numerically or asymptotically with the lower-bound constant; an explicit statement of whether the constants agree up to a universal factor would clarify the sharpness of the result.

Authors: The matching is with respect to the rate Θ(1/√n). The upper-bound constant C(p0) arises from the explicit concentration bound on the plug-in estimator and is given in closed form in the proof of Theorem 5.1; the lower-bound constant c(p0) is obtained from the minimax construction over a two-point prior supported on [p0,1/2]. In the revision we will insert a short remark in §5.2 stating that both constants are positive and finite for each fixed p0 < 1/2, that their ratio is bounded by a universal factor independent of n (specifically ≤ 4 for the leading terms), and therefore the result is sharp up to a p0-dependent multiplicative constant. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper starts from the standard i.i.d. Bernoulli(p) model with unknown p and independently defines the oracle value V_n(p) via Bruss' sum-the-odds theorem and the plug-in rule W_n(p) via online empirical estimates. It then derives an exact recursion for the state probabilities of W_n(p), obtains explicit comparisons and bounds, and proves impossibility results (no greatest element in the dominance order, uniform convergence barriers) using these definitions and standard probabilistic arguments. No load-bearing step reduces by construction to a fitted parameter, self-citation chain, or ansatz smuggled from prior work by the same authors; the derivation chain remains self-contained against the external model.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the standard i.i.d. Bernoulli assumption for the trials and the definition of oracle rules based on known p, with no free parameters fitted to data or new entities postulated.

axioms (2)

domain assumption X_1,...,X_n are i.i.d. Bernoulli(p) for fixed unknown p in (0,1)
This is the homogeneous setting stated at the outset of the problem.
domain assumption The decision maker has no prior information on p and must rely on online empirical estimates
This defines the p-blind or oracle-free formulation central to all results.

pith-pipeline@v0.9.0 · 5701 in / 1454 out tokens · 79824 ms · 2026-05-10T18:01:46.002109+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

no oracle-free sequence of rules can converge uniformly to the oracle value over p∈(0,1) (Theorem 4.3)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

[1]

Kakinuma, and N

Ano, K., Y. Kakinuma, and N. Miyoshi (2010). Odds theorem with multiple selection chances.J. Appl. Probab. 47(4), 1093–1104

work page 2010
[2]

Bruss, F. T. (2000). Sum the odds to one and stop.Ann. Probab. 28, 1384–1391

work page 2000
[3]

Bruss, F. T. (2003). A note on bounds for the odds theorem of optimal stopping. Ann. Probab. 31(4), 1859–1861

work page 2003
[4]

Bruss, F. T. and G. Louchard (2009). The odds algorithm based on sequential updating and its performance.Adv. Appl. Probab. 41(1), 131–153

work page 2009
[5]

Bruss, F. T. and D. Paindaveine (2000). Selecting a sequence of last successes in independent trials.J. Appl. Probab. 37(2), 389–399

work page 2000
[6]

Dendievel, R. (2012). New developments of the odds theorem. arXiv preprint 1212.1391

work page arXiv 2012
[7]

Ferguson, T. S. (2011). The sum-the-odds theorem with application to a stop- ping game of Sakaguchi.Math. Applicanda 39(2), 319–331

work page 2011
[8]

Freedman, D. A. (1975). On tail probabilities for martingales.Ann. Probab. 3(1), 100–118

work page 1975
[9]

Gnedin, A. and Z. Derbazi (2021). Trapping the ultimate success. arXiv preprint

work page 2021
[10]

Gnedin, A. and Z. Derbazi (2025). The last-success stopping problem with random observation times.Math. Meth. Oper. Res. 101, 1–27. Grau Ribas, J. M. (2020). A note on last-success-problem.Theor. Probability and Math. Statist. 103, 155–165

work page 2025
[11]

Hill, T. P. and U. Krengel (1991). Minimax-optimal stop rules and distributions in secretary problems.Ann. Probab. 19(1), 342–353

work page 1991
[12]

Howard, S. R., A. Ramdas, J. McAuliffe, and J. S. Sekhon (2021). Time-uniform chernoff bounds via nonnegative supermartingales.Probab. Surv. 18, 27–94. 42

work page 2021
[13]

and J.-R

Hsiau, S.-R. and J.-R. Yang (2000). A natural variation of the standard secretary problem.Statist. Sinica 10, 639–646

work page 2000
[14]

Hsiau, S. R. and J.-R. Yang (2002). Selecting the last success in Markov- dependent trials.J. Appl. Probab. 39(2), 271–281

work page 2002
[15]

Matsui, M. and K. Ano (2017). Compare ratios of symmetric functions and their applications to Bruss’ odds problem.J. Appl. Probab. 54(1), 12–22

work page 2017
[16]

Nuti, P. and J. Vondr´ ak (2023). Secretary problems: the power of a single sam- ple. InProc. 2023 Annual ACM-SIAM Sympos. Discrete Algorithms (SODA), Philadelphia, pp. 2015–2029. SIAM

work page 2023
[17]

Peskir, G. and A. Shiryaev (2006).Optimal Stopping and Free-Boundary Prob- lems. Basel: Birkh¨ auser

work page 2006
[18]

Tamaki, M. (2010). Sum the multiplicative odds to one and stop.J. Appl. Probab. 47(3), 761–777

work page 2010
[19]

Tropp, J. A. (2011). Freedman’s inequality for matrix martingales.Electron. Commun. Probab. 16, 262–270

work page 2011
[20]

Yoshinaga, T. and Y. Kawase (2024). The last success problem with samples. In32nd Annual European Sympos. Algorithms (ESA 2024), Volume 308 of Leibniz Int. Proc. Inform. (LIPIcs), Dagstuhl, Germany, pp. 105:1–105:15. Schloss Dagstuhl – Leibniz-Zentrum f¨ ur Informatik. 43

work page 2024

[1] [1]

Kakinuma, and N

Ano, K., Y. Kakinuma, and N. Miyoshi (2010). Odds theorem with multiple selection chances.J. Appl. Probab. 47(4), 1093–1104

work page 2010

[2] [2]

Bruss, F. T. (2000). Sum the odds to one and stop.Ann. Probab. 28, 1384–1391

work page 2000

[3] [3]

Bruss, F. T. (2003). A note on bounds for the odds theorem of optimal stopping. Ann. Probab. 31(4), 1859–1861

work page 2003

[4] [4]

Bruss, F. T. and G. Louchard (2009). The odds algorithm based on sequential updating and its performance.Adv. Appl. Probab. 41(1), 131–153

work page 2009

[5] [5]

Bruss, F. T. and D. Paindaveine (2000). Selecting a sequence of last successes in independent trials.J. Appl. Probab. 37(2), 389–399

work page 2000

[6] [6]

Dendievel, R. (2012). New developments of the odds theorem. arXiv preprint 1212.1391

work page arXiv 2012

[7] [7]

Ferguson, T. S. (2011). The sum-the-odds theorem with application to a stop- ping game of Sakaguchi.Math. Applicanda 39(2), 319–331

work page 2011

[8] [8]

Freedman, D. A. (1975). On tail probabilities for martingales.Ann. Probab. 3(1), 100–118

work page 1975

[9] [9]

Gnedin, A. and Z. Derbazi (2021). Trapping the ultimate success. arXiv preprint

work page 2021

[10] [10]

Gnedin, A. and Z. Derbazi (2025). The last-success stopping problem with random observation times.Math. Meth. Oper. Res. 101, 1–27. Grau Ribas, J. M. (2020). A note on last-success-problem.Theor. Probability and Math. Statist. 103, 155–165

work page 2025

[11] [11]

Hill, T. P. and U. Krengel (1991). Minimax-optimal stop rules and distributions in secretary problems.Ann. Probab. 19(1), 342–353

work page 1991

[12] [12]

Howard, S. R., A. Ramdas, J. McAuliffe, and J. S. Sekhon (2021). Time-uniform chernoff bounds via nonnegative supermartingales.Probab. Surv. 18, 27–94. 42

work page 2021

[13] [13]

and J.-R

Hsiau, S.-R. and J.-R. Yang (2000). A natural variation of the standard secretary problem.Statist. Sinica 10, 639–646

work page 2000

[14] [14]

Hsiau, S. R. and J.-R. Yang (2002). Selecting the last success in Markov- dependent trials.J. Appl. Probab. 39(2), 271–281

work page 2002

[15] [15]

Matsui, M. and K. Ano (2017). Compare ratios of symmetric functions and their applications to Bruss’ odds problem.J. Appl. Probab. 54(1), 12–22

work page 2017

[16] [16]

Nuti, P. and J. Vondr´ ak (2023). Secretary problems: the power of a single sam- ple. InProc. 2023 Annual ACM-SIAM Sympos. Discrete Algorithms (SODA), Philadelphia, pp. 2015–2029. SIAM

work page 2023

[17] [17]

Peskir, G. and A. Shiryaev (2006).Optimal Stopping and Free-Boundary Prob- lems. Basel: Birkh¨ auser

work page 2006

[18] [18]

Tamaki, M. (2010). Sum the multiplicative odds to one and stop.J. Appl. Probab. 47(3), 761–777

work page 2010

[19] [19]

Tropp, J. A. (2011). Freedman’s inequality for matrix martingales.Electron. Commun. Probab. 16, 262–270

work page 2011

[20] [20]

Yoshinaga, T. and Y. Kawase (2024). The last success problem with samples. In32nd Annual European Sympos. Algorithms (ESA 2024), Volume 308 of Leibniz Int. Proc. Inform. (LIPIcs), Dagstuhl, Germany, pp. 105:1–105:15. Schloss Dagstuhl – Leibniz-Zentrum f¨ ur Informatik. 43

work page 2024