pith. sign in

arxiv: 2604.07183 · v1 · submitted 2026-04-08 · 🧮 math.PR

Stopping on the last success with unknown odds: Impossibility barriers and quantitative oracle bounds

Pith reviewed 2026-05-10 18:01 UTC · model grok-4.3

classification 🧮 math.PR
keywords last success problemstopping rulesunknown probabilityoracle boundsBernoulli trialsminimax lower boundsplug-in rules
0
0 comments X

The pith

No oracle-free sequence of rules converges uniformly to the known-p value over all success probabilities

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines the last-success stopping problem when the success probability p is unknown to the decision maker. It proves that no sequence of rules relying only on the observed data can make the probability of stopping on the last success approach the best possible value that would be achievable if p were known, and this holds uniformly over all possible values of p even when randomization is allowed. A simple rule that plugs in the current estimate of p based on past trials is analyzed in detail, showing good performance when p is not too small but revealing gaps when p is around 1/n. The work also establishes that for any fixed number of trials, among rules that do not know p, there is no single rule that dominates all others.

Core claim

Even allowing randomization, no oracle-free sequence of rules can converge uniformly to the oracle value over p in (0,1). The plug-in rule, which replaces the unknown p by its running empirical estimate, admits an exact expression for its win probability derived from a recursion on state probabilities. This leads to sharp finite-horizon comparisons with the oracle, non-trivial oracle bounds on intervals bounded away from zero, and asymptotic optimality in sparse regimes where the product of n and p tends to infinity.

What carries the argument

The set of p-blind stopping rules equipped with the dominance partial order, which has no greatest element for fixed n, together with the recursion that computes the exact win probability of the empirical plug-in rule

Load-bearing premise

The observations are independent Bernoulli trials with the same unknown success probability p throughout, and all decisions must be made using only the sequence of outcomes seen so far.

What would settle it

The central impossibility would be falsified by constructing any sequence of possibly randomized rules based solely on the data such that, as the horizon n grows, the largest difference between their win probability and the oracle win probability, taken over all p in (0,1), goes to zero.

Figures

Figures reproduced from arXiv: 2604.07183 by Davy Paindaveine.

Figure 1
Figure 1. Figure 1: (Left panel:) plug-in win probability Wn(p) and oracle win proba￾bility Vn(p) as functions of p ∈ (0, 1), for n ∈ {5, 10, 30} (for each n, the upper curve is Vn(p) since Vn(p) ≥ Wn(p)). (Right panel:) the deficit Vn(p)−Wn(p) for n ∈ {5, 10, 30}. Vertical gray lines in both panels mark the non-differentiability points of V5(p), i.e., p = 1/k, with k = 2, 3, 4, 5. Proof of Proposition 2.1(iii). Let t⋆ := ⌈ n… view at source ↗
Figure 2
Figure 2. Figure 2: (Left panel:) deficit Vn(p)−Wn(p) as a function of n, evaluated at the horizons n = 10, 20, . . . , 800, for several fixed values of p between the boundary points p = 1/3 and p = 1/4 (logarithmic y-axis). (Right panel:) √ n-scaled worst-case deficit as a function of n, for p0 ∈ {0.2, 0.3, 0.4}. faster for any fixed p, the worst-case performance on [p0, 1) is governed by a gen￾uinely hardest region that enf… view at source ↗
Figure 3
Figure 3. Figure 3: n=30 n=100 n=250 0.02 0.04 0.06 0.08 0.10 p 0.02 0.04 0.06 0.08 0.10 0.12 Vn(p) - Wn(p) [PITH_FULL_IMAGE:figures/full_fig_p033_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: (Left panel:) Wn(p) as a function of p ∈ (0, 1) for n = 59 and n = 60 (the curves are visually indistinguishable at this scale); the vertical lines indicate φ and ψ, the algebraic endpoints of the interval I on which W60(p) is monotone decreasing. (Middle panel:) zoom of W60(p) on a region containing I. (Right panel:) the same zoom for W59(p). Proof of Proposition 2.1(i)–(ii). Note that the oracle value in… view at source ↗
read the original abstract

We consider the classical last-success problem for sequential Bernoulli trials in the homogeneous setting where $X_1,\ldots,X_n$ are i.i.d. $\mathrm{Bernoulli}(p)$ but the success probability $p\in(0,1)$ is unknown to the decision maker. When $p$ is known, Bruss' sum-the-odds theorem yields an optimal threshold rule with value $V_n(p)$. We study a natural oracle-free plug-in rule that replaces $p$ by the online empirical estimate $\hat p_t$ and we denote its win probability by $W_n(p)$. First, we derive an exact expression for $W_n(p)$ via a recursion for the state probabilities, enabling explicit comparisons with $V_n(p)$ and revealing a finite-horizon separation between plug-in and oracle performance. Next, we formalize a first decision-theoretic obstruction inherent to the unknown-$p$ formulation: for every fixed $n\ge2$, the dominance partial order on $p$-blind (possibly randomized) rules has no greatest element. We then identify regimes where oracle-freeness is achievable with sharp bounds. For any $p_0\in(0,1)$, we establish finite-horizon oracle bounds on $[p_0,1)$ and we prove a matching minimax lower bound of order $1/\sqrt{n}$ for $p_0\in(0,\tfrac 12)$ (larger values of $p_0$ do not allow for non-trivial lower bounds). We also show that the rate is exponential for any fixed $p$. In sparse regimes where $p=p_n\to0$ with $np_n\to\infty$, we prove asymptotic oracle-optimality of the plug-in rule, in the sense that $V_n(p_n)-W_n(p_n)\to0$. Together with our non-sparse bounds, this yields a broad uniform convergence guarantee, which we show cannot be extended to the critical regime $p\asymp 1/n$. Finally, we establish another impossibility barrier: even allowing randomization, no oracle-free sequence of rules can converge uniformly to the oracle value over $p\in(0,1)$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper studies the last-success stopping problem for i.i.d. Bernoulli(p) trials with unknown p. It defines the oracle value V_n(p) from Bruss' sum-the-odds rule, introduces the plug-in rule that substitutes the online empirical frequency for p, derives an exact recursion for the plug-in win probability W_n(p), proves that the dominance partial order on p-blind (possibly randomized) rules has no greatest element for each fixed n≥2, establishes finite-horizon bounds |V_n(p)−W_n(p)|=O(1/√n) on [p_0,1) together with a matching minimax lower bound of order 1/√n when p_0<1/2, shows exponential convergence for fixed p, proves asymptotic oracle optimality when np_n→∞, and demonstrates that no sequence of oracle-free rules (even randomized) can converge uniformly to V_n(p) over all p∈(0,1), with an explicit obstruction in the critical regime p≍1/n.

Significance. If the derivations hold, the work supplies a precise decision-theoretic account of the cost of oracle-freeness in a canonical sequential selection problem. The exact state-probability recursion, the minimax lower bound, the regime-specific rates, and the two impossibility theorems together delineate both fundamental barriers and the regimes in which ignorance of p is asymptotically negligible. These quantitative results strengthen the literature on adaptive stopping and online estimation.

major comments (2)
  1. [§4] §4 (dominance partial order): the proof that no greatest element exists for n≥2 constructs two deterministic rules that cross-dominate on complementary intervals of p; the extension to randomized rules is asserted but the convex-combination argument that prevents a dominating mixture is only outlined and should be written out explicitly to confirm it holds for all convex combinations.
  2. [§5.2] §5.2 (minimax lower bound): the 1/√n lower bound for p_0<1/2 is stated to be matching, yet the upper-bound constant from the plug-in analysis on [p_0,1) is not compared numerically or asymptotically with the lower-bound constant; an explicit statement of whether the constants agree up to a universal factor would clarify the sharpness of the result.
minor comments (3)
  1. [§3] The recursion for the state probabilities in the plug-in analysis should list the base cases for t=1 and the handling of the event {ˆp_t=0} (where the threshold becomes undefined) to ensure the expression for W_n(p) is well-defined for all sample paths.
  2. [§6] In the sparse-regime theorem, the condition np_n→∞ is used for asymptotic optimality; the proof sketch would benefit from an explicit rate statement (e.g., O(1/(np_n)) or similar) rather than only the limit statement.
  3. [§2] The notation V_n(p) and W_n(p) is introduced without a displayed equation for the oracle threshold; adding the classical Bruss threshold formula as Eq. (1) would improve readability for readers unfamiliar with the sum-the-odds rule.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the recommendation of minor revision. We address the two major comments point by point below.

read point-by-point responses
  1. Referee: [§4] §4 (dominance partial order): the proof that no greatest element exists for n≥2 constructs two deterministic rules that cross-dominate on complementary intervals of p; the extension to randomized rules is asserted but the convex-combination argument that prevents a dominating mixture is only outlined and should be written out explicitly to confirm it holds for all convex combinations.

    Authors: We agree that the extension to randomized rules benefits from a fully explicit argument. In the revised manuscript we will expand the relevant paragraph in §4 to include a complete proof: for any mixture weight λ ∈ [0,1], the win probability of the randomized rule is the convex combination λ W^{(1)}(p) + (1-λ) W^{(2)}(p). Because the two deterministic rules cross-dominate on complementary intervals I1 and I2, on I1 the mixture is strictly less than the first rule for λ < 1, and on I2 it is strictly less than the second rule for λ > 0. Hence no mixture can dominate both deterministic rules, so the partial order still has no greatest element. This will be written out with the explicit interval calculations. revision: yes

  2. Referee: [§5.2] §5.2 (minimax lower bound): the 1/√n lower bound for p_0<1/2 is stated to be matching, yet the upper-bound constant from the plug-in analysis on [p_0,1) is not compared numerically or asymptotically with the lower-bound constant; an explicit statement of whether the constants agree up to a universal factor would clarify the sharpness of the result.

    Authors: The matching is with respect to the rate Θ(1/√n). The upper-bound constant C(p0) arises from the explicit concentration bound on the plug-in estimator and is given in closed form in the proof of Theorem 5.1; the lower-bound constant c(p0) is obtained from the minimax construction over a two-point prior supported on [p0,1/2]. In the revision we will insert a short remark in §5.2 stating that both constants are positive and finite for each fixed p0 < 1/2, that their ratio is bounded by a universal factor independent of n (specifically ≤ 4 for the leading terms), and therefore the result is sharp up to a p0-dependent multiplicative constant. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper starts from the standard i.i.d. Bernoulli(p) model with unknown p and independently defines the oracle value V_n(p) via Bruss' sum-the-odds theorem and the plug-in rule W_n(p) via online empirical estimates. It then derives an exact recursion for the state probabilities of W_n(p), obtains explicit comparisons and bounds, and proves impossibility results (no greatest element in the dominance order, uniform convergence barriers) using these definitions and standard probabilistic arguments. No load-bearing step reduces by construction to a fitted parameter, self-citation chain, or ansatz smuggled from prior work by the same authors; the derivation chain remains self-contained against the external model.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the standard i.i.d. Bernoulli assumption for the trials and the definition of oracle rules based on known p, with no free parameters fitted to data or new entities postulated.

axioms (2)
  • domain assumption X_1,...,X_n are i.i.d. Bernoulli(p) for fixed unknown p in (0,1)
    This is the homogeneous setting stated at the outset of the problem.
  • domain assumption The decision maker has no prior information on p and must rely on online empirical estimates
    This defines the p-blind or oracle-free formulation central to all results.

pith-pipeline@v0.9.0 · 5701 in / 1454 out tokens · 79824 ms · 2026-05-10T18:01:46.002109+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

  1. [1]

    Kakinuma, and N

    Ano, K., Y. Kakinuma, and N. Miyoshi (2010). Odds theorem with multiple selection chances.J. Appl. Probab. 47(4), 1093–1104

  2. [2]

    Bruss, F. T. (2000). Sum the odds to one and stop.Ann. Probab. 28, 1384–1391

  3. [3]

    Bruss, F. T. (2003). A note on bounds for the odds theorem of optimal stopping. Ann. Probab. 31(4), 1859–1861

  4. [4]

    Bruss, F. T. and G. Louchard (2009). The odds algorithm based on sequential updating and its performance.Adv. Appl. Probab. 41(1), 131–153

  5. [5]

    Bruss, F. T. and D. Paindaveine (2000). Selecting a sequence of last successes in independent trials.J. Appl. Probab. 37(2), 389–399

  6. [6]

    Dendievel, R. (2012). New developments of the odds theorem. arXiv preprint 1212.1391

  7. [7]

    Ferguson, T. S. (2011). The sum-the-odds theorem with application to a stop- ping game of Sakaguchi.Math. Applicanda 39(2), 319–331

  8. [8]

    Freedman, D. A. (1975). On tail probabilities for martingales.Ann. Probab. 3(1), 100–118

  9. [9]

    Gnedin, A. and Z. Derbazi (2021). Trapping the ultimate success. arXiv preprint

  10. [10]

    Gnedin, A. and Z. Derbazi (2025). The last-success stopping problem with random observation times.Math. Meth. Oper. Res. 101, 1–27. Grau Ribas, J. M. (2020). A note on last-success-problem.Theor. Probability and Math. Statist. 103, 155–165

  11. [11]

    Hill, T. P. and U. Krengel (1991). Minimax-optimal stop rules and distributions in secretary problems.Ann. Probab. 19(1), 342–353

  12. [12]

    Howard, S. R., A. Ramdas, J. McAuliffe, and J. S. Sekhon (2021). Time-uniform chernoff bounds via nonnegative supermartingales.Probab. Surv. 18, 27–94. 42

  13. [13]

    and J.-R

    Hsiau, S.-R. and J.-R. Yang (2000). A natural variation of the standard secretary problem.Statist. Sinica 10, 639–646

  14. [14]

    Hsiau, S. R. and J.-R. Yang (2002). Selecting the last success in Markov- dependent trials.J. Appl. Probab. 39(2), 271–281

  15. [15]

    Matsui, M. and K. Ano (2017). Compare ratios of symmetric functions and their applications to Bruss’ odds problem.J. Appl. Probab. 54(1), 12–22

  16. [16]

    Nuti, P. and J. Vondr´ ak (2023). Secretary problems: the power of a single sam- ple. InProc. 2023 Annual ACM-SIAM Sympos. Discrete Algorithms (SODA), Philadelphia, pp. 2015–2029. SIAM

  17. [17]

    Peskir, G. and A. Shiryaev (2006).Optimal Stopping and Free-Boundary Prob- lems. Basel: Birkh¨ auser

  18. [18]

    Tamaki, M. (2010). Sum the multiplicative odds to one and stop.J. Appl. Probab. 47(3), 761–777

  19. [19]

    Tropp, J. A. (2011). Freedman’s inequality for matrix martingales.Electron. Commun. Probab. 16, 262–270

  20. [20]

    Yoshinaga, T. and Y. Kawase (2024). The last success problem with samples. In32nd Annual European Sympos. Algorithms (ESA 2024), Volume 308 of Leibniz Int. Proc. Inform. (LIPIcs), Dagstuhl, Germany, pp. 105:1–105:15. Schloss Dagstuhl – Leibniz-Zentrum f¨ ur Informatik. 43