Optimal strategies in the all-heads coin game

Peter Pfaffelhuber

arxiv: 2604.22991 · v2 · pith:U6JXXOJDnew · submitted 2026-04-24 · 🧮 math.PR

Optimal strategies in the all-heads coin game

Peter Pfaffelhuber This is my paper

Pith reviewed 2026-05-08 09:58 UTC · model grok-4.3

classification 🧮 math.PR

keywords coin flipping gameMarkov decision processoptimal strategywinning probabilityperturbation expansionBellman equationstochastic shortest path

0 comments

The pith

When the coin is biased toward heads, always setting aside exactly one head is optimal and the winning probability increases with the number of coins.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies a sequential game in which a player must set aside at least one heads from n coins each round or lose. It compares two strategies: setting aside one head or all heads. For bias p exceeding one half, the single-head strategy is optimal, the winning probability grows strictly with n, and its limit has an explicit series form between p and one. For fair coins every strategy wins with probability one half. Near fair from below a perturbation analysis shows the winning probabilities have a local minimum at five coins to first order.

Core claim

For p greater than one half the strategy of setting aside exactly one head is optimal, the map n to winning probability is strictly increasing, and the limit W(p) admits an explicit series representation satisfying p less than or equal to W(p) less than one. For p equal to one half every strategy achieves winning probability one half. For p less than one half near one half the deficit from one half is approximately delta times c_n to first order, where c_n satisfies a linear recursion for n at least seven with limit about 1.7035, so the optimal value sequence has a strict local minimum at n equals five and no local maximum.

What carries the argument

The Bellman equation for the stochastic shortest-path Markov decision process, which features a nonlinear suffix-maximum operator, solved by comparing the two natural strategies of setting aside one or all heads.

If this is right

For p > 1/2 the winning probability is strictly increasing in the initial number of coins.
The limiting winning probability W(p) has an explicit series representation.
W(p) lies between p and 1.
To first order in the bias deficit the optimal values have a strict local minimum at n=5.
The sequence c_n obeys a linear recursion for n >=7 and converges to approximately 1.7035.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the bias is only slightly below one half, players would do well to avoid starting with exactly five coins.
The series representation allows numerical computation of the limiting probability for any given bias above one half.
Local maxima in the winning probability sequence may emerge only when the bias against heads is stronger than the perturbative regime.

Load-bearing premise

That for every number of coins and every bias the optimal policy is one of the two simple strategies of setting aside one head or all heads.

What would settle it

For a fixed n and p greater than one half, solve the full Bellman optimality equations numerically to check whether the optimal action in every state is always to set aside exactly one head.

Figures

Figures reproduced from arXiv: 2604.22991 by Peter Pfaffelhuber.

**Figure 1.** Figure 1: The limit W(p) = limn→∞ wn,p as a function of p, for p ∈ ( 1 2 , 1). Computed via the recursion of Corollary 3.2 iterated to n = 400 in 80-digit arithmetic. Selected values: W(0.55) ≈ 0.6288, W(0.6) ≈ 0.7482, W(0.7) ≈ 0.9255, W(0.9) ≈ 0.99998. The limit is monotone in p, bounded below by the identity line (since W(p) ≥ p) and strictly below 1 by Theorem 3.3. 4 Below the fair coin: perturbation in δ = 1 2 −… view at source ↗

**Figure 2.** Figure 2: The sequence n 7→ cn for 1 ≤ n ≤ 25. Computed via the recursion of Proposition 4.3 in 80-digit rational arithmetic. The sequence increases on {1, . . . , 5} (with maximum at n = 5, marked by a star), strictly decreases for n ≥ 5, and converges from above to L ≈ 1.7035. The lower bound 27 16 from Lemma 4.6 is shown as a dotted line. 4.3 Collapse, lower bound, and linear recursion The four results below — th… view at source ↗

**Figure 3.** Figure 3: plots wn,p against n for five values of p. For p = 0.49 (first-order regime) the sequence bottoms out at n = 5 and increases thereafter, in agreement with Corollary 4.11. For p = 0.42 and p = 0.45 a non-monotone structure emerges: the sequence first decreases, passes through a local minimum, then rises to a local maximum before decreasing again and converging. 1 3 5 7 9 11 13 15 17 19 n (number of coins) 0… view at source ↗

**Figure 4.** Figure 4: Zoomed-in view of wn,p for p ∈ {0.42, 0.45} and n ∈ [5, 20]. Stars mark local maxima, inverted triangles mark local minima. At p = 0.42 there is a local maximum at n = 9; at p = 0.45 there is a local maximum at n = 15 view at source ↗

read the original abstract

We study a sequential coin-flipping game: a player starts with $n$~coins, each heads with probability~$p$, and in each round flips all remaining coins and must set aside at least one head, losing if none shows. The player wins once all coins have been set aside. The optimal winning probability~$w_{n,p}$ obeys a Bellman equation with a nonlinear suffix-maximum operator. For $p=\tfrac12$ every strategy achieves $w_{n,1/2}=\tfrac12$. For $p>\tfrac12$ the strategy~\One{} (set aside a single head) is optimal, $n\mapsto w_{n,p}$ is strictly increasing, and the limit $W(p):=\lim_n w_{n,p}$ has an explicit series representation with $p\le W(p)<1$. For $p<\tfrac12$ near~$\tfrac12$ we give a first-order perturbation expansion in $\delta:=\tfrac12-p$: the deficit satisfies $\tfrac12-w_{n,\,1/2-\delta}\approx\delta\,c_n$, where $c_n$ obeys a linear recursion for $n\ge7$ with limit $L\approx1.7035$. To first order the optimal-value sequence has a strict local minimum at $n=5$ and no local maximum.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives an explicit series for the limiting win probability W(p) and a first-order perturbation recursion for c_n that shows a local minimum at n=5 when p is slightly below 1/2.

read the letter

The key points are that One is optimal for p > 1/2 with the win probability strictly increasing in n, and that near p = 1/2 from below the first-order correction has a local minimum at n = 5. The paper supplies an explicit series for the limiting W(p) and a linear recursion for the coefficients in the perturbation. This is new material. The recursion for c_n is derived from the Bellman equation under the optimal policy, and they compute its limit numerically to about 1.7035. The work also confirms the trivial case at p = 1/2 where everything equals 1/2. They do a clean job setting up the MDP and extracting these quantitative features without extra assumptions. The main soft spot is the optimality claim for One when p > 1/2. It rests on showing that the suffix-max operator in the Bellman equation always picks k=1 over other choices. The paper compares One and All directly and uses monotonicity, but a full induction or inequality proof ruling out all intermediate k for every state would be needed to be airtight. If that's fully carried out, the series follows directly; otherwise the expressions are only for those two policies. The perturbation being first-order means it doesn't speak to larger deviations from 1/2. This paper is for researchers in stochastic control or probability who want exact or perturbative results on small-state MDPs like this coin game. It won't change broad practice but gives verifiable formulas a specialist can use or extend. The math is honest and the derivations look reproducible from the recursions given. I would recommend sending it to peer review. The explicit results make it worth a referee's time to check the proofs in detail.

Referee Report

2 major / 2 minor

Summary. The manuscript studies a stochastic shortest-path MDP for a coin-flipping game starting with n coins, each heads with probability p. In each step the player flips all remaining coins and must set aside at least one heads (losing on all tails); the goal is to set aside all coins. The Bellman operator is a nonlinear suffix-maximum over the number k of heads to set aside. The paper analyzes the pure policies 'One' (k=1) and 'All' (k=H), shows that for p=1/2 every policy yields w_{n,1/2}=1/2, claims that for p>1/2 the policy One is optimal with n↦w_{n,p} strictly increasing, derives an explicit series for the limit W(p) satisfying p≤W(p)<1, and for p<1/2 near 1/2 performs a first-order perturbation in δ=1/2-p showing that the deficit is ≈δ c_n where c_n obeys a linear recursion (n≥7) with limit L≈1.7035, implying a strict local minimum at n=5 to first order (local maxima being non-perturbative).

Significance. If the optimality and perturbation claims hold, the work supplies a complete characterization of the value function for this MDP, including an explicit series representation for the limiting winning probability and a recursion that yields a concrete numerical constant L from the model itself rather than external fitting. These features, together with the demonstration that the first-order local-minimum structure at n=5 is a perturbative phenomenon, constitute the principal contributions.

major comments (2)

[the optimality analysis for p>1/2] The claim that One is optimal for p>1/2 (and the consequent identification of w_{n,p} with the One recursion) rests on direct comparison of the two pure policies but does not establish that the argmax in the nonlinear suffix-max Bellman operator is attained at k=1 rather than at some intermediate k when H>1. No induction on the state m or first-order optimality condition for the value function v_m is supplied to rule out intermediate actions. This is load-bearing for the monotonicity statement, the series for W(p), and all downstream conclusions.
[the perturbation analysis and recursion for c_n] The first-order perturbation expansion for p=1/2-δ and the resulting local-minimum conclusion at n=5 rely on the same identification of the optimal policy; if intermediate k can improve the value for some states when p<1/2, the linear recursion for c_n and the sign pattern that produces the local minimum at n=5 would require re-derivation. The manuscript should verify that the perturbation preserves the optimality of the extreme policies to the order considered.

minor comments (2)

[§2 (model and Bellman equation)] The notation for the value function under a general policy versus the optimal w_{n,p} should be introduced with a single consistent definition early in the text to avoid ambiguity when switching between policy values and the Bellman operator.
[the section presenting the recursion and its limit] The numerical evaluation of L≈1.7035 is obtained by iterating the linear recursion; a short table or plot of the first 20 terms of c_n would make the convergence rate and the claimed limit transparent.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for identifying the need to strengthen the optimality argument and its implications for the perturbation analysis. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [the optimality analysis for p>1/2] The claim that One is optimal for p>1/2 (and the consequent identification of w_{n,p} with the One recursion) rests on direct comparison of the two pure policies but does not establish that the argmax in the nonlinear suffix-max Bellman operator is attained at k=1 rather than at some intermediate k when H>1. No induction on the state m or first-order optimality condition for the value function v_m is supplied to rule out intermediate actions. This is load-bearing for the monotonicity statement, the series for W(p), and all downstream conclusions.

Authors: We agree that establishing optimality of the One policy requires more than comparing it to the All policy; an explicit argument is needed to confirm that the Bellman operator's argmax occurs at k=1 for every state when p>1/2. The manuscript demonstrates that One strictly outperforms All for p>1/2 and derives the associated recursion, but does not supply the full induction ruling out intermediate k. In the revision we will add an inductive proof on the state size m: assuming the claim holds for all smaller states, we show that for any current state with H heads the expected value under k=1 exceeds that under any k>1 when p>1/2. This induction simultaneously yields the strict monotonicity of w_{n,p} and justifies the series for the limit W(p). revision: yes
Referee: [the perturbation analysis and recursion for c_n] The first-order perturbation expansion for p=1/2-δ and the resulting local-minimum conclusion at n=5 rely on the same identification of the optimal policy; if intermediate k can improve the value for some states when p<1/2, the linear recursion for c_n and the sign pattern that produces the local minimum at n=5 would require re-derivation. The manuscript should verify that the perturbation preserves the optimality of the extreme policies to the order considered.

Authors: We concur that the first-order expansion in δ presupposes that the optimal policy remains at an extreme (k=1 or k=H) through O(δ). At δ=0 every policy yields exactly 1/2, so the linear correction terms must be checked to ensure no intermediate k produces a strictly better O(δ) improvement. In the revised manuscript we will augment the perturbation section with a verification that, to first order, the Bellman operator continues to select an extreme action; this preserves the linear recursion satisfied by c_n for n≥7 and the resulting sign pattern that produces the strict local minimum at n=5 to first order in δ. revision: yes

Circularity Check

0 steps flagged

No circularity; derivations follow directly from Bellman recursion and explicit policy comparison

full rationale

The paper sets up the problem as a stochastic shortest-path MDP whose Bellman operator is the nonlinear suffix-maximum over admissible k. It explicitly compares the value functions of the two pure policies One and All, obtains closed recursions for each, and for p > 1/2 shows that the One recursion yields a strictly increasing sequence whose limit admits an explicit series. For p < 1/2 the first-order perturbation coefficients c_n obey a linear recursion whose coefficients are the model's own transition probabilities; the numerical limit L is obtained by iterating that recursion, not by fitting. No parameter is estimated from data and then re-used as a prediction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled in. The central claims are therefore self-contained consequences of the stated Bellman equation and the two-policy comparison.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on standard probability axioms, the existence of an optimal policy for finite-state stochastic shortest-path MDPs, and the validity of first-order perturbation expansions around p=1/2; no new free parameters or invented entities are introduced.

axioms (2)

domain assumption The game is a finite-state stochastic shortest-path Markov decision process whose value function satisfies the Bellman equation with the nonlinear suffix-maximum operator.
Explicitly invoked in the abstract as the modeling framework.
standard math Standard results on existence of optimal policies and value iteration apply to this MDP.
Implicit foundation for comparing the two strategies and deriving the value function.

pith-pipeline@v0.9.0 · 9844 in / 1681 out tokens · 112781 ms · 2026-05-08T09:58:53.911475+00:00 · methodology

Optimal strategies in the all-heads coin game

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)