Mismatched Guesswork

Ahmad Beirami; Litian Liu; Muriel M\'edard; Salman Salamatian

arxiv: 1907.00531 · v1 · pith:YJW3LT37new · submitted 2019-07-01 · 💻 cs.IT · math.IT

Mismatched Guesswork

Salman Salamatian , Litian Liu , Ahmad Beirami , Muriel M\'edard This is my paper

Pith reviewed 2026-05-25 12:16 UTC · model grok-4.3

classification 💻 cs.IT math.IT

keywords guessworkmismatched distributionsone-to-one codessource codinglarge deviationstilted familiesexponential familiesKullback-Leibler divergence

0 comments

The pith

The cost of mismatch for one-to-one codes is at most the divergence between true and mismatched distributions, and vanishes exactly when the mismatch lies on the tilted family of the true source.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the number of symbols that appear more likely than a true sample when ranked by a mismatched distribution. It derives characterizations of this guesswork quantity and the associated probability using the tilted and exponential families of the two distributions. These characterizations are then used to analyze one-to-one source coding without a prefix-free requirement. The resulting bound on the extra average codeword length caused by mismatch is no worse than the bound that holds for prefix-free codes. Moreover the extra length disappears under a weaker condition on the mismatch than the one required for prefix-free codes.

Core claim

In one-to-one source coding the extra average codeword length incurred by using a mismatched distribution ν in place of the true distribution μ is at most D(μ || ν). This extra length is exactly zero if and only if ν belongs to the tilted family generated by μ. The same mismatch cost for prefix-free codes does not vanish under this weaker condition.

What carries the argument

The tilted (exponential) families of the true distribution μ and the mismatched distribution ν, used to characterize both the value of guesswork and the probability that a given symbol is guessed.

If this is right

Mismatched guesswork obeys a large-deviation principle whose rate function is expressed through information-theoretic quantities.
One-to-one codes remain at least as robust to mismatch as prefix-free codes and are strictly more robust on the tilted-family condition.
The probability of successful guessing under mismatch is governed by an exponential family that passes through the true distribution μ.
The value of guesswork itself is governed by the tilted family of the mismatched distribution ν.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

One-to-one codes may therefore be preferable whenever source statistics are known only approximately.
The large-deviation rate for mismatched guesswork could be used to obtain precise exponents for guessing-error probabilities in finite-block settings.
The same tilted-family condition might govern robustness in other non-prefix coding schemes such as arithmetic coding without termination constraints.

Load-bearing premise

The tilted and exponential families generated by the true and mismatched distributions exist and can be used to express the guesswork value and the guessing probability.

What would settle it

Fix a concrete pair of distributions μ and ν where ν does not lie on the tilted family of μ; compute the difference in average codeword length between the optimal one-to-one code for μ and the one-to-one code constructed from ν; check whether that difference is strictly less than D(μ || ν) or exactly zero.

Figures

Figures reproduced from arXiv: 1907.00531 by Ahmad Beirami, Litian Liu, Muriel M\'edard, Salman Salamatian.

**Figure 1.** Figure 1: Representation of the 3-dimensional simplex, each point in the triangle represents a distribution over [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: Rate function J(t) of { 1 n gν(Xn)}, for a distribution over three symbols µ = (0.05, 0.1, 0.85). We proceed with the proof in three separate cases. Case (a): We let t ∈ (H(ν), log |X |), which implies α(t) ∈ (0, 1) by monotonicity of H(T(ν, α)) for non-negative α. Note that (28) and (30) respecitvely imply lim ↓0 lim sup n→∞ 1 n log Pµn [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Illustration of Corollary 2. The distributions are identical as in Figure 1. Note that, as [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

read the original abstract

We study the problem of mismatched guesswork, where we evaluate the number of symbols $y \in \mathcal{Y}$ which have higher likelihood than $X \sim \mu$ according to a mismatched distribution $\nu$. We discuss the role of the tilted/exponential families of the source distribution $\mu$ and of the mismatched distribution $\nu$. We show that the value of guesswork can be characterized using the tilted family of the mismatched distribution $\nu$, while the probability of guessing is characterized by an exponential family which passes through $\mu$. Using this characterization, we demonstrate that the mismatched guesswork follows a large deviation principle (LDP), where the rate function is described implicitly using information theoretic quantities. We apply these results to one-to-one source coding (without prefix free constraint) to obtain the cost of mismatch in terms of average codeword length. We show that the cost of mismatch in one-to-one codes is no larger than that of the prefix-free codes, i.e., $D(\mu\| \nu)$. Further, the cost of mismatch vanishes if and only if $\nu$ lies on the tilted family of the true distribution $\mu$, which is in stark contrast to the prefix-free codes. These results imply that one-to-one codes are inherently more robust to mismatch.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

One-to-one codes are more robust to mismatch than prefix-free codes in guesswork, with cost vanishing exactly on the tilted family of the true distribution.

read the letter

The main takeaway is that one-to-one codes show a clear robustness advantage over prefix-free codes under distribution mismatch for guesswork. The extra cost is at most the KL divergence D(μ||ν), and it drops to zero if and only if the mismatched distribution lies on the tilted family of the true one. This is the opposite of what happens with prefix-free codes, where the cost does not vanish under the same condition. The paper reaches this by characterizing the guesswork value through the tilted family of the mismatched distribution and the probability of guessing through an exponential family passing through the true distribution, then applying large deviations to obtain the rate function for the mismatch cost in one-to-one source coding. The argument uses the fact that power tilts preserve the ordering of blocks, so the rank distribution (and thus the one-to-one length) stays the same. This looks like a genuine addition relative to earlier mismatched source coding work. The derivations rest on standard KL divergence and LDP tools without circular redefinitions, and the stress-test confirms the structure holds in the usual finite-alphabet setting with no extra moment conditions required. The only soft spot is that the abstract leaves the precise alphabet and existence conditions implicit, so a referee would probably want those stated explicitly in the theorems, but nothing appears load-bearing. This is for information theorists working on guesswork, mismatched decoding, or source coding robustness. A reader already following that literature would get a specific, usable result. It deserves a serious referee because the iff condition and the one-to-one versus prefix-free contrast are concrete enough to check and build on. I would send it out for review.

Referee Report

0 major / 3 minor

Summary. The paper studies mismatched guesswork, defined as the number of symbols y with higher likelihood than a sample X~μ under a mismatched distribution ν. It characterizes the value of guesswork via the tilted family of ν and the probability of guessing via an exponential family passing through μ. The mismatched guesswork is shown to satisfy a large-deviation principle whose rate function is expressed in information-theoretic terms. These results are applied to one-to-one (non-prefix-free) source coding, yielding that the mismatch cost in average codeword length is at most D(μ‖ν) and equals zero if and only if ν belongs to the tilted family of μ—contrasting with the prefix-free case and implying greater robustness of one-to-one codes.

Significance. If the characterizations and the LDP hold, the work supplies a precise asymptotic comparison between one-to-one and prefix-free coding under mismatch, identifying an explicit condition (membership in the power-tilted family) under which mismatch cost vanishes. The contrast with the classical D(μ‖ν) bound for prefix-free codes is a concrete, falsifiable distinction that could inform the choice of coding constraints when source statistics are imperfectly known.

minor comments (3)

[Abstract] Abstract: the phrase 'the cost of mismatch in one-to-one codes is no larger than that of the prefix-free codes, i.e., D(μ‖ν)' should be rephrased for precision; the subsequent sentence already states the stricter claim that the cost is ≤ D(μ‖ν) and vanishes under the tilted-family condition.
[Introduction / §2] The manuscript should state the alphabet cardinality and moment conditions (finite support, positivity of μ and ν) under which the normalizing constants of the tilted and exponential families exist; while standard for finite alphabets, an explicit sentence would remove any ambiguity about the domain of the LDP.
[§3] Notation for the tilted family ν_θ ∝ ν^θ (or the equivalent power tilt of μ) should be introduced once, with a short display equation, before its repeated use in the LDP and coding sections.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment and recommendation of minor revision. The summary accurately captures the core contributions on characterizing mismatched guesswork via tilted/exponential families, the associated LDP, and the application to one-to-one source coding.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The derivation relies on standard characterizations of guesswork via tilted/exponential families and large-deviation principles drawn from prior literature (not self-citations). The mismatch cost bound D(μ‖ν) for one-to-one codes and the vanishing condition are obtained by applying these established tools to the rank distribution induced by ν; no step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation chain. The paper is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The abstract invokes the existence of tilted and exponential families as the mechanism for characterization; these are standard constructions in information theory but function here as the key modeling step that enables the LDP and coding bounds.

axioms (1)

domain assumption Existence and usability of tilted/exponential families for arbitrary distributions μ and ν to characterize guesswork and guessing probability
Stated in the first two sentences of the abstract as the basis for all subsequent claims.

pith-pipeline@v0.9.0 · 5758 in / 1297 out tokens · 28909 ms · 2026-05-25T12:16:08.108047+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_injective / generatorOfLawsOfLogic echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

the cost of mismatch vanishes if and only if ν lies on the tilted family of the true distribution μ
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel / dAlembert_to_ODE_general echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Gν(x) = Gμ(x) when μ ∈ T+ν (Lemma 6); L(ν∥μ) = H(ΠTν(μ)) (Theorem 3)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Theoretical Limits of Language Model Alignment
cs.LG 2026-05 unverdicted novelty 7.0

The maximum reward gain under KL-regularized LM alignment is a Jeffreys divergence term, estimable as covariance from base samples, with best-of-N approaching the theoretical limit.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

Guessing and entropy,

J. L. Massey, “Guessing and entropy,” in IEEE ISIT, 1994, p. 204

work page 1994
[2]

An inequality on guessing and its application to sequential decoding,

E. Arikan, “An inequality on guessing and its application to sequential decoding,” IEEE Trans. on Inf. Theory , vol. 42, no. 1, pp. 99–105, 1996

work page 1996
[3]

Multi-user guesswork and brute force security,

M. M. Christiansen, K. R. Duffy, F. du Pin Calmon, and M. M ´edard, “Multi-user guesswork and brute force security,” IEEE Trans. Inf. Theory , vol. 61, no. 12, pp. 6876–6886, 2015

work page 2015
[4]

Quantifying computational security subject to source constraints, guesswork and inscrutability,

A. Beirami, R. Calderbank, K. Duffy, and M. M ´edard, “Quantifying computational security subject to source constraints, guesswork and inscrutability,” in IEEE ISIT, 2015

work page 2015
[5]

List decoding-random coding exponents and expurgated exponents,

N. Merhav, “List decoding-random coding exponents and expurgated exponents,” IEEE Trans. on Inf. Theory, vol. 60, no. 11, pp. 6749–6759, Nov. 2014

work page 2014
[6]

Erasure/list random coding error exponents are not universally achievable,

W. Huleihel, N. Weinberger, and N. Merhav, “Erasure/list random coding error exponents are not universally achievable,” IEEE Trans. on Inf. Theory , 2014

work page 2014
[7]

Asymptotics and non-asymptotics for universal ﬁxed-to-variable source coding,

O. Kosut and L. Sankar, “Asymptotics and non-asymptotics for universal ﬁxed-to-variable source coding,” IIEEE Trans. Inf. Theory , 2017

work page 2017
[8]

Guessing subject to distortion,

E. Arikan and N. Merhav, “Guessing subject to distortion,” IEEE Trans. on Inf. Theory , vol. 44, no. 3, pp. 1041–1056, May 1998

work page 1998
[9]

Guessing under source uncertainty,

R. Sundaresan, “Guessing under source uncertainty,” IEEE Trans. on Inf. Theory , vol. 53, no. 1, pp. 525–526, Jan. 2007

work page 2007
[10]

Guessing revisited: A large deviations approach,

M. K. Hanawal and R. Sundaresan, “Guessing revisited: A large deviations approach,” IEEE Trans. on Inf. Theory , vol. 57, no. 1, pp. 70–78, Jan. 2011

work page 2011
[11]

Universal Randomized Guessing with Application to Asynchronous Decentralized Brute-Force Attacks

N. Merhav and A. Cohen, “Universal randomized guessing with application to asynchronous decentralized brute-force attacks,” arXiv preprint arXiv:1811.04363, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[12]

Guesswork, large deviations, and Shannon entropy,

M. M. Christiansen and K. R. Duffy, “Guesswork, large deviations, and Shannon entropy,” IEEE Trans. on Inf. Theory , vol. 59, no. 2, pp. 796–802, Feb. 2013

work page 2013
[13]

A characterization of guesswork on swiftly tilting curves,

A. Beirami, R. Calderbank, M. Christiansen, K. Duffy, and M. M ´edard, “A characterization of guesswork on swiftly tilting curves,” IEEE Trans. Inf. Theory, 2019

work page 2019
[14]

Centralized vs decentralized multi-agent guesswork,

S. Salamatian, A. Beirami, A. Cohen, and M. M ´edard, “Centralized vs decentralized multi-agent guesswork,” in IEEE ISIT, 2017

work page 2017
[15]

Guesswork subject to a total entropy budget,

A. Rezaee, A. Beirami, A. Makhdoumi, M. M ´edard, and K. Duffy, “Guesswork subject to a total entropy budget,” in Allerton Conf. on Comm., Control, and Computing. IEEE, 2017, pp. 1008–1015

work page 2017
[16]

Why botnets work: distributed brute-force attacks need no synchronization,

S. Salamatian, W. Huleihel, A. Beirami, A. Cohen, and M. M ´edard, “Why botnets work: distributed brute-force attacks need no synchronization,” IEEE Trans. on Inform. Forensics and Security , 2019

work page 2019
[17]

A geometric perspective on guesswork,

A. Beirami, R. Calderbank, M. Christiansen, K. Duffy, A. Makhdoumi, and M. M ´edard, “A geometric perspective on guesswork,” in Allerton Conf. on Comm., Control, and Computing , 2015

work page 2015
[18]

A one-to-one code and its anti-redundancy,

W. Szpankowski, “A one-to-one code and its anti-redundancy,” IEEE Trans. on Inf. Theory , vol. 54, no. 10, pp. 4762–4766, 2008

work page 2008
[19]

Fundamental limits of universal lossless one-to-one compression of parametric sources,

A. Beirami and F. Fekri, “Fundamental limits of universal lossless one-to-one compression of parametric sources,” in IEEE ITW, 2014

work page 2014
[20]

Information theory and statistics: A tutorial,

I. Csisz ´ar, P. C. Shields et al. , “Information theory and statistics: A tutorial,” Foundations and Trends R⃝ in Comm. and Inf. Theory , vol. 1, no. 4, pp. 417–528, 2004

work page 2004
[21]

Dembo and O

A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications . Springer, 1998

work page 1998
[22]

Boyd and L

S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004

work page 2004
[23]

Cumulant generating function of codeword lengths in optimal lossless compression

T. A. Courtade and S. Verd ´u, “Cumulant generating function of codeword lengths in optimal lossless compression.” in ISIT, 2014

work page 2014
[24]

I-projection and the geometry of error exponents,

S. Borade and L. Zheng, “I-projection and the geometry of error exponents,” in Allerton Conf. on Comm., Control, and Computing , 2006

work page 2006

[1] [1]

Guessing and entropy,

J. L. Massey, “Guessing and entropy,” in IEEE ISIT, 1994, p. 204

work page 1994

[2] [2]

An inequality on guessing and its application to sequential decoding,

E. Arikan, “An inequality on guessing and its application to sequential decoding,” IEEE Trans. on Inf. Theory , vol. 42, no. 1, pp. 99–105, 1996

work page 1996

[3] [3]

Multi-user guesswork and brute force security,

M. M. Christiansen, K. R. Duffy, F. du Pin Calmon, and M. M ´edard, “Multi-user guesswork and brute force security,” IEEE Trans. Inf. Theory , vol. 61, no. 12, pp. 6876–6886, 2015

work page 2015

[4] [4]

Quantifying computational security subject to source constraints, guesswork and inscrutability,

A. Beirami, R. Calderbank, K. Duffy, and M. M ´edard, “Quantifying computational security subject to source constraints, guesswork and inscrutability,” in IEEE ISIT, 2015

work page 2015

[5] [5]

List decoding-random coding exponents and expurgated exponents,

N. Merhav, “List decoding-random coding exponents and expurgated exponents,” IEEE Trans. on Inf. Theory, vol. 60, no. 11, pp. 6749–6759, Nov. 2014

work page 2014

[6] [6]

Erasure/list random coding error exponents are not universally achievable,

W. Huleihel, N. Weinberger, and N. Merhav, “Erasure/list random coding error exponents are not universally achievable,” IEEE Trans. on Inf. Theory , 2014

work page 2014

[7] [7]

Asymptotics and non-asymptotics for universal ﬁxed-to-variable source coding,

O. Kosut and L. Sankar, “Asymptotics and non-asymptotics for universal ﬁxed-to-variable source coding,” IIEEE Trans. Inf. Theory , 2017

work page 2017

[8] [8]

Guessing subject to distortion,

E. Arikan and N. Merhav, “Guessing subject to distortion,” IEEE Trans. on Inf. Theory , vol. 44, no. 3, pp. 1041–1056, May 1998

work page 1998

[9] [9]

Guessing under source uncertainty,

R. Sundaresan, “Guessing under source uncertainty,” IEEE Trans. on Inf. Theory , vol. 53, no. 1, pp. 525–526, Jan. 2007

work page 2007

[10] [10]

Guessing revisited: A large deviations approach,

M. K. Hanawal and R. Sundaresan, “Guessing revisited: A large deviations approach,” IEEE Trans. on Inf. Theory , vol. 57, no. 1, pp. 70–78, Jan. 2011

work page 2011

[11] [11]

Universal Randomized Guessing with Application to Asynchronous Decentralized Brute-Force Attacks

N. Merhav and A. Cohen, “Universal randomized guessing with application to asynchronous decentralized brute-force attacks,” arXiv preprint arXiv:1811.04363, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[12] [12]

Guesswork, large deviations, and Shannon entropy,

M. M. Christiansen and K. R. Duffy, “Guesswork, large deviations, and Shannon entropy,” IEEE Trans. on Inf. Theory , vol. 59, no. 2, pp. 796–802, Feb. 2013

work page 2013

[13] [13]

A characterization of guesswork on swiftly tilting curves,

A. Beirami, R. Calderbank, M. Christiansen, K. Duffy, and M. M ´edard, “A characterization of guesswork on swiftly tilting curves,” IEEE Trans. Inf. Theory, 2019

work page 2019

[14] [14]

Centralized vs decentralized multi-agent guesswork,

S. Salamatian, A. Beirami, A. Cohen, and M. M ´edard, “Centralized vs decentralized multi-agent guesswork,” in IEEE ISIT, 2017

work page 2017

[15] [15]

Guesswork subject to a total entropy budget,

A. Rezaee, A. Beirami, A. Makhdoumi, M. M ´edard, and K. Duffy, “Guesswork subject to a total entropy budget,” in Allerton Conf. on Comm., Control, and Computing. IEEE, 2017, pp. 1008–1015

work page 2017

[16] [16]

Why botnets work: distributed brute-force attacks need no synchronization,

S. Salamatian, W. Huleihel, A. Beirami, A. Cohen, and M. M ´edard, “Why botnets work: distributed brute-force attacks need no synchronization,” IEEE Trans. on Inform. Forensics and Security , 2019

work page 2019

[17] [17]

A geometric perspective on guesswork,

A. Beirami, R. Calderbank, M. Christiansen, K. Duffy, A. Makhdoumi, and M. M ´edard, “A geometric perspective on guesswork,” in Allerton Conf. on Comm., Control, and Computing , 2015

work page 2015

[18] [18]

A one-to-one code and its anti-redundancy,

W. Szpankowski, “A one-to-one code and its anti-redundancy,” IEEE Trans. on Inf. Theory , vol. 54, no. 10, pp. 4762–4766, 2008

work page 2008

[19] [19]

Fundamental limits of universal lossless one-to-one compression of parametric sources,

A. Beirami and F. Fekri, “Fundamental limits of universal lossless one-to-one compression of parametric sources,” in IEEE ITW, 2014

work page 2014

[20] [20]

Information theory and statistics: A tutorial,

I. Csisz ´ar, P. C. Shields et al. , “Information theory and statistics: A tutorial,” Foundations and Trends R⃝ in Comm. and Inf. Theory , vol. 1, no. 4, pp. 417–528, 2004

work page 2004

[21] [21]

Dembo and O

A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications . Springer, 1998

work page 1998

[22] [22]

Boyd and L

S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004

work page 2004

[23] [23]

Cumulant generating function of codeword lengths in optimal lossless compression

T. A. Courtade and S. Verd ´u, “Cumulant generating function of codeword lengths in optimal lossless compression.” in ISIT, 2014

work page 2014

[24] [24]

I-projection and the geometry of error exponents,

S. Borade and L. Zheng, “I-projection and the geometry of error exponents,” in Allerton Conf. on Comm., Control, and Computing , 2006

work page 2006