Mismatched Guesswork
Pith reviewed 2026-05-25 12:16 UTC · model grok-4.3
The pith
The cost of mismatch for one-to-one codes is at most the divergence between true and mismatched distributions, and vanishes exactly when the mismatch lies on the tilted family of the true source.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In one-to-one source coding the extra average codeword length incurred by using a mismatched distribution ν in place of the true distribution μ is at most D(μ || ν). This extra length is exactly zero if and only if ν belongs to the tilted family generated by μ. The same mismatch cost for prefix-free codes does not vanish under this weaker condition.
What carries the argument
The tilted (exponential) families of the true distribution μ and the mismatched distribution ν, used to characterize both the value of guesswork and the probability that a given symbol is guessed.
If this is right
- Mismatched guesswork obeys a large-deviation principle whose rate function is expressed through information-theoretic quantities.
- One-to-one codes remain at least as robust to mismatch as prefix-free codes and are strictly more robust on the tilted-family condition.
- The probability of successful guessing under mismatch is governed by an exponential family that passes through the true distribution μ.
- The value of guesswork itself is governed by the tilted family of the mismatched distribution ν.
Where Pith is reading between the lines
- One-to-one codes may therefore be preferable whenever source statistics are known only approximately.
- The large-deviation rate for mismatched guesswork could be used to obtain precise exponents for guessing-error probabilities in finite-block settings.
- The same tilted-family condition might govern robustness in other non-prefix coding schemes such as arithmetic coding without termination constraints.
Load-bearing premise
The tilted and exponential families generated by the true and mismatched distributions exist and can be used to express the guesswork value and the guessing probability.
What would settle it
Fix a concrete pair of distributions μ and ν where ν does not lie on the tilted family of μ; compute the difference in average codeword length between the optimal one-to-one code for μ and the one-to-one code constructed from ν; check whether that difference is strictly less than D(μ || ν) or exactly zero.
Figures
read the original abstract
We study the problem of mismatched guesswork, where we evaluate the number of symbols $y \in \mathcal{Y}$ which have higher likelihood than $X \sim \mu$ according to a mismatched distribution $\nu$. We discuss the role of the tilted/exponential families of the source distribution $\mu$ and of the mismatched distribution $\nu$. We show that the value of guesswork can be characterized using the tilted family of the mismatched distribution $\nu$, while the probability of guessing is characterized by an exponential family which passes through $\mu$. Using this characterization, we demonstrate that the mismatched guesswork follows a large deviation principle (LDP), where the rate function is described implicitly using information theoretic quantities. We apply these results to one-to-one source coding (without prefix free constraint) to obtain the cost of mismatch in terms of average codeword length. We show that the cost of mismatch in one-to-one codes is no larger than that of the prefix-free codes, i.e., $D(\mu\| \nu)$. Further, the cost of mismatch vanishes if and only if $\nu$ lies on the tilted family of the true distribution $\mu$, which is in stark contrast to the prefix-free codes. These results imply that one-to-one codes are inherently more robust to mismatch.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies mismatched guesswork, defined as the number of symbols y with higher likelihood than a sample X~μ under a mismatched distribution ν. It characterizes the value of guesswork via the tilted family of ν and the probability of guessing via an exponential family passing through μ. The mismatched guesswork is shown to satisfy a large-deviation principle whose rate function is expressed in information-theoretic terms. These results are applied to one-to-one (non-prefix-free) source coding, yielding that the mismatch cost in average codeword length is at most D(μ‖ν) and equals zero if and only if ν belongs to the tilted family of μ—contrasting with the prefix-free case and implying greater robustness of one-to-one codes.
Significance. If the characterizations and the LDP hold, the work supplies a precise asymptotic comparison between one-to-one and prefix-free coding under mismatch, identifying an explicit condition (membership in the power-tilted family) under which mismatch cost vanishes. The contrast with the classical D(μ‖ν) bound for prefix-free codes is a concrete, falsifiable distinction that could inform the choice of coding constraints when source statistics are imperfectly known.
minor comments (3)
- [Abstract] Abstract: the phrase 'the cost of mismatch in one-to-one codes is no larger than that of the prefix-free codes, i.e., D(μ‖ν)' should be rephrased for precision; the subsequent sentence already states the stricter claim that the cost is ≤ D(μ‖ν) and vanishes under the tilted-family condition.
- [Introduction / §2] The manuscript should state the alphabet cardinality and moment conditions (finite support, positivity of μ and ν) under which the normalizing constants of the tilted and exponential families exist; while standard for finite alphabets, an explicit sentence would remove any ambiguity about the domain of the LDP.
- [§3] Notation for the tilted family ν_θ ∝ ν^θ (or the equivalent power tilt of μ) should be introduced once, with a short display equation, before its repeated use in the LDP and coding sections.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and recommendation of minor revision. The summary accurately captures the core contributions on characterizing mismatched guesswork via tilted/exponential families, the associated LDP, and the application to one-to-one source coding.
Circularity Check
No significant circularity
full rationale
The derivation relies on standard characterizations of guesswork via tilted/exponential families and large-deviation principles drawn from prior literature (not self-citations). The mismatch cost bound D(μ‖ν) for one-to-one codes and the vanishing condition are obtained by applying these established tools to the rank distribution induced by ν; no step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation chain. The paper is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Existence and usability of tilted/exponential families for arbitrary distributions μ and ν to characterize guesswork and guessing probability
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanembed_injective / generatorOfLawsOfLogic echoes?
echoesECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
the cost of mismatch vanishes if and only if ν lies on the tilted family of the true distribution μ
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel / dAlembert_to_ODE_general echoes?
echoesECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Gν(x) = Gμ(x) when μ ∈ T+ν (Lemma 6); L(ν∥μ) = H(ΠTν(μ)) (Theorem 3)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Theoretical Limits of Language Model Alignment
The maximum reward gain under KL-regularized LM alignment is a Jeffreys divergence term, estimable as covariance from base samples, with best-of-N approaching the theoretical limit.
Reference graph
Works this paper leans on
-
[1]
J. L. Massey, “Guessing and entropy,” in IEEE ISIT, 1994, p. 204
work page 1994
-
[2]
An inequality on guessing and its application to sequential decoding,
E. Arikan, “An inequality on guessing and its application to sequential decoding,” IEEE Trans. on Inf. Theory , vol. 42, no. 1, pp. 99–105, 1996
work page 1996
-
[3]
Multi-user guesswork and brute force security,
M. M. Christiansen, K. R. Duffy, F. du Pin Calmon, and M. M ´edard, “Multi-user guesswork and brute force security,” IEEE Trans. Inf. Theory , vol. 61, no. 12, pp. 6876–6886, 2015
work page 2015
-
[4]
Quantifying computational security subject to source constraints, guesswork and inscrutability,
A. Beirami, R. Calderbank, K. Duffy, and M. M ´edard, “Quantifying computational security subject to source constraints, guesswork and inscrutability,” in IEEE ISIT, 2015
work page 2015
-
[5]
List decoding-random coding exponents and expurgated exponents,
N. Merhav, “List decoding-random coding exponents and expurgated exponents,” IEEE Trans. on Inf. Theory, vol. 60, no. 11, pp. 6749–6759, Nov. 2014
work page 2014
-
[6]
Erasure/list random coding error exponents are not universally achievable,
W. Huleihel, N. Weinberger, and N. Merhav, “Erasure/list random coding error exponents are not universally achievable,” IEEE Trans. on Inf. Theory , 2014
work page 2014
-
[7]
Asymptotics and non-asymptotics for universal fixed-to-variable source coding,
O. Kosut and L. Sankar, “Asymptotics and non-asymptotics for universal fixed-to-variable source coding,” IIEEE Trans. Inf. Theory , 2017
work page 2017
-
[8]
Guessing subject to distortion,
E. Arikan and N. Merhav, “Guessing subject to distortion,” IEEE Trans. on Inf. Theory , vol. 44, no. 3, pp. 1041–1056, May 1998
work page 1998
-
[9]
Guessing under source uncertainty,
R. Sundaresan, “Guessing under source uncertainty,” IEEE Trans. on Inf. Theory , vol. 53, no. 1, pp. 525–526, Jan. 2007
work page 2007
-
[10]
Guessing revisited: A large deviations approach,
M. K. Hanawal and R. Sundaresan, “Guessing revisited: A large deviations approach,” IEEE Trans. on Inf. Theory , vol. 57, no. 1, pp. 70–78, Jan. 2011
work page 2011
-
[11]
Universal Randomized Guessing with Application to Asynchronous Decentralized Brute-Force Attacks
N. Merhav and A. Cohen, “Universal randomized guessing with application to asynchronous decentralized brute-force attacks,” arXiv preprint arXiv:1811.04363, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[12]
Guesswork, large deviations, and Shannon entropy,
M. M. Christiansen and K. R. Duffy, “Guesswork, large deviations, and Shannon entropy,” IEEE Trans. on Inf. Theory , vol. 59, no. 2, pp. 796–802, Feb. 2013
work page 2013
-
[13]
A characterization of guesswork on swiftly tilting curves,
A. Beirami, R. Calderbank, M. Christiansen, K. Duffy, and M. M ´edard, “A characterization of guesswork on swiftly tilting curves,” IEEE Trans. Inf. Theory, 2019
work page 2019
-
[14]
Centralized vs decentralized multi-agent guesswork,
S. Salamatian, A. Beirami, A. Cohen, and M. M ´edard, “Centralized vs decentralized multi-agent guesswork,” in IEEE ISIT, 2017
work page 2017
-
[15]
Guesswork subject to a total entropy budget,
A. Rezaee, A. Beirami, A. Makhdoumi, M. M ´edard, and K. Duffy, “Guesswork subject to a total entropy budget,” in Allerton Conf. on Comm., Control, and Computing. IEEE, 2017, pp. 1008–1015
work page 2017
-
[16]
Why botnets work: distributed brute-force attacks need no synchronization,
S. Salamatian, W. Huleihel, A. Beirami, A. Cohen, and M. M ´edard, “Why botnets work: distributed brute-force attacks need no synchronization,” IEEE Trans. on Inform. Forensics and Security , 2019
work page 2019
-
[17]
A geometric perspective on guesswork,
A. Beirami, R. Calderbank, M. Christiansen, K. Duffy, A. Makhdoumi, and M. M ´edard, “A geometric perspective on guesswork,” in Allerton Conf. on Comm., Control, and Computing , 2015
work page 2015
-
[18]
A one-to-one code and its anti-redundancy,
W. Szpankowski, “A one-to-one code and its anti-redundancy,” IEEE Trans. on Inf. Theory , vol. 54, no. 10, pp. 4762–4766, 2008
work page 2008
-
[19]
Fundamental limits of universal lossless one-to-one compression of parametric sources,
A. Beirami and F. Fekri, “Fundamental limits of universal lossless one-to-one compression of parametric sources,” in IEEE ITW, 2014
work page 2014
-
[20]
Information theory and statistics: A tutorial,
I. Csisz ´ar, P. C. Shields et al. , “Information theory and statistics: A tutorial,” Foundations and Trends R⃝ in Comm. and Inf. Theory , vol. 1, no. 4, pp. 417–528, 2004
work page 2004
-
[21]
A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications . Springer, 1998
work page 1998
-
[22]
S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004
work page 2004
-
[23]
Cumulant generating function of codeword lengths in optimal lossless compression
T. A. Courtade and S. Verd ´u, “Cumulant generating function of codeword lengths in optimal lossless compression.” in ISIT, 2014
work page 2014
-
[24]
I-projection and the geometry of error exponents,
S. Borade and L. Zheng, “I-projection and the geometry of error exponents,” in Allerton Conf. on Comm., Control, and Computing , 2006
work page 2006
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.