pith. sign in

arxiv: 1907.00531 · v1 · pith:YJW3LT37new · submitted 2019-07-01 · 💻 cs.IT · math.IT

Mismatched Guesswork

Pith reviewed 2026-05-25 12:16 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords guessworkmismatched distributionsone-to-one codessource codinglarge deviationstilted familiesexponential familiesKullback-Leibler divergence
0
0 comments X

The pith

The cost of mismatch for one-to-one codes is at most the divergence between true and mismatched distributions, and vanishes exactly when the mismatch lies on the tilted family of the true source.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the number of symbols that appear more likely than a true sample when ranked by a mismatched distribution. It derives characterizations of this guesswork quantity and the associated probability using the tilted and exponential families of the two distributions. These characterizations are then used to analyze one-to-one source coding without a prefix-free requirement. The resulting bound on the extra average codeword length caused by mismatch is no worse than the bound that holds for prefix-free codes. Moreover the extra length disappears under a weaker condition on the mismatch than the one required for prefix-free codes.

Core claim

In one-to-one source coding the extra average codeword length incurred by using a mismatched distribution ν in place of the true distribution μ is at most D(μ || ν). This extra length is exactly zero if and only if ν belongs to the tilted family generated by μ. The same mismatch cost for prefix-free codes does not vanish under this weaker condition.

What carries the argument

The tilted (exponential) families of the true distribution μ and the mismatched distribution ν, used to characterize both the value of guesswork and the probability that a given symbol is guessed.

If this is right

  • Mismatched guesswork obeys a large-deviation principle whose rate function is expressed through information-theoretic quantities.
  • One-to-one codes remain at least as robust to mismatch as prefix-free codes and are strictly more robust on the tilted-family condition.
  • The probability of successful guessing under mismatch is governed by an exponential family that passes through the true distribution μ.
  • The value of guesswork itself is governed by the tilted family of the mismatched distribution ν.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • One-to-one codes may therefore be preferable whenever source statistics are known only approximately.
  • The large-deviation rate for mismatched guesswork could be used to obtain precise exponents for guessing-error probabilities in finite-block settings.
  • The same tilted-family condition might govern robustness in other non-prefix coding schemes such as arithmetic coding without termination constraints.

Load-bearing premise

The tilted and exponential families generated by the true and mismatched distributions exist and can be used to express the guesswork value and the guessing probability.

What would settle it

Fix a concrete pair of distributions μ and ν where ν does not lie on the tilted family of μ; compute the difference in average codeword length between the optimal one-to-one code for μ and the one-to-one code constructed from ν; check whether that difference is strictly less than D(μ || ν) or exactly zero.

Figures

Figures reproduced from arXiv: 1907.00531 by Ahmad Beirami, Litian Liu, Muriel M\'edard, Salman Salamatian.

Figure 1
Figure 1. Figure 1: Representation of the 3-dimensional simplex, each point in the triangle represents a distribution over [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Rate function J(t) of { 1 n gν(Xn)}, for a distribution over three symbols µ = (0.05, 0.1, 0.85). We proceed with the proof in three separate cases. Case (a): We let t ∈ (H(ν), log |X |), which implies α(t) ∈ (0, 1) by monotonicity of H(T(ν, α)) for non-negative α. Note that (28) and (30) respecitvely imply lim ↓0 lim sup n→∞ 1 n log Pµn  [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of Corollary 2. The distributions are identical as in Figure 1. Note that, as [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
read the original abstract

We study the problem of mismatched guesswork, where we evaluate the number of symbols $y \in \mathcal{Y}$ which have higher likelihood than $X \sim \mu$ according to a mismatched distribution $\nu$. We discuss the role of the tilted/exponential families of the source distribution $\mu$ and of the mismatched distribution $\nu$. We show that the value of guesswork can be characterized using the tilted family of the mismatched distribution $\nu$, while the probability of guessing is characterized by an exponential family which passes through $\mu$. Using this characterization, we demonstrate that the mismatched guesswork follows a large deviation principle (LDP), where the rate function is described implicitly using information theoretic quantities. We apply these results to one-to-one source coding (without prefix free constraint) to obtain the cost of mismatch in terms of average codeword length. We show that the cost of mismatch in one-to-one codes is no larger than that of the prefix-free codes, i.e., $D(\mu\| \nu)$. Further, the cost of mismatch vanishes if and only if $\nu$ lies on the tilted family of the true distribution $\mu$, which is in stark contrast to the prefix-free codes. These results imply that one-to-one codes are inherently more robust to mismatch.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper studies mismatched guesswork, defined as the number of symbols y with higher likelihood than a sample X~μ under a mismatched distribution ν. It characterizes the value of guesswork via the tilted family of ν and the probability of guessing via an exponential family passing through μ. The mismatched guesswork is shown to satisfy a large-deviation principle whose rate function is expressed in information-theoretic terms. These results are applied to one-to-one (non-prefix-free) source coding, yielding that the mismatch cost in average codeword length is at most D(μ‖ν) and equals zero if and only if ν belongs to the tilted family of μ—contrasting with the prefix-free case and implying greater robustness of one-to-one codes.

Significance. If the characterizations and the LDP hold, the work supplies a precise asymptotic comparison between one-to-one and prefix-free coding under mismatch, identifying an explicit condition (membership in the power-tilted family) under which mismatch cost vanishes. The contrast with the classical D(μ‖ν) bound for prefix-free codes is a concrete, falsifiable distinction that could inform the choice of coding constraints when source statistics are imperfectly known.

minor comments (3)
  1. [Abstract] Abstract: the phrase 'the cost of mismatch in one-to-one codes is no larger than that of the prefix-free codes, i.e., D(μ‖ν)' should be rephrased for precision; the subsequent sentence already states the stricter claim that the cost is ≤ D(μ‖ν) and vanishes under the tilted-family condition.
  2. [Introduction / §2] The manuscript should state the alphabet cardinality and moment conditions (finite support, positivity of μ and ν) under which the normalizing constants of the tilted and exponential families exist; while standard for finite alphabets, an explicit sentence would remove any ambiguity about the domain of the LDP.
  3. [§3] Notation for the tilted family ν_θ ∝ ν^θ (or the equivalent power tilt of μ) should be introduced once, with a short display equation, before its repeated use in the LDP and coding sections.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment and recommendation of minor revision. The summary accurately captures the core contributions on characterizing mismatched guesswork via tilted/exponential families, the associated LDP, and the application to one-to-one source coding.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The derivation relies on standard characterizations of guesswork via tilted/exponential families and large-deviation principles drawn from prior literature (not self-citations). The mismatch cost bound D(μ‖ν) for one-to-one codes and the vanishing condition are obtained by applying these established tools to the rank distribution induced by ν; no step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation chain. The paper is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The abstract invokes the existence of tilted and exponential families as the mechanism for characterization; these are standard constructions in information theory but function here as the key modeling step that enables the LDP and coding bounds.

axioms (1)
  • domain assumption Existence and usability of tilted/exponential families for arbitrary distributions μ and ν to characterize guesswork and guessing probability
    Stated in the first two sentences of the abstract as the basis for all subsequent claims.

pith-pipeline@v0.9.0 · 5758 in / 1297 out tokens · 28909 ms · 2026-05-25T12:16:08.108047+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Theoretical Limits of Language Model Alignment

    cs.LG 2026-05 unverdicted novelty 7.0

    The maximum reward gain under KL-regularized LM alignment is a Jeffreys divergence term, estimable as covariance from base samples, with best-of-N approaching the theoretical limit.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    Guessing and entropy,

    J. L. Massey, “Guessing and entropy,” in IEEE ISIT, 1994, p. 204

  2. [2]

    An inequality on guessing and its application to sequential decoding,

    E. Arikan, “An inequality on guessing and its application to sequential decoding,” IEEE Trans. on Inf. Theory , vol. 42, no. 1, pp. 99–105, 1996

  3. [3]

    Multi-user guesswork and brute force security,

    M. M. Christiansen, K. R. Duffy, F. du Pin Calmon, and M. M ´edard, “Multi-user guesswork and brute force security,” IEEE Trans. Inf. Theory , vol. 61, no. 12, pp. 6876–6886, 2015

  4. [4]

    Quantifying computational security subject to source constraints, guesswork and inscrutability,

    A. Beirami, R. Calderbank, K. Duffy, and M. M ´edard, “Quantifying computational security subject to source constraints, guesswork and inscrutability,” in IEEE ISIT, 2015

  5. [5]

    List decoding-random coding exponents and expurgated exponents,

    N. Merhav, “List decoding-random coding exponents and expurgated exponents,” IEEE Trans. on Inf. Theory, vol. 60, no. 11, pp. 6749–6759, Nov. 2014

  6. [6]

    Erasure/list random coding error exponents are not universally achievable,

    W. Huleihel, N. Weinberger, and N. Merhav, “Erasure/list random coding error exponents are not universally achievable,” IEEE Trans. on Inf. Theory , 2014

  7. [7]

    Asymptotics and non-asymptotics for universal fixed-to-variable source coding,

    O. Kosut and L. Sankar, “Asymptotics and non-asymptotics for universal fixed-to-variable source coding,” IIEEE Trans. Inf. Theory , 2017

  8. [8]

    Guessing subject to distortion,

    E. Arikan and N. Merhav, “Guessing subject to distortion,” IEEE Trans. on Inf. Theory , vol. 44, no. 3, pp. 1041–1056, May 1998

  9. [9]

    Guessing under source uncertainty,

    R. Sundaresan, “Guessing under source uncertainty,” IEEE Trans. on Inf. Theory , vol. 53, no. 1, pp. 525–526, Jan. 2007

  10. [10]

    Guessing revisited: A large deviations approach,

    M. K. Hanawal and R. Sundaresan, “Guessing revisited: A large deviations approach,” IEEE Trans. on Inf. Theory , vol. 57, no. 1, pp. 70–78, Jan. 2011

  11. [11]

    Universal Randomized Guessing with Application to Asynchronous Decentralized Brute-Force Attacks

    N. Merhav and A. Cohen, “Universal randomized guessing with application to asynchronous decentralized brute-force attacks,” arXiv preprint arXiv:1811.04363, 2018

  12. [12]

    Guesswork, large deviations, and Shannon entropy,

    M. M. Christiansen and K. R. Duffy, “Guesswork, large deviations, and Shannon entropy,” IEEE Trans. on Inf. Theory , vol. 59, no. 2, pp. 796–802, Feb. 2013

  13. [13]

    A characterization of guesswork on swiftly tilting curves,

    A. Beirami, R. Calderbank, M. Christiansen, K. Duffy, and M. M ´edard, “A characterization of guesswork on swiftly tilting curves,” IEEE Trans. Inf. Theory, 2019

  14. [14]

    Centralized vs decentralized multi-agent guesswork,

    S. Salamatian, A. Beirami, A. Cohen, and M. M ´edard, “Centralized vs decentralized multi-agent guesswork,” in IEEE ISIT, 2017

  15. [15]

    Guesswork subject to a total entropy budget,

    A. Rezaee, A. Beirami, A. Makhdoumi, M. M ´edard, and K. Duffy, “Guesswork subject to a total entropy budget,” in Allerton Conf. on Comm., Control, and Computing. IEEE, 2017, pp. 1008–1015

  16. [16]

    Why botnets work: distributed brute-force attacks need no synchronization,

    S. Salamatian, W. Huleihel, A. Beirami, A. Cohen, and M. M ´edard, “Why botnets work: distributed brute-force attacks need no synchronization,” IEEE Trans. on Inform. Forensics and Security , 2019

  17. [17]

    A geometric perspective on guesswork,

    A. Beirami, R. Calderbank, M. Christiansen, K. Duffy, A. Makhdoumi, and M. M ´edard, “A geometric perspective on guesswork,” in Allerton Conf. on Comm., Control, and Computing , 2015

  18. [18]

    A one-to-one code and its anti-redundancy,

    W. Szpankowski, “A one-to-one code and its anti-redundancy,” IEEE Trans. on Inf. Theory , vol. 54, no. 10, pp. 4762–4766, 2008

  19. [19]

    Fundamental limits of universal lossless one-to-one compression of parametric sources,

    A. Beirami and F. Fekri, “Fundamental limits of universal lossless one-to-one compression of parametric sources,” in IEEE ITW, 2014

  20. [20]

    Information theory and statistics: A tutorial,

    I. Csisz ´ar, P. C. Shields et al. , “Information theory and statistics: A tutorial,” Foundations and Trends R⃝ in Comm. and Inf. Theory , vol. 1, no. 4, pp. 417–528, 2004

  21. [21]

    Dembo and O

    A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications . Springer, 1998

  22. [22]

    Boyd and L

    S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004

  23. [23]

    Cumulant generating function of codeword lengths in optimal lossless compression

    T. A. Courtade and S. Verd ´u, “Cumulant generating function of codeword lengths in optimal lossless compression.” in ISIT, 2014

  24. [24]

    I-projection and the geometry of error exponents,

    S. Borade and L. Zheng, “I-projection and the geometry of error exponents,” in Allerton Conf. on Comm., Control, and Computing , 2006