pith. machine review for the scientific record. sign in

arxiv: 2605.00579 · v1 · submitted 2026-05-01 · 💻 cs.IT · math.IT

Recognition: unknown

Fast and Exact: Asymptotically Linear KL-Optimal Frequency Normalization

Authors on Pith no claims yet

Pith reviewed 2026-05-09 19:02 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords frequency normalizationKL divergenceasymmetric numeral systemsrange codingentropy codingoptimal algorithmslinear timeinteger frequencies
0
0 comments X

The pith

Three algorithms achieve provably exact KL-optimal frequency normalization for range coders and ANS, with one running in linear time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that replacing real probabilities with integer frequencies in range coding and ANS produces a per-symbol redundancy exactly equal to the KL divergence between the empirical distribution and the quantized one. It supplies three algorithms that reach the global minimum of this divergence rather than relying on heuristics or marginal optimality: a bottom-up construction, a bidirectional repair applied to Bloom's heap method, and a top-down window procedure whose running time is linear in the number of symbols that appear. A reader cares because any reduction in this KL term directly lowers the average code length without changing the coder itself. Prior normalizers left some redundancy on the table; these methods remove it exactly while remaining fast enough for practical alphabets.

Core claim

We give three provably KL-optimal algorithms: a bottom-up archetype, a bidirectional exchange repair of Bloom's heap correction, and a top-down window method that runs in O(r), asymptotically optimal in r, where r is the number of positive-count symbols.

What carries the argument

The top-down window method, which selects a contiguous set of integer frequencies summing to M that minimizes the KL divergence from the empirical counts.

If this is right

  • The redundancy term in ANS and range coding can be driven to its information-theoretic minimum for any given M.
  • Normalization cost becomes linear in alphabet size, removing the previous quadratic or super-linear barriers for large symbol sets.
  • Any existing implementation using Bloom or Collet normalization can be swapped for one of the new methods to obtain strictly lower or equal average code length.
  • Dynamic frequency updates can incorporate the bidirectional exchange repair to restore optimality after count increments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practical compression libraries could see small but consistent gains on text and structured data once the linear-time method replaces current heuristics.
  • The same optimality criterion might be applied to other integer quantization tasks that arise in arithmetic coding variants.
  • The existence of an asymptotically optimal algorithm suggests that frequency normalization need no longer be treated as an approximate preprocessing step.

Load-bearing premise

That the per-symbol redundancy equals exactly the KL divergence between empirical and quantized distributions, and that the three algorithms reach the true global minimum rather than a local or marginal one.

What would settle it

Running the top-down window algorithm on a small alphabet with known optimal frequencies obtained by exhaustive search and checking whether the achieved KL value matches the minimum.

Figures

Figures reproduced from arXiv: 2605.00579 by Kamila Szewczyk.

Figure 1
Figure 1. Figure 1: Per-symbol cycle cost versus alphabet size view at source ↗
Figure 2
Figure 2. Figure 2: Per-symbol cost of the smart and super variants at the largest tested view at source ↗
read the original abstract

Range coders and ANS replace empirical probabilities with integer frequencies summing to a fixed $M$; the resulting per-symbol code-length redundancy is exactly the KL divergence of the empirical distribution from the quantized one. Existing normalizers (Giesen, Bloom, Collet) are heuristic or only partially marginal-optimal. We give three provably KL-optimal algorithms: a bottom-up archetype, a bidirectional exchange repair of Bloom's heap correction, and a top-down window method that runs in $\mathcal{O}(r)$, asymptotically optimal in $r$, where $r$ is the number of positive-count symbols.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript claims to solve the frequency normalization problem for range coders and ANS by replacing empirical probabilities with integer frequencies summing to a fixed M, where the per-symbol redundancy equals the KL divergence. It presents three provably KL-optimal algorithms: a bottom-up dynamic programming archetype with O(r M^2) time, a bidirectional exchange repair of Bloom's heap correction that terminates at the global minimum due to convexity, and a top-down window method that runs in O(r) time by exploiting contiguity in the sorted order and a linear-time decision rule from the Lagrangian optimality condition.

Significance. If the results hold, the work is significant for providing the first set of exact, provably optimal normalizers that improve upon existing heuristics (Giesen, Bloom, Collet). The O(r) top-down algorithm is asymptotically optimal in the number of positive-count symbols and practical for large alphabets. Explicit credit is due for the machine-checkable-style proofs of global optimality, the fixed-point characterization of the exchange procedure, and the convexity argument in log-frequency variables, all of which strengthen the central claim.

minor comments (2)
  1. [Abstract] Abstract: the statement that the top-down method is 'asymptotically optimal in r' would be clearer if it explicitly noted the (presumably linear) dependence on M and the alphabet size.
  2. The bidirectional exchange procedure is described as starting from Bloom's marginal solution; a brief comparison table of achieved KL values versus the other two methods on a small benchmark set would help readers assess practical differences.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and thorough review, including the accurate summary of our contributions and the recommendation for minor revision. We appreciate the recognition of the significance of the three provably KL-optimal algorithms, the O(r) top-down method, and the strength of the optimality proofs.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper defines the objective as the KL divergence between the empirical distribution and the quantized frequency vector (a standard information-theoretic quantity independent of the algorithms). It then supplies explicit proofs that each of the three algorithms computes the exact global minimum: a standard DP recurrence for the bottom-up method, a convexity argument showing that the bidirectional exchange procedure reaches the unique fixed point of the first-order optimality conditions, and a contiguity lemma plus linear-time search for the top-down window method. None of these steps reduce to a fitted parameter renamed as a prediction, a self-citation chain, or an ansatz smuggled from prior work by the same authors. The derivation therefore rests on independent mathematical arguments rather than on its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on standard information-theoretic assumptions about KL divergence in coding; no free parameters or invented entities are apparent from the abstract.

axioms (1)
  • domain assumption The per-symbol code-length redundancy is exactly the KL divergence of the empirical distribution from the quantized one.
    Stated directly in the abstract as the foundation for defining optimality.

pith-pipeline@v0.9.0 · 5383 in / 1079 out tokens · 39024 ms · 2026-05-09T19:02:13.656027+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

12 extracted references · 5 canonical work pages

  1. [1]

    BlockQuicksort: How branch mispredictions don’t affect quicksort,

    S. Edelkamp and A. Weiß, “BlockQuicksort: How branch mispredictions don’t affect quicksort,” in24th European Symposium on Algorithms (ESA 2016), LIPIcs, vol. 57, pp. 38:1–38:16, 2016. doi: https://doi.org/10.4230/ LIPIcs.ESA.2016.38

  2. [2]

    Chernoff

    S. Kullback and R. A. Leibler, “On information and sufficiency,”The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951. doi: https://doi.org/10.1214/aoms/1177729694

  3. [3]

    T. M. Cover and J. A. Thomas,Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley-Interscience, 2006

  4. [4]

    Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding

    J. Duda, “Asymmetric numeral systems: Entropy coding combining speed of Huffman coding with compression rate of arithmetic coding,”arXiv preprint arXiv:1311.2540, 2013. [Online]. Available: https://arxiv.org/abs/ 1311.2540

  5. [5]

    The greedy procedure for resource allocation problems: Necessary and sufficient conditions for optimality,

    A. Federgruen and H. Groenevelt, “The greedy procedure for resource allocation problems: Necessary and sufficient conditions for optimality,” Operations Research, vol. 34, no. 6, pp. 909–918, 1986. doi: https://doi. org/10.1287/opre.34.6.909

  6. [6]

    Lower and upper bounds for the allocation problem and other nonlinear optimization problems,

    D. S. Hochbaum, “Lower and upper bounds for the allocation problem and other nonlinear optimization problems,”Mathematics of Operations Research, vol. 19, no. 2, pp. 390–409, 1994. doi: https://doi.org/10.1287/ moor.19.2.390

  7. [7]

    ryg rans: Simple rANS encoder/decoder,

    F. Giesen, “ryg rans: Simple rANS encoder/decoder,” 2014. [Online]. Available: https://github.com/rygorous/ryg rans

  8. [8]

    CRAM 3.1: Advances in the CRAM file format,

    J. K. Bonfield, “CRAM 3.1: Advances in the CRAM file format,” Bioinformatics, vol. 38, no. 6, pp. 1497–1503, Mar. 2022. doi: https: //doi.org/10.1093/bioinformatics/btac010

  9. [9]

    Understanding ANS - 10 - Optimal normalized counts,

    C. Bloom, “Understanding ANS - 10 - Optimal normalized counts,” cbloomrants, blog post, Feb. 11, 2014. [Online]. Available: https:// cbloomrants.blogspot.com/2014/02/02-11-14-understanding-ans-10.html

  10. [10]

    Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor,

    P. Barrett, “Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor,” inAdvances in Cryptology—CRYPTO ’86, Lecture Notes in Computer Science, vol. 263, A. M. Odlyzko, Ed. Berlin, Heidelberg: Springer, 1987, pp. 311–323. doi: https://doi.org/10.1007/3-540-47721-7 24

  11. [11]

    FiniteStateEntropy: New generation entropy codecs,

    Y . Collet, “FiniteStateEntropy: New generation entropy codecs,” source code repository, 2014–. [Online]. Available: https://github.com/Cyan4973/ FiniteStateEntropy

  12. [12]

    Perfect normalization,

    Y . Collet, “Perfect normalization,”RealTime Data Compression, blog post, Mar. 7, 2014. [Online]. Available: https://fastcompression.blogspot. com/2014/03/perfect-normalization.html