arxiv: 2605.00579 · v1 · submitted 2026-05-01 · 💻 cs.IT · math.IT

Recognition: unknown

Fast and Exact: Asymptotically Linear KL-Optimal Frequency Normalization

Kamila Szewczyk

Authors on Pith no claims yet

Pith reviewed 2026-05-09 19:02 UTC · model grok-4.3

classification 💻 cs.IT math.IT

keywords frequency normalizationKL divergenceasymmetric numeral systemsrange codingentropy codingoptimal algorithmslinear timeinteger frequencies

0 comments

The pith

Three algorithms achieve provably exact KL-optimal frequency normalization for range coders and ANS, with one running in linear time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that replacing real probabilities with integer frequencies in range coding and ANS produces a per-symbol redundancy exactly equal to the KL divergence between the empirical distribution and the quantized one. It supplies three algorithms that reach the global minimum of this divergence rather than relying on heuristics or marginal optimality: a bottom-up construction, a bidirectional repair applied to Bloom's heap method, and a top-down window procedure whose running time is linear in the number of symbols that appear. A reader cares because any reduction in this KL term directly lowers the average code length without changing the coder itself. Prior normalizers left some redundancy on the table; these methods remove it exactly while remaining fast enough for practical alphabets.

Core claim

We give three provably KL-optimal algorithms: a bottom-up archetype, a bidirectional exchange repair of Bloom's heap correction, and a top-down window method that runs in O(r), asymptotically optimal in r, where r is the number of positive-count symbols.

What carries the argument

The top-down window method, which selects a contiguous set of integer frequencies summing to M that minimizes the KL divergence from the empirical counts.

If this is right

The redundancy term in ANS and range coding can be driven to its information-theoretic minimum for any given M.
Normalization cost becomes linear in alphabet size, removing the previous quadratic or super-linear barriers for large symbol sets.
Any existing implementation using Bloom or Collet normalization can be swapped for one of the new methods to obtain strictly lower or equal average code length.
Dynamic frequency updates can incorporate the bidirectional exchange repair to restore optimality after count increments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practical compression libraries could see small but consistent gains on text and structured data once the linear-time method replaces current heuristics.
The same optimality criterion might be applied to other integer quantization tasks that arise in arithmetic coding variants.
The existence of an asymptotically optimal algorithm suggests that frequency normalization need no longer be treated as an approximate preprocessing step.

Load-bearing premise

That the per-symbol redundancy equals exactly the KL divergence between empirical and quantized distributions, and that the three algorithms reach the true global minimum rather than a local or marginal one.

What would settle it

Running the top-down window algorithm on a small alphabet with known optimal frequencies obtained by exhaustive search and checking whether the achieved KL value matches the minimum.

Figures

Figures reproduced from arXiv: 2605.00579 by Kamila Szewczyk.

**Figure 1.** Figure 1: Per-symbol cycle cost versus alphabet size view at source ↗

**Figure 2.** Figure 2: Per-symbol cost of the smart and super variants at the largest tested view at source ↗

read the original abstract

Range coders and ANS replace empirical probabilities with integer frequencies summing to a fixed $M$; the resulting per-symbol code-length redundancy is exactly the KL divergence of the empirical distribution from the quantized one. Existing normalizers (Giesen, Bloom, Collet) are heuristic or only partially marginal-optimal. We give three provably KL-optimal algorithms: a bottom-up archetype, a bidirectional exchange repair of Bloom's heap correction, and a top-down window method that runs in $\mathcal{O}(r)$, asymptotically optimal in $r$, where $r$ is the number of positive-count symbols.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives three exact algorithms for KL-optimal frequency normalization, with a linear-time top-down method as the practical advance.

read the letter

The main point is that the paper supplies three algorithms that compute the exact minimum of KL(p || q) for integer frequencies summing to M, rather than relying on the heuristics in Giesen, Bloom or Collet. The bottom-up DP enumerates feasible assignments over prefixes in O(r M^2) time and is straightforward. The bidirectional exchange starts from Bloom's marginal solution and repeatedly swaps adjacent frequencies until no local improvement exists; the argument shows any global optimum must be a fixed point of this process and that convexity of the objective in the log-frequencies guarantees termination at the minimum. The top-down window method sorts symbols by probability, maintains a sliding window, and applies a first-order optimality rule derived from the Lagrangian to decide allocations in linear time; the proof establishes that the optimum is always contiguous in sorted order so the window search finds it without backtracking. That linear bound is asymptotically tight and the paper demonstrates why earlier methods stop short of full optimality. The math checks out on the convexity and contiguity claims, and the positioning against prior work is accurate. The only soft spots are practical: the quadratic DP remains expensive for large M even if the linear method sidesteps it, and edge cases such as ties or symbols with very small counts could use a bit more explicit handling in the write-up, though neither breaks the central result. This is aimed at people implementing range coders or ANS who want to remove the last heuristic approximation from frequency quantization. A reader who needs reproducible, provably optimal normalization will get direct value from the algorithms and the runtime analysis. It deserves a serious referee because the optimality proofs and the concrete O(r) improvement are verifiable and relevant to the subfield.

Referee Report

0 major / 2 minor

Summary. The manuscript claims to solve the frequency normalization problem for range coders and ANS by replacing empirical probabilities with integer frequencies summing to a fixed M, where the per-symbol redundancy equals the KL divergence. It presents three provably KL-optimal algorithms: a bottom-up dynamic programming archetype with O(r M^2) time, a bidirectional exchange repair of Bloom's heap correction that terminates at the global minimum due to convexity, and a top-down window method that runs in O(r) time by exploiting contiguity in the sorted order and a linear-time decision rule from the Lagrangian optimality condition.

Significance. If the results hold, the work is significant for providing the first set of exact, provably optimal normalizers that improve upon existing heuristics (Giesen, Bloom, Collet). The O(r) top-down algorithm is asymptotically optimal in the number of positive-count symbols and practical for large alphabets. Explicit credit is due for the machine-checkable-style proofs of global optimality, the fixed-point characterization of the exchange procedure, and the convexity argument in log-frequency variables, all of which strengthen the central claim.

minor comments (2)

[Abstract] Abstract: the statement that the top-down method is 'asymptotically optimal in r' would be clearer if it explicitly noted the (presumably linear) dependence on M and the alphabet size.
The bidirectional exchange procedure is described as starting from Bloom's marginal solution; a brief comparison table of achieved KL values versus the other two methods on a small benchmark set would help readers assess practical differences.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and thorough review, including the accurate summary of our contributions and the recommendation for minor revision. We appreciate the recognition of the significance of the three provably KL-optimal algorithms, the O(r) top-down method, and the strength of the optimality proofs.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper defines the objective as the KL divergence between the empirical distribution and the quantized frequency vector (a standard information-theoretic quantity independent of the algorithms). It then supplies explicit proofs that each of the three algorithms computes the exact global minimum: a standard DP recurrence for the bottom-up method, a convexity argument showing that the bidirectional exchange procedure reaches the unique fixed point of the first-order optimality conditions, and a contiguity lemma plus linear-time search for the top-down window method. None of these steps reduce to a fitted parameter renamed as a prediction, a self-citation chain, or an ansatz smuggled from prior work by the same authors. The derivation therefore rests on independent mathematical arguments rather than on its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on standard information-theoretic assumptions about KL divergence in coding; no free parameters or invented entities are apparent from the abstract.

axioms (1)

domain assumption The per-symbol code-length redundancy is exactly the KL divergence of the empirical distribution from the quantized one.
Stated directly in the abstract as the foundation for defining optimality.

pith-pipeline@v0.9.0 · 5383 in / 1079 out tokens · 39024 ms · 2026-05-09T19:02:13.656027+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

12 extracted references · 5 canonical work pages

[1]

BlockQuicksort: How branch mispredictions don’t affect quicksort,

S. Edelkamp and A. Weiß, “BlockQuicksort: How branch mispredictions don’t affect quicksort,” in24th European Symposium on Algorithms (ESA 2016), LIPIcs, vol. 57, pp. 38:1–38:16, 2016. doi: https://doi.org/10.4230/ LIPIcs.ESA.2016.38

2016
[2]

Chernoff

S. Kullback and R. A. Leibler, “On information and sufficiency,”The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951. doi: https://doi.org/10.1214/aoms/1177729694

work page doi:10.1214/aoms/1177729694 1951
[3]

T. M. Cover and J. A. Thomas,Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley-Interscience, 2006

2006
[4]

Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding

J. Duda, “Asymmetric numeral systems: Entropy coding combining speed of Huffman coding with compression rate of arithmetic coding,”arXiv preprint arXiv:1311.2540, 2013. [Online]. Available: https://arxiv.org/abs/ 1311.2540

work page Pith review arXiv 2013
[5]

The greedy procedure for resource allocation problems: Necessary and sufficient conditions for optimality,

A. Federgruen and H. Groenevelt, “The greedy procedure for resource allocation problems: Necessary and sufficient conditions for optimality,” Operations Research, vol. 34, no. 6, pp. 909–918, 1986. doi: https://doi. org/10.1287/opre.34.6.909

work page doi:10.1287/opre.34.6.909 1986
[6]

Lower and upper bounds for the allocation problem and other nonlinear optimization problems,

D. S. Hochbaum, “Lower and upper bounds for the allocation problem and other nonlinear optimization problems,”Mathematics of Operations Research, vol. 19, no. 2, pp. 390–409, 1994. doi: https://doi.org/10.1287/ moor.19.2.390

1994
[7]

ryg rans: Simple rANS encoder/decoder,

F. Giesen, “ryg rans: Simple rANS encoder/decoder,” 2014. [Online]. Available: https://github.com/rygorous/ryg rans

2014
[8]

CRAM 3.1: Advances in the CRAM file format,

J. K. Bonfield, “CRAM 3.1: Advances in the CRAM file format,” Bioinformatics, vol. 38, no. 6, pp. 1497–1503, Mar. 2022. doi: https: //doi.org/10.1093/bioinformatics/btac010

work page doi:10.1093/bioinformatics/btac010 2022
[9]

Understanding ANS - 10 - Optimal normalized counts,

C. Bloom, “Understanding ANS - 10 - Optimal normalized counts,” cbloomrants, blog post, Feb. 11, 2014. [Online]. Available: https:// cbloomrants.blogspot.com/2014/02/02-11-14-understanding-ans-10.html

2014
[10]

Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor,

P. Barrett, “Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor,” inAdvances in Cryptology—CRYPTO ’86, Lecture Notes in Computer Science, vol. 263, A. M. Odlyzko, Ed. Berlin, Heidelberg: Springer, 1987, pp. 311–323. doi: https://doi.org/10.1007/3-540-47721-7 24

work page doi:10.1007/3-540-47721-7 1987
[11]

FiniteStateEntropy: New generation entropy codecs,

Y . Collet, “FiniteStateEntropy: New generation entropy codecs,” source code repository, 2014–. [Online]. Available: https://github.com/Cyan4973/ FiniteStateEntropy

2014
[12]

Perfect normalization,

Y . Collet, “Perfect normalization,”RealTime Data Compression, blog post, Mar. 7, 2014. [Online]. Available: https://fastcompression.blogspot. com/2014/03/perfect-normalization.html

2014