Investing Is Compression
Pith reviewed 2026-05-10 15:22 UTC · model grok-4.3
The pith
Kelly's investment objective factors into money, entropy, and divergence terms, so that maximizing growth requires minimizing divergence from the true distribution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Kelly's objective, even in the general form, factors the investing problem into three terms: a money term, an entropy term, and a divergence term. The only way to maximize growth is to minimize the divergence, which measures, in bits, the difference between the investor's distribution and the true distribution. Investing is, fundamentally, a compression problem. This decomposition also yields new practical results, including a winner fraction heuristic whose growth shortfall relative to the optimal portfolio is bounded by the entropy of the winner fraction distribution.
What carries the argument
The factorization of Kelly's growth objective into a money term, an entropy term, and a divergence term. The divergence term alone varies with the investor's chosen distribution, so growth maximization reduces directly to divergence minimization.
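As a hedged illustration of the shape of this factorization, consider the classical horse-race special case from Cover and Thomas rather than the paper's general form. The symbols are assumptions introduced for this sketch: $p_i$ is the true probability of outcome $i$, $o_i$ the payout odds, and $b_i$ the fraction of wealth bet on outcome $i$, with $\sum_i b_i = 1$ and logarithms in base 2. Treating the entropy term as $-H(p)$ is also a convention chosen here.

```latex
\begin{align*}
W(b, p) &= \sum_i p_i \log_2 (b_i o_i) \\
        &= \sum_i p_i \log_2 o_i + \sum_i p_i \log_2 p_i - \sum_i p_i \log_2 \frac{p_i}{b_i} \\
        &= \underbrace{\sum_i p_i \log_2 o_i}_{\text{money term}}
           \;\underbrace{-\,H(p)}_{\text{entropy term}}
           \;-\;\underbrace{D(p \,\|\, b)}_{\text{divergence term}}
\end{align*}
```

In this special case only the last term involves $b$, and $D(p \,\|\, b) \ge 0$ with equality exactly when $b = p$, so growth is maximized by betting in proportion to the true probabilities.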
Load-bearing premise
That the universal-portfolio trick applies directly to the general Kelly objective without additional restrictions and that the money and entropy terms remain strictly constant across all candidate strategies in any given backtest.
What would settle it
Calculate realized log-growth rates for two different constant-rebalanced portfolios over the same return sequence and verify whether their difference exactly equals the difference in their divergence from the empirical outcome distribution; any consistent mismatch would falsify the claimed factorization.
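A minimal sketch of such a check, restricted to the horse-race special case (one winning outcome per round, fixed betting fractions) rather than general constant-rebalanced portfolios. The odds, probabilities, betting vectors, and function names are all hypothetical values chosen for illustration. In this setting the realized growth gap between two strategies should match the gap in their divergence from the empirical outcome distribution up to floating-point error.

```python
import math
import random

def empirical_distribution(outcomes, num_outcomes):
    """Fraction of rounds in which each outcome occurred."""
    counts = [0] * num_outcomes
    for i in outcomes:
        counts[i] += 1
    return [c / len(outcomes) for c in counts]

def realized_log_growth(bets, odds, outcomes):
    """Average log2 wealth multiplier per round for fixed betting fractions."""
    return sum(math.log2(bets[i] * odds[i]) for i in outcomes) / len(outcomes)

def kl_bits(p, q):
    """KL divergence D(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical market: three outcomes, fixed odds, i.i.d. rounds.
random.seed(0)
odds = [2.0, 4.0, 5.0]
true_p = [0.5, 0.3, 0.2]
outcomes = random.choices(range(3), weights=true_p, k=10_000)
p_hat = empirical_distribution(outcomes, 3)

bets_a = [0.5, 0.3, 0.2]   # strategy A
bets_b = [0.4, 0.4, 0.2]   # strategy B

growth_gap = (realized_log_growth(bets_a, odds, outcomes)
              - realized_log_growth(bets_b, odds, outcomes))
divergence_gap = kl_bits(p_hat, bets_b) - kl_bits(p_hat, bets_a)

print(f"growth gap (bits/round):     {growth_gap:.6f}")
print(f"divergence gap (bits/round): {divergence_gap:.6f}")
```

The odds cancel in the growth gap, which is why they do not appear in the divergence comparison. A persistent mismatch in a richer setting, such as portfolios over assets with continuous returns, would be the kind of evidence the test above asks for.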
Original abstract
In 1956 John Kelly wrote a paper at Bell Labs describing the relationship between gambling and Information Theory. What came to be known as the Kelly Criterion is both an objective and a closed-form solution to sizing wagers when odds and edge are known. Samuelson argued it was arbitrary and subjective, and successfully kept it out of mainstream economics. Luckily it lived on in computer science, mostly because of Tom Cover's work at Stanford. He showed that it is the uniquely optimal way to invest: it maximizes long-term wealth, minimizes the risk of ruin, and is competitively optimal in a game-theoretic sense, even over the short term. One of Cover's most surprising contributions to portfolio theory was the universal portfolio. Related to universal compression in information theory, it performs asymptotically as well as the best constant-rebalanced portfolio in hindsight. I borrow a trick from that algorithm to show that Kelly's objective, even in the general form, factors the investing problem into three terms: a money term, an entropy term, and a divergence term. The only way to maximize growth is to minimize divergence which measures the difference between our distribution and the true distribution in bits. Investing is, fundamentally, a compression problem. This decomposition also yields new practical results. Because the money and entropy terms are constant across strategies in a given backtest, the difference in log growth between two strategies measures their relative divergence in bits. I also introduce a winner fraction heuristic which allocates capital in proportion to each asset's probability of dominating the candidate set. The growth shortfall of this heuristic relative to the optimal portfolio is bounded by the entropy of the winner fraction distribution. To my knowledge, both the heuristic and the entropy bound are original contributions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that adapting a technique from Cover's universal portfolios allows the general Kelly objective (maximizing expected log-wealth) to factor into three terms—a money term, an entropy term, and a KL-divergence term—such that only the divergence depends on the chosen betting distribution b. This implies optimal investing reduces to minimizing divergence from the true distribution, framing the problem as compression. It further introduces a winner-fraction heuristic (allocating in proportion to each asset's probability of dominating the candidate set) whose growth shortfall relative to the optimum is bounded by the entropy of the winner-fraction distribution.
Significance. If the factorization is shown to hold with the claimed independence of the first two terms, the work supplies a clean information-theoretic lens on portfolio choice and a concrete way to compare any two strategies by their relative divergence in bits. The heuristic plus entropy bound, if rigorously derived, would constitute an original practical contribution in computational finance.
major comments (2)
- [Factorization derivation (main body, around the borrowed Cover trick)] The central claim that the money and entropy terms remain strictly constant across candidate strategies (and thus that growth maximization reduces exactly to divergence minimization) depends on the universal-portfolio trick applying without further restrictions to the general Kelly objective. In non-i.i.d. markets, state-dependent payoffs, or when portfolio choice affects the outcome distribution, the money term (payoff-weighted expectation) can acquire dependence on b, so the reduction does not follow. Please supply the explicit derivation steps (with all assumptions stated) that establish independence of the first two terms.
- [Winner-fraction heuristic and entropy bound] The entropy bound on the growth shortfall of the winner-fraction heuristic is asserted without explicit proof steps, numerical validation, or counter-example checks. Because this bound is presented as an original practical result and is load-bearing for the heuristic's utility, the derivation (including the definition of the winner-fraction distribution) must be given in full.
minor comments (2)
- [Abstract] The abstract states both the factorization and the bound without derivation or validation; moving a concise proof sketch or reference to the relevant theorem into the abstract would improve readability.
- [Notation and definitions] Clarify early (e.g., in the notation or preliminaries section) whether the winner-fraction distribution is defined with respect to a fixed backtest or is itself estimated from data, as this affects whether the heuristic introduces additional fitting.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address each major point below and will revise the manuscript accordingly.
Point-by-point responses
-
Referee: [Factorization derivation (main body, around the borrowed Cover trick)] The central claim that the money and entropy terms remain strictly constant across candidate strategies (and thus that growth maximization reduces exactly to divergence minimization) depends on the universal-portfolio trick applying without further restrictions to the general Kelly objective. In non-i.i.d. markets, state-dependent payoffs, or when portfolio choice affects the outcome distribution, the money term (payoff-weighted expectation) can acquire dependence on b, so the reduction does not follow. Please supply the explicit derivation steps (with all assumptions stated) that establish independence of the first two terms.
Authors: We agree that the assumptions require explicit statement and that the claim does not hold without them. The borrowed Cover trick is applied to the expected log-growth objective under the standard Kelly assumptions of exogenous asset returns (price-taker setting) and i.i.d. periods. Under these conditions the money term equals the expected payoff under the true distribution and the entropy term is the entropy of that distribution; both are independent of the chosen b. The revised manuscript will contain the full derivation with these assumptions listed at the outset, plus an explicit caveat that the factorization fails if b influences the outcome distribution (e.g., illiquid or state-dependent markets). revision: yes
-
Referee: [Winner-fraction heuristic and entropy bound] The entropy bound on the growth shortfall of the winner-fraction heuristic is asserted without explicit proof steps, numerical validation, or counter-example checks. Because this bound is presented as an original practical result and is load-bearing for the heuristic's utility, the derivation (including the definition of the winner-fraction distribution) must be given in full.
Authors: We accept that the original text presented the bound without sufficient detail. The winner-fraction distribution is the probability vector whose i-th component is the measure of scenarios in which asset i strictly outperforms all other assets in the candidate set. The growth shortfall relative to the Kelly optimum is then bounded above by the Shannon entropy of this distribution via the concavity of the logarithm and a direct application of Jensen's inequality to the expected log-wealth difference. The revised manuscript will include the complete proof (with all steps) in a new appendix, together with numerical checks on historical equity data and a short discussion of when the bound is tight or loose. revision: yes
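The heuristic and bound described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: returns are simulated gross returns (price relatives), winner fractions are estimated in-sample from the same backtest, ties go to the lowest-index asset, and all names are hypothetical. The comparison uses random portfolios as stand-ins for the optimum. The in-sample bound can be seen pointwise: any portfolio earns at most the winner's return in a period, while the winner-fraction portfolio earns at least its weight on the winner times that return.

```python
import math
import random

def winner_fractions(returns):
    """Fraction of periods in which each asset has the highest gross return."""
    num_assets = len(returns[0])
    wins = [0] * num_assets
    for period in returns:
        # Assumption: ties are assigned to the lowest-index asset.
        wins[max(range(num_assets), key=lambda i: period[i])] += 1
    return [w / len(returns) for w in wins]

def log_growth(portfolio, returns):
    """Average log2 growth per period of a constant-rebalanced portfolio."""
    total = sum(math.log2(sum(b * x for b, x in zip(portfolio, period)))
                for period in returns)
    return total / len(returns)

def entropy_bits(p):
    """Shannon entropy in bits, skipping zero-probability entries."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Hypothetical backtest: 3 assets, 2000 periods of lognormal gross returns.
random.seed(1)
returns = [[random.lognormvariate(0.0, 0.2) for _ in range(3)]
           for _ in range(2000)]

w = winner_fractions(returns)
bound = entropy_bits(w)
growth_w = log_growth(w, returns)

# Largest growth advantage of any sampled portfolio over the heuristic.
worst_gap = -math.inf
for _ in range(500):
    raw = [random.expovariate(1.0) for _ in range(3)]
    b = [r / sum(raw) for r in raw]
    worst_gap = max(worst_gap, log_growth(b, returns) - growth_w)

print(f"winner fractions: {w}")
print(f"entropy bound (bits/period): {bound:.4f}")
print(f"largest observed shortfall:  {worst_gap:.4f}")
```

Because the winner fractions here come from the same data used for evaluation, the check says nothing about out-of-sample behavior, which is the fitting question raised in the minor comments.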
Circularity Check
No circularity: decomposition is a direct algebraic identity from information theory applied to Kelly growth, with external Cover reference.
Full rationale
The paper derives the three-term factorization of the Kelly objective by borrowing a standard trick from Cover's universal portfolio algorithm (an external, prior result). This yields growth = money_term + entropy_term - divergence_term, where the first two are independent of the betting distribution b under the usual i.i.d. setup. The claim that only divergence varies with strategy therefore follows immediately from the identity rather than being imposed by definition or by fitting. The winner-fraction heuristic and its entropy bound are presented as new but are not load-bearing for the central compression claim. No self-citation chain, ansatz smuggling, or renaming of known results occurs; the derivation remains self-contained against the cited external benchmark.
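The identity growth = money_term + entropy_term - divergence_term can be checked numerically in the simplest setting. The sketch below assumes a horse race with known probabilities and odds, and takes entropy_term to mean the negative Shannon entropy (the sum of p_i log2 p_i) so that the signs match the identity as written. Both choices, and all names and numbers, are assumptions for illustration rather than the paper's general construction.

```python
import math

def expected_log_growth(p, odds, bets):
    """Expected log2 wealth multiplier per round, computed directly."""
    return sum(pi * math.log2(bi * oi)
               for pi, oi, bi in zip(p, odds, bets) if pi > 0)

def growth_terms(p, odds, bets):
    """Return (money_term, entropy_term, divergence_term) in bits."""
    money_term = sum(pi * math.log2(oi) for pi, oi in zip(p, odds) if pi > 0)
    # Assumption: entropy_term is the negative entropy, sum p_i log2 p_i.
    entropy_term = sum(pi * math.log2(pi) for pi in p if pi > 0)
    divergence_term = sum(pi * math.log2(pi / bi)
                          for pi, bi in zip(p, bets) if pi > 0)
    return money_term, entropy_term, divergence_term

# Hypothetical three-outcome race with fair-looking odds of 3 on each outcome.
p = [0.5, 0.25, 0.25]
odds = [3.0, 3.0, 3.0]
candidates = {
    "proportional (b = p)": [0.5, 0.25, 0.25],
    "uniform": [1 / 3, 1 / 3, 1 / 3],
    "skewed": [0.6, 0.3, 0.1],
}

results = {}
for name, bets in candidates.items():
    money_term, entropy_term, divergence_term = growth_terms(p, odds, bets)
    direct = expected_log_growth(p, odds, bets)
    results[name] = (direct, money_term, entropy_term, divergence_term)
    print(f"{name}: growth={direct:.4f} money={money_term:.4f} "
          f"entropy_term={entropy_term:.4f} divergence={divergence_term:.4f}")
```

Across the three candidates the money and entropy terms are identical and only the divergence changes, which is the behavior the rationale relies on under the i.i.d. setup.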
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The Kelly objective and universal portfolio results hold in the general form referenced from prior work.
invented entities (1)
- winner fraction distribution: no independent evidence
Reference graph
Works this paper leans on
- [1] Paul H. Algoet and Thomas M. Cover, Asymptotic optimality and asymptotic equipartition properties of log-optimum investment, The Annals of Probability 16 (1988), no. 2, 876–898.
- [2] Robert M. Bell and Thomas M. Cover, Competitive optimality of logarithmic investment, Mathematics of Operations Research 5 (1980), no. 2, 161–166.
- [3] Robert M. Bell and Thomas M. Cover, Game-theoretic optimal portfolios, Management Science 34 (1988), no. 6, 724–733.
- [4] Daniel Bernoulli, Exposition of a new theory on the measurement of risk, Econometrica 22 (1954), no. 1, 23–36. Translation by Louise Sommer of "Specimen Theoriae Novae de Mensura Sortis," Commentarii Academiae Scientiarum Imperialis Petropolitanae 5:175–192, 1738.
- [5] Thomas M. Cover, Universal portfolios, Mathematical Finance 1 (1991), no. 1, 1–29.
- [6] Thomas M. Cover, Shannon and investment, IEEE Information Theory Society Newsletter 48 (1998), no. 2, 10–11, Special Golden Jubilee Issue.
- [7] Thomas M. Cover and Joy A. Thomas, Elements of information theory, 2nd ed., Wiley-Interscience, Hoboken, NJ, 2006.
- [8] Peter D. Grünwald, The minimum description length principle, MIT Press, Cambridge, MA, 2007.
- [9] J. L. Kelly, Jr., A new interpretation of information rate, Bell System Technical Journal 35 (1956), no. 4, 917–926.
- [10] Ming Li and Paul M. B. Vitányi, An introduction to Kolmogorov complexity and its applications, 4th ed., Texts in Computer Science, Springer, Cham, 2019.
- [11] Leonard C. MacLean, Edward O. Thorp, and William T. Ziemba, The Kelly capital growth investment criterion: Theory and practice, World Scientific, 2011.
- [12] Harry Markowitz, Portfolio selection, The Journal of Finance 7 (1952), no. 1, 77–91.
- [13] Robert C. Merton and Paul A. Samuelson, Fallacy of the log-normal approximation to optimal portfolio decision-making over many periods, Journal of Financial Economics 1 (1974), no. 1, 67–94.
- [14] Paul A. Samuelson, The "fallacy" of maximizing the geometric mean in long sequences of investing or gambling, Proceedings of the National Academy of Sciences 68 (1971), no. 10, 2493–2496.
- [15] Paul A. Samuelson, Why we should not make mean log of wealth big though years to act are long, Journal of Banking & Finance 3 (1979), no. 4, 305–307.
- [16] Oscar Stiffelman, The least wrong model is not in the data, 2014.
- [17] Nassim Nicholas Taleb, Statistical consequences of fat tails, STEM Academic Press, 2020.
- [18] Edward O. Thorp, Beat the dealer: A winning strategy for the game of twenty-one, 2nd ed., Random House, New York, 1966.
- [19] Edward O. Thorp, A man for all markets: From Las Vegas to Wall Street, how I beat the dealer and the market, Random House, 2017.
- [20] Vladimir N. Vapnik, Statistical learning theory, Wiley-Interscience, New York, 1998.