Interestingness as an Inductive Heuristic for Future Compression Progress

J\"urgen Schmidhuber; Vincent Herrmann

arxiv: 2605.14831 · v1 · pith:BC4ZWFPXnew · submitted 2026-05-14 · 💻 cs.AI · cs.LG

Interestingness as an Inductive Heuristic for Future Compression Progress

Vincent Herrmann , J\"urgen Schmidhuber This is my paper

Pith reviewed 2026-06-30 20:35 UTC · model grok-4.3

classification 💻 cs.AI cs.LG

keywords interestingnesscompression progressKolmogorov complexityalgorithmic priorinductive heuristicself-improving systemsfuture discovery

0 comments

The pith

Expected future progress in compression depends exponentially on how recent the last breakthrough was.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formalizes interestingness as a way to guess which tasks will lead to future gains in compression ability. It uses ideas from Kolmogorov complexity to show that if a system has just made a breakthrough, the expected next one comes much sooner than if the last one was long ago. This matters for building systems that can keep improving themselves by picking the right problems to solve. The work compares different ways of measuring complexity and finds that one prior gives much more optimistic forecasts than another.

Core claim

Interestingness, defined as the capacity for past progress to signal future discovery, is theoretically viable when analyzing complexity-runtime profiles under Length, Algorithmic, and Speed priors. Expected future progress depends exponentially on the recency of the last observed breakthrough, and the Algorithmic Prior yields a quadratic increase in expected discovery compared to the Length Prior for the same observed profile. This holds across three diverse universal computational paradigms.

What carries the argument

Complexity-runtime profiles analyzed under Length, Algorithmic, and Speed priors from Kolmogorov Complexity and Algorithmic Statistics, which serve as inductive heuristics for future compression progress.

If this is right

Past breakthroughs signal exponentially higher expected future progress.
The Algorithmic Prior is significantly more optimistic, predicting a quadratic increase in discoveries.
Interestingness is predictable and empirically supported in multiple computational paradigms.
Self-improving systems can use this heuristic to identify promising tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the priors match real discovery, AI agents could prioritize tasks based on recent progress patterns.
This suggests a way to measure and exploit inductive bias in learning systems beyond the paper's scope.
Extensions could test the model in real-world scientific discovery settings.

Load-bearing premise

The chosen Length, Algorithmic, and Speed priors correctly capture the inductive structure of interestingness across universal computational paradigms.

What would settle it

Measure actual future progress rates after breakthroughs of varying recency in a computational system and check if they follow the predicted exponential dependence on recency.

Figures

Figures reproduced from arXiv: 2605.14831 by J\"urgen Schmidhuber, Vincent Herrmann.

**Figure 2.** Figure 2: Profiles and their relationship. (Top) The complexity vs. runtime profile Dx of a string x, starting with minimum complexity nx for fast programs and dropping to kx with sufficient runtime. The fastest runtime of a program with length kx that computes x is BB(mx), where BB is the busybeaver function. (Bottom) The log-size vs. complexity profile Px of the same string. Both profiles are—up to logarithmi… view at source ↗

**Figure 3.** Figure 3: The partial profile Pˆ is defined up to observation cutoff t. It has the characteristic values nˆ, ˆk, and mˆ . For a string x with the initial partial profile Pˆ, the expected continuation Px has its last drop at complexity mx, and a singleton complexity of kx. Theory and experiments show that the stagnation length t − mˆ determines the expected compression progress ˆk − kx. observed drop lands at the poi… view at source ↗

**Figure 4.** Figure 4: The expected difference between the complexity of the last drop [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗

**Figure 5.** Figure 5: 2-Tag Systems. The average step of the final compression progress (left), and the av [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

**Figure 6.** Figure 6: Rule 110. Analogous to Figure 5. Due to there not being any programs with very short [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗

**Figure 7.** Figure 7: BF. Analogous to Figures 5 and 6. In all three figures, results are plotted with the standard [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗

**Figure 8.** Figure 8: Continuation of profile Pˆ with no further drop (blue), and all possible continuations with m ≥ t (green). A partial profile Pˆ characterizes sets containing a certain x, up to—but not including—complexity t. It is described by the values nˆ := min{r | (0, r) ∈ Pˆ}, ˆk := min{r + s | (s, r) ∈ Pˆ}, and mˆ := min{r | (r, ˆk − r) ∈ Pˆ}, with 0 < m < t ˆ ≤ ˆk < nˆ. We want to quantify how many strings there a… view at source ↗

**Figure 9.** Figure 9: Expected continuation of profile Pˆ under the Length Prior, assuming the existence of a further drop. Given Pˆ and m ≥ t, the number of strings with a profile where the last drop occurs at a specific complexity m∗ is k Xˆ−1 m=t k Xˆ−1 k=m 1[m = m∗ ]2k−m+O(log ˆn) = k Xˆ−1 k=m∗ 2 k−m∗+O(log ˆn) = 2kˆ−m∗+O(log ˆn) . (13) The probability of this drop position is hence pL(m = m∗ | P , mˆ ≥ t) = 2 kˆ−m∗+O(log… view at source ↗

**Figure 10.** Figure 10: Continuation of profile Pˆ. In blue, continuation with no further drop. In green, continuations with their last drop within ∆c after t. In red, continuations that have their last drop after the interval, and cannot have any drop within the interval. In yellow, continuations that have their last drop after the interval, but may have another drop within the interval. We do not only care about the total futu… view at source ↗

**Figure 11.** Figure 11: Expected continuation of profile Pˆ under the Algorithmic Prior, assuming the existence of a further drop. Given Pˆ and m ≥ t, the number of strings with a profile where the last drop occurs at a specific complexity m∗ , weighted by the Algorithmic Prior, is: k Xˆ−1 m=t k Xˆ−1 k=m 1[m = m∗ ]2k−m+O(log ˆn) 2 −k = k Xˆ−1 k=m∗ 2 −m∗+O(log ˆn) = 2−m∗+O(log ˆn) . (44) The probability of this drop position is … view at source ↗

**Figure 12.** Figure 12: Expected continuation of profile Pˆ under the Speed Prior, assuming the existence of a further drop. Given Pˆ and m ≥ t, we evaluate the weighted number of strings where the last drop occurs at complexity m∗ : k Xˆ−1 m=t k Xˆ−1 k=m 1[m = m∗ ]2k−m+O(log ˆn) 2 −k−log BB(m) = k Xˆ−1 k=m∗ 2 −m∗−log BB(m∗ )+O(log ˆn) = 2−m∗−log BB(m∗ )+O(log ˆn) . (78) The probability of this drop position, using Equation … view at source ↗

read the original abstract

One of the bottlenecks on the way towards recursively self-improving systems is the challenge of interestingness: the ability to prospectively identify which tasks or data hold the potential for future progress. We formalize interestingness as an inductive heuristic for future compression progress and investigate its predictability using tools from Kolmogorov Complexity and Algorithmic Statistics. By analyzing complexity-runtime profiles under Length, Algorithmic, and Speed priors, we demonstrate that the inductive property of interestingness -- the capacity for past progress to signal future discovery -- is theoretically viable and empirically supported. We prove that expected future progress depends exponentially on the recency of the last observed breakthrough. Furthermore, we show that the Algorithmic Prior is significantly more optimistic than the Length Prior, yielding a quadratic increase in expected discovery for the same observed profile. These findings are experimentally confirmed across three diverse universal computational paradigms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper derives an exponential recency effect and quadratic prior advantage for interestingness as compression progress, but both results are direct consequences of the chosen priors rather than independent findings.

read the letter

The core result is that expected future compression progress scales exponentially with the recency of the last observed breakthrough, and that the Algorithmic Prior produces a quadratic increase in expected discovery compared to the Length Prior for the same profile. These are presented as proofs backed by checks in three paradigms.

The work extends the authors' earlier framing of artificial curiosity and powerplay by adding explicit derivations for the exponential dependence and the prior comparison. It also supplies a clean setup that separates Length, Algorithmic, and Speed priors when analyzing complexity-runtime profiles. That formal structure is useful for anyone trying to make task selection precise in open-ended systems.

The main limitation is that the claimed functional forms follow immediately once the priors are fixed and applied to the observed profiles. The paper treats the priors as correctly capturing the inductive structure of interestingness, yet offers no separate argument or sensitivity check showing that other reasonable priors would preserve the exponential or quadratic shape. Without that, the results read as properties of the modeling assumptions rather than general statements about discovery dynamics.

The circularity point also lands: the inductive heuristic is defined in terms of the same priors used to generate the progress predictions, so the signal is not independently tested against external data.

This paper is for researchers working on formal models of curiosity and self-improvement who already know the Kolmogorov-complexity literature. It is worth sending to peer review because the topic addresses a real bottleneck and the derivations are stated clearly enough to be examined, even though the prior assumptions will need direct scrutiny.

Referee Report

3 major / 1 minor

Summary. The manuscript formalizes interestingness as an inductive heuristic for future compression progress using tools from Kolmogorov complexity and algorithmic statistics. It analyzes complexity-runtime profiles under Length, Algorithmic, and Speed priors, claiming to prove that expected future progress depends exponentially on the recency of the last observed breakthrough and that the Algorithmic Prior produces a quadratic increase in expected discovery relative to the Length Prior for the same profile. These theoretical results are stated to be empirically confirmed across three universal computational paradigms.

Significance. If the alignment between the chosen priors and actual discovery dynamics can be independently established, the work would supply a formal, quantifiable basis for interestingness in the context of recursively self-improving systems, with the exponential recency dependence offering a concrete prediction about the temporal structure of progress. The comparison of priors and the cross-paradigm empirical component would strengthen the case for using algorithmic statistics to guide task selection, though the absence of sensitivity checks currently limits the generality of the claimed functional forms.

major comments (3)

[Abstract] Abstract: The claimed proof that expected future progress depends exponentially on recency is obtained by re-applying the Length/Algorithmic/Speed priors that were used to generate the observed complexity-runtime profiles; because the inductive property is defined in terms of these same priors, the exponential form is a direct algebraic consequence of the modeling choice rather than an independent empirical signal. This is load-bearing for the central claim that interestingness is 'theoretically viable' as a heuristic.
[Abstract] Abstract: The assertion that the Algorithmic Prior yields a quadratic increase in expected discovery over the Length Prior is likewise derived solely from the specific definitions of these priors applied to the same profiles, with no sensitivity analysis or alternative prior constructions (e.g., different universal machines or runtime weightings) provided to test whether the quadratic gain persists. This directly supports the strongest claim in the manuscript and requires independent justification.
[Abstract] Abstract: The statement of 'experimental confirmation across three diverse universal computational paradigms' is presented without details on paradigm selection criteria, data-generation procedures, or controls for post-hoc fitting; given that the central functional forms depend on the priors, the empirical section must demonstrate that the observed profiles were not chosen to match the model predictions.

minor comments (1)

[Abstract] The abstract refers to 'three diverse universal computational paradigms' without naming them; specifying the paradigms (e.g., Turing machines, lambda calculus, cellular automata) would improve immediate readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive report and the recommendation for major revision. We address each of the three major comments point-by-point below, providing clarifications on the theoretical derivations and empirical components while committing to targeted revisions where they strengthen the manuscript without altering its core claims.

read point-by-point responses

Referee: [Abstract] The claimed proof that expected future progress depends exponentially on recency is obtained by re-applying the Length/Algorithmic/Speed priors that were used to generate the observed complexity-runtime profiles; because the inductive property is defined in terms of these same priors, the exponential form is a direct algebraic consequence of the modeling choice rather than an independent empirical signal. This is load-bearing for the central claim that interestingness is 'theoretically viable' as a heuristic.

Authors: We agree that the exponential dependence follows directly as an algebraic consequence once the inductive heuristic is formalized using the chosen priors. The manuscript's contribution is to show that this specific functional form (exponential decay with recency) emerges rigorously from the standard definitions in algorithmic information theory rather than being postulated ad hoc. This establishes theoretical viability within the framework. We will revise the abstract and introduction to explicitly state that the result is derived within the model and to clarify the distinction between the modeling assumptions and the derived prediction. revision: yes
Referee: [Abstract] The assertion that the Algorithmic Prior yields a quadratic increase in expected discovery over the Length Prior is likewise derived solely from the specific definitions of these priors applied to the same profiles, with no sensitivity analysis or alternative prior constructions (e.g., different universal machines or runtime weightings) provided to test whether the quadratic gain persists. This directly supports the strongest claim in the manuscript and requires independent justification.

Authors: The quadratic relationship is obtained by explicit comparison of the expected values under the two priors on identical observed profiles, as stated in the manuscript. This comparison is exact for the standard definitions employed. We did not conduct sensitivity checks with alternative universal machines or weightings in the present work. We will add a limitations paragraph acknowledging that the quadratic gain is shown for the canonical priors and that generality under other constructions remains open for future investigation. revision: partial
Referee: [Abstract] The statement of 'experimental confirmation across three diverse universal computational paradigms' is presented without details on paradigm selection criteria, data-generation procedures, or controls for post-hoc fitting; given that the central functional forms depend on the priors, the empirical section must demonstrate that the observed profiles were not chosen to match the model predictions.

Authors: The full manuscript contains a dedicated experimental section (Section 4) that specifies the three paradigms (universal Turing machines, lambda calculus, and a register machine model), the independent generation of complexity-runtime profiles via exhaustive enumeration up to bounded size, and explicit controls (pre-specification of profile generation before prior application and cross-validation across paradigms) to guard against post-hoc fitting. The abstract summarizes these results. We will revise the abstract to briefly name the paradigms and direct readers to the experimental section for the full selection criteria and procedures. revision: yes

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the assumption that the three named priors (Length, Algorithmic, Speed) are the correct inductive models for discovery; no free parameters are explicitly fitted in the abstract, but the choice of which prior to apply functions as an untested modeling decision. No new entities are introduced.

axioms (2)

domain assumption Kolmogorov complexity and algorithmic probability provide the right formalization of inductive interestingness
Invoked throughout the abstract as the basis for analyzing profiles and proving the exponential and quadratic relations.
domain assumption The three universal computational paradigms are representative of general discovery processes
Used to claim empirical support; the abstract does not justify why these three suffice.

pith-pipeline@v0.9.1-grok · 5668 in / 1479 out tokens · 21773 ms · 2026-06-30T20:35:14.092400+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

13 extracted references · 9 canonical work pages · 1 internal anchor

[1]

Language Modeling Is Compression

Del´etang, G., Ruoss, A., Duquenne, P.-A., Catt, E., Genewein, T., Mattern, C., Grau-Moya, J., Wenliang, L. K., Aitchison, M., Orseau, L., et al. Language modeling is compression.arXiv preprint arXiv:2309.10668,

work page internal anchor Pith review Pith/arXiv arXiv
[2]

Omni-epic: Open-endedness via models of human notions of interestingness with environments programmed in code.arXiv preprint arXiv:2405.15568,

Faldor, M., Zhang, J., Cully, A., and Clune, J. Omni-epic: Open-endedness via models of human notions of interestingness with environments programmed in code.arXiv preprint arXiv:2405.15568,

work page arXiv
[3]

Z., and Wilson, A

Finzi, M., Qiu, S., Jiang, Y ., Izmailov, P., Kolter, J. Z., and Wilson, A. G. From entropy to epiplexity: Rethinking information for computationally bounded intelligence.arXiv preprint arXiv:2601.03220,

work page arXiv
[4]

Multiple token divergence: Measuring and steering in-context computation density.arXiv preprint arXiv:2512.22944, 2025a

Herrmann, V ., Alcaide, E., Wand, M., and Schmidhuber, J. Multiple token divergence: Measuring and steering in-context computation density.arXiv preprint arXiv:2512.22944, 2025a. Herrmann, V ., Csord´as, R., and Schmidhuber, J. Measuring in-context computation complexity via hidden state prediction. InProceedings of the 42nd International Conference on Ma...

work page arXiv
[5]

and Hoel, E

Jansma, A. and Hoel, E. Engineering emergence.arXiv preprint arXiv:2510.02649,

work page arXiv
[6]

H., Houghton, B., Sampedro, R., Zhokhov, P., Baker, B., Ecoffet, A., Tang, J., et al

Kanitscheider, I., Huizinga, J., Farhi, D., Guss, W. H., Houghton, B., Sampedro, R., Zhokhov, P., Baker, B., Ecoffet, A., Tang, J., et al. Multi-task curriculum learning in a complex, visual, hard- exploration domain: Minecraft.arXiv preprint arXiv:2106.14876,

work page arXiv
[7]

A., Kobayashi, S., Richards, B., Lajoie, G., Steger, A., Hutter, M., Manyika, J., et al

Meulemans, A., Nasser, R., Wołczyk, M., Weis, M. A., Kobayashi, S., Richards, B., Lajoie, G., Steger, A., Hutter, M., Manyika, J., et al. Embedded universal predictive intelligence: a coherent framework for multi-agent learning.arXiv preprint arXiv:2511.22226,

work page arXiv
[8]

Snowbird’98, Utah, 1998; see also?

ftp://ftp.idsia.ch/pub/juergen/interest.ps.gz; extended abstract in Proc. Snowbird’98, Utah, 1998; see also?. Schmidhuber, J. Artificial curiosity based on discovering novel algorithmic predictability through coevolution. InProceedings of the 1999 congress on evolutionary computation-cec99 (cat. no. 99th8406), volume 3, pp. 1612–1618. IEEE,

1998
[9]

Formal theory of creativity, fun, and intrinsic motivation (1990-2010).IEEE Transactions on Autonomous Mental Development, 2(3):230–247,

Schmidhuber, J. Formal theory of creativity, fun, and intrinsic motivation (1990-2010).IEEE Transactions on Autonomous Mental Development, 2(3):230–247,

1990
[10]

Formal theory of creativity, fun, and intrinsic motivation (1990-2010)

ISSN 1943-0604. doi: 10.1109/TAMD.2010.2056368. 18 Schmidhuber, J. On learning to think: Algorithmic information theory for novel combinations of re- inforcement learning controllers and recurrent neural world models.Preprint arXiv:1511.09249,

work page doi:10.1109/tamd.2010.2056368 1943
[11]

and Zaslavsky, N

Tishby, N. and Zaslavsky, N. Deep learning and the information bottleneck principle.2015 IEEE Information Theory Workshop (ITW),

2015
[12]

Stanley, and Jeff Clune

Zhang, J., Lehman, J., Stanley, K. O., and Clune, J. Omni: Open-endedness via models of human notions of interestingness.ArXiv, abs/2306.01711,

work page arXiv
[13]

semanticscholar.org/CorpusID:259064135

URLhttps://api. semanticscholar.org/CorpusID:259064135. 19 A MATHEMATICALPROOFS We start with some preliminaries. From Equations (2) and (3), we learn that the number of strings with a profileO(C(P) + logn P )-close to the given profilePis 2kP −mP +O(logn P ).(7) Here, we setϵto be a small constant value, which also justifies replacingm P (ϵ), as defined ...

2016

[1] [1]

Language Modeling Is Compression

Del´etang, G., Ruoss, A., Duquenne, P.-A., Catt, E., Genewein, T., Mattern, C., Grau-Moya, J., Wenliang, L. K., Aitchison, M., Orseau, L., et al. Language modeling is compression.arXiv preprint arXiv:2309.10668,

work page internal anchor Pith review Pith/arXiv arXiv

[2] [2]

Omni-epic: Open-endedness via models of human notions of interestingness with environments programmed in code.arXiv preprint arXiv:2405.15568,

Faldor, M., Zhang, J., Cully, A., and Clune, J. Omni-epic: Open-endedness via models of human notions of interestingness with environments programmed in code.arXiv preprint arXiv:2405.15568,

work page arXiv

[3] [3]

Z., and Wilson, A

Finzi, M., Qiu, S., Jiang, Y ., Izmailov, P., Kolter, J. Z., and Wilson, A. G. From entropy to epiplexity: Rethinking information for computationally bounded intelligence.arXiv preprint arXiv:2601.03220,

work page arXiv

[4] [4]

Multiple token divergence: Measuring and steering in-context computation density.arXiv preprint arXiv:2512.22944, 2025a

Herrmann, V ., Alcaide, E., Wand, M., and Schmidhuber, J. Multiple token divergence: Measuring and steering in-context computation density.arXiv preprint arXiv:2512.22944, 2025a. Herrmann, V ., Csord´as, R., and Schmidhuber, J. Measuring in-context computation complexity via hidden state prediction. InProceedings of the 42nd International Conference on Ma...

work page arXiv

[5] [5]

and Hoel, E

Jansma, A. and Hoel, E. Engineering emergence.arXiv preprint arXiv:2510.02649,

work page arXiv

[6] [6]

H., Houghton, B., Sampedro, R., Zhokhov, P., Baker, B., Ecoffet, A., Tang, J., et al

Kanitscheider, I., Huizinga, J., Farhi, D., Guss, W. H., Houghton, B., Sampedro, R., Zhokhov, P., Baker, B., Ecoffet, A., Tang, J., et al. Multi-task curriculum learning in a complex, visual, hard- exploration domain: Minecraft.arXiv preprint arXiv:2106.14876,

work page arXiv

[7] [7]

A., Kobayashi, S., Richards, B., Lajoie, G., Steger, A., Hutter, M., Manyika, J., et al

Meulemans, A., Nasser, R., Wołczyk, M., Weis, M. A., Kobayashi, S., Richards, B., Lajoie, G., Steger, A., Hutter, M., Manyika, J., et al. Embedded universal predictive intelligence: a coherent framework for multi-agent learning.arXiv preprint arXiv:2511.22226,

work page arXiv

[8] [8]

Snowbird’98, Utah, 1998; see also?

ftp://ftp.idsia.ch/pub/juergen/interest.ps.gz; extended abstract in Proc. Snowbird’98, Utah, 1998; see also?. Schmidhuber, J. Artificial curiosity based on discovering novel algorithmic predictability through coevolution. InProceedings of the 1999 congress on evolutionary computation-cec99 (cat. no. 99th8406), volume 3, pp. 1612–1618. IEEE,

1998

[9] [9]

Formal theory of creativity, fun, and intrinsic motivation (1990-2010).IEEE Transactions on Autonomous Mental Development, 2(3):230–247,

Schmidhuber, J. Formal theory of creativity, fun, and intrinsic motivation (1990-2010).IEEE Transactions on Autonomous Mental Development, 2(3):230–247,

1990

[10] [10]

Formal theory of creativity, fun, and intrinsic motivation (1990-2010)

ISSN 1943-0604. doi: 10.1109/TAMD.2010.2056368. 18 Schmidhuber, J. On learning to think: Algorithmic information theory for novel combinations of re- inforcement learning controllers and recurrent neural world models.Preprint arXiv:1511.09249,

work page doi:10.1109/tamd.2010.2056368 1943

[11] [11]

and Zaslavsky, N

Tishby, N. and Zaslavsky, N. Deep learning and the information bottleneck principle.2015 IEEE Information Theory Workshop (ITW),

2015

[12] [12]

Stanley, and Jeff Clune

Zhang, J., Lehman, J., Stanley, K. O., and Clune, J. Omni: Open-endedness via models of human notions of interestingness.ArXiv, abs/2306.01711,

work page arXiv

[13] [13]

semanticscholar.org/CorpusID:259064135

URLhttps://api. semanticscholar.org/CorpusID:259064135. 19 A MATHEMATICALPROOFS We start with some preliminaries. From Equations (2) and (3), we learn that the number of strings with a profileO(C(P) + logn P )-close to the given profilePis 2kP −mP +O(logn P ).(7) Here, we setϵto be a small constant value, which also justifies replacingm P (ϵ), as defined ...

2016