Emergence of Physical Intelligence via Controllable Information Production

Stas Tiomkin; Tristan Shah

arxiv: 2601.22449 · v2 · submitted 2026-01-30 · 💻 cs.AI

Pith reviewed 2026-05-16 10:14 UTC · model grok-4.3

keywords: intrinsic motivation · controllable information production · optimal control · robot learning · physical intelligence · dynamical systems · Kolmogorov-Sinai entropy

The pith

Controllable Information Production grounds intrinsic motivation directly in dynamical systems and optimal control without designer bias.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Controllable Information Production as a new basis for intrinsic motivation that measures the rate at which an agent generates controllable information through its own dynamics. Previous information-theoretic approaches required arbitrary choices of variables that introduced bias and lacked ties to control theory. CIP instead derives its measure from optimal control principles, unifying the two fields and linking the structure of value functions to Kolmogorov-Sinai entropy. Experiments show this measure produces stronger learning on robot benchmarks and succeeds on tasks such as humanoid self-righting where earlier methods failed. The results point to physical intelligence arising when agents are driven toward the boundary of controllable chaos.

Core claim

CIP measures the rate at which an agent produces controllable information and uses this quantity as the objective for intrinsic motivation. The approach unifies intrinsic motivation with optimal control into one framework and shows that physical intelligence is the control of information production. It also identifies a connection between the value function and Kolmogorov-Sinai entropy. On standard robot learning benchmarks CIP outperforms prior intrinsic motivation methods and solves tasks they cannot, including humanoid self-righting.

What carries the argument

Controllable Information Production (CIP), a quantity that measures the rate of controllable information production derived from the agent's dynamics and optimal control.
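For orientation, the formula quoted in the Lean-theorem section of this page can be typeset as follows. Reading f_ol and f_cl as the open-loop and closed-loop dynamics is an interpretation of that quoted fragment, not a definition taken from the paper. The second line is Pesin's identity, a standard result that equates Kolmogorov-Sinai entropy with the sum of positive Lyapunov exponents for an ergodic invariant measure satisfying Pesin's conditions; whether the paper relies on it is not stated here.

```latex
% Gap between open-loop and closed-loop KS entropies, as quoted on this page
\[
  \mathrm{CIP} \;=\; h_{\mathrm{KS}}(f_{\mathrm{ol}}) \;-\; h_{\mathrm{KS}}(f_{\mathrm{cl}})
\]
% Pesin's identity (assumed applicable; \lambda_i are the Lyapunov exponents of f)
\[
  h_{\mathrm{KS}}(f) \;=\; \sum_{i \,:\, \lambda_i > 0} \lambda_i
\]
```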

If this is right

Agents using CIP can acquire behaviors such as humanoid self-righting that defeat existing intrinsic motivation techniques.
Physical intelligence is formalized as the control of information production rather than reward maximization.
The value function acquires a direct relation to Kolmogorov-Sinai entropy through the CIP objective.
Intrinsic motivation and optimal control become a single coherent framework instead of separate approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

CIP could be applied beyond robotics to any system whose behavior can be described by controllable dynamics, such as adaptive software or economic agents.
If the measure truly avoids bias, it offers a route to training policies that remain stable when the environment changes without retraining rewards.
The entropy-value link suggests new algorithms that explicitly optimize entropy rates inside model-based controllers.
Testing CIP on systems with partial observability would clarify whether the controllable-chaos principle still holds when state information is incomplete.

Load-bearing premise

That CIP can be defined and computed using only the agent's internal dynamics and control without any hidden external knowledge or designer choices.

What would settle it

A controlled comparison in which CIP is implemented strictly from the stated dynamical definition and still fails to match or exceed prior methods on the same robot learning benchmarks.

Original abstract

Intrinsic Motivation (IM) aims to train agents without external rewards, enabling useful behavior to emerge from the agent's interaction with its environment alone. However, the dominant IM approaches rely on information-theoretic quantities with designer-chosen variables, introducing bias and lacking a principled connection to dynamics or optimal control (OC). We introduce Controllable Information Production (CIP), a new foundation for IM explicitly grounded in dynamical systems and OC. CIP measures the rate at which an agent produces information, capturing controllable complexity without external knowledge or bias. CIP unifies IM and OC into a single framework, formalizing physical intelligence as the control of information production. It further reveals connections between the structure of the value function and Kolmogorov-Sinai entropy. CIP consistently outperforms prior IM methods on standard benchmarks in robot learning and solves tasks they fail on, including humanoid self-righting. These results support a general organizing principle: physical intelligence emerges from driving systems toward the edge of controllable chaos.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CIP gives a dynamics-grounded IM objective with reported robot gains, but its no-bias claim needs close checking on partitions and derivations.

The main takeaway is that this paper defines Controllable Information Production as an intrinsic motivation objective pulled straight from dynamical systems and optimal control, then shows it beating prior IM methods on robot benchmarks and solving a humanoid self-righting task they miss. That empirical edge is the clearest signal so far.

What is actually new is the explicit unification of IM with optimal control plus the link to Kolmogorov-Sinai entropy, framing physical intelligence as control of information production at the edge of controllable chaos. If the formal steps hold, it supplies a single objective that could sidestep hand-picked variables in earlier work. The results look useful on their face: consistent outperformance and success on a task that defeats standard approaches.

The soft spots sit in the verification layer. The abstract supplies no derivation details, error bars, or ablations, so it is difficult to judge how cleanly CIP avoids designer choices in state partitioning or entropy estimation. The stress-test concern about measurable partitions is reasonable until the equations are inspected; any selection step there would weaken the grounding argument.

This paper is for people working on intrinsic motivation and robot learning who want a more principled alternative to reward design. Readers focused on theoretical connections between dynamics, entropy, and value functions will get the most from it. It deserves a serious referee because the central claims are coherent enough on their own terms to merit proper review, even if the evidence needs tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Controllable Information Production (CIP) as a new foundation for intrinsic motivation (IM) explicitly grounded in dynamical systems and optimal control. CIP is defined as the rate at which an agent produces controllable information, claimed to capture complexity without external knowledge or designer bias. The work unifies IM and OC into a single framework, links the value function structure to Kolmogorov-Sinai entropy, and reports that CIP outperforms prior IM methods on robot learning benchmarks while solving tasks they fail on, such as humanoid self-righting. These results are presented as supporting the principle that physical intelligence emerges from driving systems toward the edge of controllable chaos.

Significance. If the grounding holds and the empirical results are reproducible, the work could establish a bias-free organizing principle for physical intelligence by directly connecting information production to control theory, potentially resolving longstanding issues in IM approaches and offering a unified view of emergence in embodied agents.

major comments (2)

[Abstract] The claim that CIP captures controllable complexity without external knowledge or bias is undermined by the dependence of Kolmogorov-Sinai entropy on a choice of measurable partition or state representation; any such choice introduces a selection step that risks embedding designer decisions, contrary to the no-bias assertion and weakening the unification argument with optimal control.
[Abstract and methods] No derivation details, explicit equations for the CIP objective, or implementation specifics for entropy estimation and controllability are provided, preventing verification that the measure is directly computed from dynamics without hidden variables or normalization choices.
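The partition concern in the first major comment can be made concrete with a toy system. Kolmogorov-Sinai entropy is defined as a supremum over finite partitions, so the dependence arises in estimation, when one particular partition has to be fixed. The sketch below is not from the paper: it uses the logistic map at r = 4 (KS entropy ln 2, with the split at 0.5 a generating partition) and a plain block-entropy estimator. The function names, the threshold 0.9, the block length, and the orbit length are all illustrative assumptions.

```python
import math
from collections import Counter


def logistic_orbit(x0, n, r=4.0, burn_in=1000):
    """Iterate x -> r*x*(1-x), discard a transient, and return n samples."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


def symbolize(orbit, threshold):
    """Two-cell partition of [0, 1]: symbol 1 above the threshold, else 0."""
    return [1 if x > threshold else 0 for x in orbit]


def block_entropy(symbols, k):
    """Shannon entropy (nats) of the empirical distribution of length-k blocks."""
    counts = Counter(tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def entropy_rate_estimate(symbols, k=4):
    """Conditional-entropy estimate H(k+1) - H(k) of the symbolic entropy rate."""
    return block_entropy(symbols, k + 1) - block_entropy(symbols, k)


orbit = logistic_orbit(x0=0.1234, n=20000)
# Generating partition (split at 0.5): the estimate should approach ln 2.
rate_generating = entropy_rate_estimate(symbolize(orbit, 0.5))
# Arbitrary partition (split at 0.9, an illustrative choice): a lower estimate.
rate_coarse = entropy_rate_estimate(symbolize(orbit, 0.9))
print(f"split at 0.5: {rate_generating:.3f} nats/step")
print(f"split at 0.9: {rate_coarse:.3f} nats/step")
```

The same orbit yields two different entropy-rate estimates depending only on where the cell boundary is placed, which is the kind of selection step the comment asks the authors to justify.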

minor comments (2)

[Results] The reported outperformance on benchmarks and humanoid self-righting lacks error bars, ablation studies, or statistical tests, which are needed to substantiate the consistent superiority claim.
[Abstract] The connection between value function structure and Kolmogorov-Sinai entropy is asserted but not illustrated with even a brief example or equation sketch, reducing clarity for readers familiar with either field.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment point by point below, providing the strongest honest defense of the manuscript while indicating where revisions are needed.

Referee: [Abstract] The claim that CIP captures controllable complexity without external knowledge or bias is undermined by the dependence of Kolmogorov-Sinai entropy on a choice of measurable partition or state representation; any such choice introduces a selection step that risks embedding designer decisions, contrary to the no-bias assertion and weakening the unification argument with optimal control.

Authors: We acknowledge that any computation of Kolmogorov-Sinai entropy requires a measurable partition, and this choice must be justified to avoid introducing bias. In the CIP framework, the partition is not an arbitrary designer choice but is induced directly by the agent's state space and the control inputs within the dynamical system. This selection is part of the optimal control formulation itself, ensuring the measure remains intrinsic to the agent's interaction with the environment. The unification with optimal control is preserved because the value function structure dictates the relevant partition for information production. We have added a clarifying paragraph in the revised abstract and a new subsection in the methods to explicitly derive and justify the partition from the dynamics.
Revision: partial

Referee: [Abstract and methods] No derivation details, explicit equations for the CIP objective, or implementation specifics for entropy estimation and controllability are provided, preventing verification that the measure is directly computed from dynamics without hidden variables or normalization choices.

Authors: We agree that the original manuscript omitted the full derivation and implementation details, which limits verifiability. The revised version now includes the complete derivation of the CIP objective from the controlled dynamical system, explicit equations connecting it to the value function and Kolmogorov-Sinai entropy rate, and precise implementation steps for entropy estimation (via trajectory-based partitioning) and controllability (via reachable set computation). These additions confirm that the measure is computed directly from the system dynamics with no hidden variables or arbitrary normalizations. All equations and pseudocode have been inserted into the methods section.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity; CIP definition and benchmarks remain independent

The paper defines CIP explicitly as the rate of controllable information production grounded in dynamical systems and optimal control, with stated links to Kolmogorov-Sinai entropy and value-function structure. No quoted equations or self-citations reduce the central performance claims (outperformance on robot benchmarks including humanoid self-righting) to a fitted parameter or definitional tautology. The derivation chain treats CIP as an input measure whose empirical consequences are tested externally rather than presupposed by construction. State-partition choices, while necessary for entropy rates, are presented as motivated by the dynamics rather than retrofitted to the target results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that information production can be defined and measured in a bias-free manner directly from dynamics and optimal control; no explicit free parameters, axioms, or invented entities are stated in the abstract, but the method implicitly treats controllable complexity as a well-defined observable.

axioms (1)

Domain assumption: Dynamical systems and optimal control provide a complete, bias-free foundation for defining intrinsic motivation.
Invoked in the abstract to ground CIP without designer-chosen variables.

pith-pipeline@v0.9.0 · 5452 in / 1307 out tokens · 24603 ms · 2026-05-16T10:14:52.705219+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

"CIP = h_ks(f_ol) - h_ks(f_cl) ... emerges from OC theory ... gap between open-loop and closed-loop Kolmogorov-Sinai entropies"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · echoes

"Value Hessian Decomposition ... lim 1/(2T) log det ... = h_ks(f_cl) ... DARE"
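
The quoted passages describe CIP as a gap between open-loop and closed-loop entropies and mention the discrete algebraic Riccati equation (DARE). A minimal sketch of what such a gap looks like in the simplest possible setting is given below. It is not the paper's method. It assumes a scalar linear system x_{t+1} = a·x_t + b·u_t with quadratic cost weights q and r, and it takes the entropy rate of a linear map to be log|a| when the map is expanding and zero otherwise (the standard result for linear maps). All names and numeric values are illustrative.

```python
import math


def linear_entropy_rate(a):
    """Entropy rate (nats/step) assumed for x -> a*x: log|a| if |a| > 1, else 0."""
    if a == 0.0:
        return 0.0
    return max(0.0, math.log(abs(a)))


def solve_scalar_dare(a, b, q, r, iters=200):
    """Solve the scalar DARE p = q + a^2 p - (a b p)^2 / (r + b^2 p) by iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p


def lqr_gain(a, b, r, p):
    """Optimal state-feedback gain k for the control law u = -k * x."""
    return a * b * p / (r + b * b * p)


# Illustrative unstable open-loop system (all values are assumptions).
a, b, q, r = 2.0, 1.0, 1.0, 1.0
p = solve_scalar_dare(a, b, q, r)
k = lqr_gain(a, b, r, p)
a_cl = a - b * k                      # closed-loop dynamics under u = -k * x

h_ol = linear_entropy_rate(a)         # open-loop entropy rate
h_cl = linear_entropy_rate(a_cl)      # closed-loop entropy rate
gap = h_ol - h_cl
print(f"p = {p:.4f}, k = {k:.4f}, a_cl = {a_cl:.4f}")
print(f"h_ol = {h_ol:.4f}, h_cl = {h_cl:.4f}, gap = {gap:.4f}")
```

In this linear-quadratic toy case the optimal feedback stabilizes the system, so the closed-loop entropy rate is zero and the gap equals the open-loop rate. The paper's nonlinear setting, and its exact definitions of f_ol and f_cl, may behave differently.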

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.