Cognitive Training for Language Models: Towards General Capabilities via Cross-Entropy Games

Andrew Emil; Arthur Renard; Clément Hongler; Franck Gabriel; Valentin Hartmann

arxiv: 2603.22479 · v3 · submitted 2026-03-23 · 🧮 math.OC · cs.AI

Clément Hongler, Franck Gabriel, Valentin Hartmann, Arthur Renard, Andrew Emil

Pith reviewed 2026-05-15 00:20 UTC · model grok-4.3

classification 🧮 math.OC cs.AI

keywords cognitive training · cross entropy games · curriculum learning · language models · general capabilities · greedy optimization · meta objective

The pith

Cognitive training provides the unique meta-objective for automatically growing language model curricula through Cross-Entropy Games.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to define a constructive process for building general capabilities in language models automatically by creating a growing curriculum of tasks that enables relevant skill discovery. Using a family of tasks called Cross-Entropy Games, which are postulated to be universal, it demonstrates that if curriculum growth can be achieved by iterating a greedy optimization algorithm, then there is essentially only one possible meta-objective under natural assumptions, up to a few hyperparameters. This process is termed cognitive training. A reader would care because it offers a principled automatic method for skill discovery, potentially solving the open problem of achieving general capabilities if the assumptions about the universality of the games and the capability of language models hold.

Core claim

We show that if it is possible to grow the curriculum for relevant skill discovery by iterating a greedy optimization algorithm, then, under natural assumptions, there is essentially only one meta-objective possible up to a few hyper-parameters. We call the resulting process cognitive training. We postulate that, given sufficiently capable language models as players and meta-samplers, cognitive training provides a principled way to relevant skill discovery and hence to general capabilities via greedy curriculum learning.
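The conditional structure of this claim can be written out schematically. This is a sketch under assumed notation, not the paper's own formulation: the symbols $\mathcal{G}$ (the family of Cross-Entropy Games), $\mathcal{C}_n$ (the curriculum after $n$ steps), $M_n$ (the model trained on that curriculum) and $\theta$ (the hyper-parameters) are introduced here for illustration only. Only the name $O$ for the meta-objective appears in the passages quoted from the paper on this page.

```latex
% Greedy curriculum growth (schematic; notation assumed, not taken from the paper)
\[
  G_{n+1} \in \operatorname*{arg\,max}_{G \in \mathcal{G}} \; O_\theta\bigl(G;\, M_n, \mathcal{C}_n\bigr),
  \qquad
  \mathcal{C}_{n+1} = \mathcal{C}_n \cup \{G_{n+1}\}.
\]
% Claimed uniqueness, read as a conditional: if iterating a step of this form
% achieves relevant skill discovery with some meta-objective O', then under the
% paper's natural assumptions O' agrees with O_\theta for some value of \theta.
```

Written this way, the claim does not assert that the greedy iteration succeeds; it only constrains the form of the meta-objective if it does.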

What carries the argument

Cross-Entropy Games as the universal task family, with language models acting as players and meta-samplers, enabling the derivation of a unique meta-objective for curriculum growth via greedy optimization.
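To make the shape of a greedy curriculum loop concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration: each "task" is just a target distribution over three symbols (a stand-in, not the paper's definition of a Cross-Entropy Game), the "player" is a single shared distribution, and `placeholder_meta_objective` is a hypothetical learning-progress score, not the meta-objective the paper derives. The function names `train_step` and `grow_curriculum` are likewise invented here.

```python
import math


def cross_entropy(target, predicted):
    """Cross-entropy H(target, predicted) in nats over a shared finite support."""
    return -sum(p * math.log(q) for p, q in zip(target, predicted) if p > 0)


def train_step(model, target, lr=0.5):
    """Toy 'training': move the model's distribution part of the way to the target."""
    return [(1 - lr) * q + lr * p for p, q in zip(target, model)]


def placeholder_meta_objective(target, model, lr=0.5):
    """HYPOTHETICAL score: cross-entropy drop from one toy training step.

    This is a stand-in, not the meta-objective derived in the paper.
    """
    return cross_entropy(target, model) - cross_entropy(target, train_step(model, target, lr))


def grow_curriculum(candidates, steps, lr=0.5):
    """Greedily pick the best-scoring unused task, add it, then train on it."""
    vocab_size = len(next(iter(candidates.values())))
    model = [1.0 / vocab_size] * vocab_size  # shared toy player, starts uniform
    remaining = dict(candidates)
    curriculum = []
    for _ in range(min(steps, len(remaining))):
        best = max(
            remaining,
            key=lambda name: placeholder_meta_objective(remaining[name], model, lr),
        )
        curriculum.append(best)
        model = train_step(model, remaining.pop(best), lr)
    return curriculum, model


# Hypothetical toy tasks: each is just a target distribution over three symbols.
CANDIDATES = {
    "peaked": [0.9, 0.05, 0.05],
    "mild": [0.5, 0.3, 0.2],
    "flat": [1 / 3, 1 / 3, 1 / 3],
}

curriculum, final_model = grow_curriculum(CANDIDATES, steps=3)
print(curriculum)
```

Because the player is shared across tasks, training on one task changes the score of the others, so the order of the curriculum depends on the state of the model at each step. Swapping in a different scoring function is a one-line change, which is the kind of comparison the uniqueness claim is about.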

If this is right

Cognitive training emerges as the sole meta-objective for curriculum growth under the given assumptions.
This provides an automatic way to discover relevant skills for language models.
General capabilities become achievable through iterative greedy optimization on these games.
Hyperparameters allow flexibility while maintaining uniqueness of the objective.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the assumptions hold, this framework could extend to training other types of models beyond language models.
Current language models might be tested as meta-samplers to see if cognitive training can start with existing capabilities.
Convergence of different training methods to this meta-objective could explain why some approaches work better than others.

Load-bearing premise

The assumption that Cross-Entropy Games form a universal family of tasks and that language models can sufficiently act as players and meta-samplers for the uniqueness of the meta-objective to hold.

What would settle it

Finding that multiple distinct meta-objectives can grow effective curricula for skill discovery using greedy algorithms on these tasks, or that language models cannot serve as adequate meta-samplers, would falsify the central claim.

Original abstract

Defining a constructive process to build general capabilities for language models in an automatic manner is considered an open problem in artificial intelligence. Towards this, we consider the problem of building a curriculum of tasks that grows a model via relevant skill discovery. We provide a concrete framework for this task, using a family of tasks called Cross-Entropy Games, which we postulate is universal in a suitable sense. We show that if it is possible to grow the curriculum for relevant skill discovery by iterating a greedy optimization algorithm, then, under natural assumptions, there is essentially only one meta-objective possible (up to a few hyper-parameters). We call the resulting process cognitive training. We postulate that, given sufficiently capable language models as players and meta-samplers, cognitive training provides a principled way to relevant skill discovery; and hence to the extent general capabilities are achievable via greedy curriculum learning, cognitive training would be a solution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper sketches a conceptual framework for cognitive training via Cross-Entropy Games, but its uniqueness claim rests on unlisted assumptions and comes with no visible derivation.

The paper sketches a framework for growing language model capabilities through a curriculum of Cross-Entropy Games, claiming that greedy optimization leads to essentially one meta-objective under natural assumptions. The authors call this cognitive training and suggest it could automate relevant skill discovery if the games are universal enough and the models are capable players and samplers.

The constructive part is the attempt to identify a unique meta-objective for this greedy curriculum growth. It takes ideas from curriculum learning and tries to make them more precise by focusing on what the optimization target must look like if the process is to work generally. That gives a clear structure to the discussion.

The main weakness is that the uniqueness result is not actually derived. The abstract mentions natural assumptions but does not list them or show the steps that rule out other objectives. The universality of the games is stated as a postulate without supporting argument or examples. This leaves the central claim hanging on unverified premises, and the argument risks circularity by tying success to the framework's own definitions. There is also no indication of any concrete implementation, experiments, or checks on whether the greedy process behaves as expected.

This paper is for theorists interested in formalizing automatic curriculum design for AI agents. A reader working on similar meta-learning or self-improvement setups might find the high-level picture suggestive, but it offers little that can be used or tested directly. I would not bring this to a reading group in its current state. It is not ready for peer review because the key steps are missing; the authors should first make the assumptions explicit and provide the derivation before expecting serious referee feedback.

Referee Report

3 major / 2 minor

Summary. The paper proposes a framework for automatic curriculum construction in language models using a family of tasks called Cross-Entropy Games, which are postulated to be universal. It claims that if curriculum growth for relevant skill discovery can be achieved by iterating a greedy optimization algorithm, then under natural assumptions there is essentially only one possible meta-objective (up to a few hyperparameters); the resulting process is termed cognitive training. The authors postulate that sufficiently capable language models can serve as players and meta-samplers, making cognitive training a principled route to general capabilities via greedy curriculum learning.

Significance. If the uniqueness result can be made rigorous by enumerating and verifying the natural assumptions, and if the universality postulate for Cross-Entropy Games can be supported by explicit reductions or counter-example exclusion, the work would offer a notable theoretical contribution to automatic skill discovery and curriculum design. It would provide a candidate for a canonical meta-objective in greedy settings, potentially unifying disparate approaches to capability growth in language models.

major comments (3)

[Abstract / uniqueness statement] Abstract and main uniqueness claim: the statement that 'under natural assumptions, there is essentially only one meta-objective possible' is presented without an explicit list of those assumptions or a self-contained derivation showing how greedy iteration rules out alternatives. The argument therefore rests on unstated conditions whose necessity is not demonstrated.
[Abstract / Cross-Entropy Games definition] Abstract: the universality of Cross-Entropy Games is introduced as a postulate without supporting arguments, reductions to known task families, or exclusion of counter-examples. This postulate is load-bearing for the claim that cognitive training yields general capabilities.
[Abstract / cognitive training process] Abstract: no error analysis, concrete small-scale example, or empirical test of the greedy curriculum growth process is supplied, leaving the central claim that the process produces 'relevant skill discovery' without visible verification steps.

minor comments (2)

[Abstract] The abstract would benefit from clearer separation between postulates and derived results to help readers distinguish what is assumed from what is shown.
[Main text] Notation for the meta-objective and the greedy iteration step should be introduced with explicit definitions before the uniqueness claim is stated.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments correctly identify areas where the abstract and presentation can be strengthened by making assumptions explicit, supporting the universality postulate, and adding an illustrative example. We address each major comment below and will incorporate the suggested clarifications in a revised manuscript.

Referee: [Abstract / uniqueness statement] Abstract and main uniqueness claim: the statement that 'under natural assumptions, there is essentially only one meta-objective possible' is presented without an explicit list of those assumptions or a self-contained derivation showing how greedy iteration rules out alternatives. The argument therefore rests on unstated conditions whose necessity is not demonstrated.

Authors: We agree that the uniqueness claim would benefit from an explicit enumeration of assumptions and a self-contained derivation. In the revised version we will add a dedicated subsection that lists the natural assumptions (greedy optimization of curriculum growth, relevant skill discovery as the selection criterion, and the meta-objective being defined over the space of Cross-Entropy Games) and derives the uniqueness result step by step, showing how alternative meta-objectives are excluded under these conditions.
revision: yes

Referee: [Abstract / Cross-Entropy Games definition] Abstract: the universality of Cross-Entropy Games is introduced as a postulate without supporting arguments, reductions to known task families, or exclusion of counter-examples. This postulate is load-bearing for the claim that cognitive training yields general capabilities.

Authors: The current manuscript presents universality as a postulate to focus on the consequences for curriculum construction. We will strengthen this in revision by adding explicit reductions from standard families (next-token prediction, instruction following, and simple reasoning tasks) to Cross-Entropy Games and by discussing the class of tasks that fall outside the family, thereby clarifying the scope of the universality claim.
revision: yes

Referee: [Abstract / cognitive training process] Abstract: no error analysis, concrete small-scale example, or empirical test of the greedy curriculum growth process is supplied, leaving the central claim that the process produces 'relevant skill discovery' without visible verification steps.

Authors: The manuscript is primarily theoretical and therefore contains no empirical evaluation. We will add a small-scale illustrative example of the greedy iteration on a toy task family together with a basic error analysis under the stated assumptions. This will supply the requested verification steps while remaining within the theoretical scope of the work.
revision: yes

Circularity Check

2 steps flagged

Uniqueness of meta-objective rests on unstated 'natural assumptions' and universality postulate without independent derivation

specific steps

self-definitional [Abstract]
"We show that if it is possible to grow the curriculum for relevant skill discovery by iterating a greedy optimization algorithm, then, under natural assumptions, there is essentially only one meta-objective possible (up to a few hyper-parameters). We call the resulting process cognitive training."

The uniqueness result is asserted under 'natural assumptions' that are never listed or shown to be minimal; the meta-objective is defined exactly as the fixed point of the greedy iteration on the postulated games, so the claim that only one such objective exists is true by construction of the setup rather than by independent derivation.

self-definitional [Abstract]
"We postulate that, given sufficiently capable language models as players and meta-samplers, cognitive training provides a principled way to relevant skill discovery; and hence to the extent general capabilities are achievable via greedy curriculum learning, cognitive training would be a solution."

Success is defined in terms of the universality postulate for the introduced Cross-Entropy Games and the greedy process itself; the framework therefore validates its own meta-objective by the same postulates used to introduce it.

full rationale

The paper's central claim reduces the uniqueness of the meta-objective to a conditional on unspecified natural assumptions plus a postulate that Cross-Entropy Games are universal. No enumerated assumptions or self-contained proof is supplied showing that alternatives are ruled out; the construction defines the objective precisely as the one compatible with greedy curriculum growth, making uniqueness hold by the framing of the assumptions rather than external derivation. The universality is stated as a postulate without reduction or counter-example exclusion, rendering the framework self-referential in its success criteria.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The framework rests on the postulate that Cross-Entropy Games form a universal task family and on unspecified natural assumptions that enable the uniqueness derivation for the meta-objective.

free parameters (1)

a few hyper-parameters
Mentioned as the only freedom left in the unique meta-objective after the uniqueness result.

axioms (2)

[ad hoc to paper] Cross-Entropy Games are universal in a suitable sense
Explicitly postulated as the basis for the framework.
[ad hoc to paper] Natural assumptions allow uniqueness of the meta-objective under greedy iteration
Invoked to derive the single possible meta-objective.

invented entities (1)

Cross-Entropy Games [no independent evidence]
purpose: Family of tasks postulated to enable relevant skill discovery via curriculum growth
Newly introduced construct that the entire framework depends on.

pith-pipeline@v0.9.0 · 5460 in / 1385 out tokens · 37968 ms · 2026-05-15T00:20:20.286221+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes
"If we can solve Problem 2 via a greedy method involving at every step the optimization of meta-objective O, then from reasonable principles an explicit (and unique) formula for O can be found."

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · echoes
"Using the theoretical rescaling invariance idea... we must have rescaling invariance for q(H)... yielding powers sum up to 1"

echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.