On the Role of Time in Learning

Alessandro Betti; Marco Gori

arxiv: 1907.06198 · v1 · pith:LJZUWGMUnew · submitted 2019-07-14 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-24 21:35 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords temporal learning · least cognitive action · differential equations · stochastic gradient descent · role of time · physics-inspired learning · recurrent models

The pith

Reformulating learning via the principle of Least Cognitive Action models time through differential equations like those in physics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that the usual practice of minimizing risk functions over time with stochastic gradient descent overlooks the deeper meaning of time found in physics. It proposes that a reformulation based on the principle of Least Cognitive Action is more appropriate for learning that involves time. This principle produces a learning process governed by differential equations, allowing the same framework used for natural laws to apply to learning. A reader might care because it suggests learning from temporal sequences can be treated as a natural dynamical system rather than a separate optimization task.

Core claim

By and large the process of learning concepts that are embedded in time is regarded as quite a mature research topic. Hidden Markov models, recurrent neural networks are, amongst others, successful approaches to learning from temporal data. In this paper, we claim that the dominant approach minimizing appropriate risk functions defined over time by classic stochastic gradient might miss the deep interpretation of time given in other fields like physics. We show that a recent reformulation of learning according to the principle of Least Cognitive Action is better suited whenever time is involved in learning. The principle gives rise to a learning process that is driven by differential equations, that can somehow describe the process within the same framework as other laws of nature.

What carries the argument

The principle of Least Cognitive Action, which reformulates learning so that the process obeys differential equations comparable to those in physics.
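
For orientation, the classical principle of least action that the name alludes to can be written as below. This is the standard textbook form from physics, not the paper's cognitive action functional, which is not reproduced on this page; the symbols q (trajectory), L (Lagrangian) and S (action) are the usual conventions and are assumptions here.

```latex
S[q] = \int_{t_0}^{t_1} L\bigl(q(t), \dot q(t), t\bigr)\, dt,
\qquad
\delta S = 0
\;\Longrightarrow\;
\frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = 0 .
```

Requiring the action to be stationary yields the Euler-Lagrange equation on the right, which is a differential equation in time. The paper's claim is that learning can be derived by an analogous route.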

If this is right

Learning processes involving time can be expressed as solutions to differential equations.
The framework for learning becomes the same one used for other laws of nature.
Temporal data tasks gain an interpretation of time that matches physics rather than pure optimization.
Recurrent models and hidden Markov models can be re-derived from this variational principle.
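
The first consequence above can be illustrated on a toy problem. The sketch below is not the paper's model: it assumes a hypothetical one-dimensional quadratic loss and the plain gradient flow dw/dt = -grad(w), then compares the closed-form trajectory with explicit Euler steps, each of which is one ordinary gradient-descent update. All function names and constants are illustrative assumptions.

```python
import math


def grad(w, a=2.0, target=3.0):
    """Gradient of the hypothetical loss 0.5 * a * (w - target)**2."""
    return a * (w - target)


def euler_flow(w0, step, n_steps):
    """Explicit Euler on dw/dt = -grad(w); each step is one gradient-descent update."""
    w = w0
    for _ in range(n_steps):
        w = w - step * grad(w)
    return w


def exact_flow(w0, t, a=2.0, target=3.0):
    """Closed-form solution of dw/dt = -a * (w - target) at time t."""
    return target + (w0 - target) * math.exp(-a * t)


if __name__ == "__main__":
    horizon = 1.0
    for n in (10, 100, 1000):
        approx = euler_flow(0.0, horizon / n, n)
        error = abs(approx - exact_flow(0.0, horizon))
        print(f"steps={n:5d}  w={approx:.6f}  |error|={error:.6f}")
```

As the step size shrinks, the discrete iterates approach the continuous trajectory, so in this simple setting gradient descent can be read as a numerical integrator of a differential equation.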

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may allow borrowing numerical methods from physics simulations to train models on sequences.
It opens the possibility of treating learning dynamics as continuous-time systems rather than discrete updates.
Comparisons could be made between the resulting trajectories and measured neural activity during learning tasks.
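
One way to picture the first two extensions is to integrate a second-order, physics-style equation with a standard simulation scheme. The sketch below assumes a hypothetical damped system, mass * w'' + damping * w' + grad(w) = 0, on a toy quadratic loss, stepped with semi-implicit Euler. The equation, the loss, and every parameter value are assumptions chosen for illustration; they are not the equations derived from the Least Cognitive Action principle.

```python
def grad(w, target=3.0):
    """Gradient of the hypothetical loss 0.5 * (w - target)**2."""
    return w - target


def damped_dynamics(w0, mass=1.0, damping=2.0, dt=0.01, n_steps=2000):
    """Semi-implicit Euler for mass * w'' + damping * w' + grad(w) = 0."""
    w, v = w0, 0.0
    for _ in range(n_steps):
        # Update the velocity from the current force, then move with the new velocity.
        v += dt * (-damping * v - grad(w)) / mass
        w += dt * v
    return w, v


if __name__ == "__main__":
    w_final, v_final = damped_dynamics(0.0)
    print(f"w={w_final:.6f}  v={v_final:.6f}")
```

With these assumed values the system is critically damped, so the weight settles at the minimizer of the toy loss without oscillating. Changing `damping` or `mass` changes the trajectory, not the resting point.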

Load-bearing premise

Minimizing risk functions over time by stochastic gradient descent misses the deep physical interpretation of time.

What would settle it

An experiment in which a differential-equation model derived from the Least Cognitive Action principle fails to match or exceed the performance of standard stochastic gradient descent on a temporal learning benchmark would challenge the central claim.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper argues that a Least Cognitive Action principle should replace standard risk minimization for temporal learning because it yields differential equations aligned with physics, but the claim stays at the level of an unelaborated analogy.

The main takeaway is that the authors think classic stochastic gradient descent on time-dependent risk functions misses something fundamental about time that physics captures through variational principles. They point to a recent reformulation using Least Cognitive Action as the fix, claiming it produces differential equations that sit in the same framework as natural laws. That contrast between discrete optimization and continuous physical description is the paper's clearest contribution, and it is fair to raise it as a possible blind spot in how ML handles sequences. The argument itself is not new in this manuscript; it builds on a prior reformulation the authors reference, so the work here is mostly interpretive advocacy rather than a fresh derivation or result. The soft spot is exactly where the stress-test note lands: the abstract says the new process is driven by differential equations that can “somehow” describe learning like other laws of nature, but nothing in the provided text supplies the cognitive action functional, derives the equations, or shows a concrete task where SGD or RNNs fail to capture temporal structure that this approach resolves. Without that mapping or a counter-example, the deeper-interpretation claim remains an assertion rather than evidence. This kind of foundational discussion might interest a small group of theorists who already follow the authors’ line of work on cognitive action, but it offers no algorithms, no experiments, and no falsifiable predictions that would move the broader sequential-learning literature. I would not send it to peer review in this form; the central claim needs the actual math and at least minimal validation before it deserves referee time.

Referee Report

3 major / 1 minor

Summary. The paper claims that standard approaches to temporal learning, such as minimizing risk functions over time via stochastic gradient descent, overlook the deeper physical interpretation of time. It asserts that a recent reformulation of learning based on the principle of Least Cognitive Action is better suited for time-involved learning because it produces differential equations that describe the process within the same framework as other natural laws. Methods like HMMs and RNNs are mentioned as existing but not compared in detail.

Significance. If the reformulation were shown to yield explicit differential equations with demonstrated advantages over SGD-based risk minimization on temporal tasks, and if those equations were derived from a well-defined cognitive action functional with verifiable parallels to physical laws, the work could offer a conceptual bridge between machine learning and physics. As presented, however, the manuscript supplies neither the functional, the equations, nor any empirical or theoretical comparison, rendering the significance unassessable.

major comments (3)

[Abstract] The central assertion that the Least Cognitive Action principle 'gives rise to a learning process that is driven by differential equations' is stated without any derivation, definition of the cognitive action, or explicit form of the resulting equations, making it impossible to evaluate whether they differ from or improve upon standard temporal risk minimization.
[Abstract] No concrete task, counter-example, or comparison is supplied showing where SGD on time-defined risk functions fails to capture temporal structure that the proposed differential equations resolve; the claim that the new approach is 'better suited' therefore lacks any load-bearing evidence.
[Abstract] The statement that the equations 'can somehow describe the process within the same framework as other laws of nature' is presented as a conclusion without any mapping, equivalence proof, or reference to specific physical laws, leaving the integration claim unsupported.

minor comments (1)

[Abstract] Typo 'descrive' should be 'describe'.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the careful reading of the manuscript. The paper is a brief conceptual note that refers to a prior reformulation of learning; we address each major comment below.

Referee: [Abstract] The central assertion that the Least Cognitive Action principle 'gives rise to a learning process that is driven by differential equations' is stated without any derivation, definition of the cognitive action, or explicit form of the resulting equations, making it impossible to evaluate whether they differ from or improve upon standard temporal risk minimization.

Authors: The derivation, definition of the cognitive action, and explicit differential equations appear in the referenced recent reformulation of learning. The present manuscript is a short position piece whose purpose is to draw attention to the implications for temporal data rather than to reproduce those derivations. We will revise the abstract to include an explicit citation to that prior work.
revision: yes

Referee: [Abstract] No concrete task, counter-example, or comparison is supplied showing where SGD on time-defined risk functions fails to capture temporal structure that the proposed differential equations resolve; the claim that the new approach is 'better suited' therefore lacks any load-bearing evidence.

Authors: The manuscript advances a conceptual argument that standard risk minimization over time may overlook the physical interpretation of time; it does not assert or demonstrate task-specific superiority. No counter-examples or empirical comparisons are supplied because they lie outside the scope of this short note. We therefore do not plan to add such material.
revision: no

Referee: [Abstract] The statement that the equations 'can somehow describe the process within the same framework as other laws of nature' is presented as a conclusion without any mapping, equivalence proof, or reference to specific physical laws, leaving the integration claim unsupported.

Authors: The alignment claim rests on the fact that the Least Cognitive Action principle is constructed by direct analogy with the variational principle of least action in physics. The manuscript does not supply an explicit mapping or proof; we agree this is a limitation of the current text and can add a clarifying sentence in revision.
revision: partial

standing simulated objections not resolved

The manuscript mentions HMMs and RNNs only in passing and supplies no detailed comparison with them.

Circularity Check

1 step flagged

The central claim that Least Cognitive Action is superior reduces to the authors' own prior reformulation, with no independent derivation shown.

specific steps

self-citation, load-bearing [Abstract]
"We show that a recent reformulation of learning according to the principle of Least Cognitive Action is better suited whenever time is involved in learning. The principle gives rise to a learning process that is driven by differential equations, that can somehow descrive the process within the same framework as other laws of nature."

The paper presents the 'showing' of superiority and the differential-equation integration as its result, yet the reformulation is labeled 'recent' and external to this text. With no derivation supplied and authors matching the likely originators of the principle, the asserted advantage and physical-law parallel reduce to reliance on the authors' own prior definition rather than an independent chain.

full rationale

The paper's abstract asserts that a 'recent reformulation' via Least Cognitive Action yields differential equations integrating learning with physical laws and is 'better suited' than SGD risk minimization. No explicit derivation, equations, or comparative evidence appears in the abstract or described structure; the reformulation is imported as given. Given author overlap and the phrasing, this is a self-citation load-bearing step where the claimed advantage is not re-derived or externally benchmarked here, making the core argument equivalent to re-asserting the prior framework's value.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The claim depends on the unstated details of the Least Cognitive Action principle and the assertion that standard gradient methods inherently miss a physical interpretation of time; no free parameters, invented entities, or additional axioms are visible in the abstract.

axioms (1)

domain assumption: The principle of Least Cognitive Action provides a superior framework for incorporating time into learning compared with risk minimization via stochastic gradient descent.
Invoked directly in the abstract as the basis for preferring the new approach.
