arxiv: 2605.05856 · v1 · submitted 2026-05-07 · 💻 cs.LG

Measuring Learning Progress via Gradient-Momentum Coupling

Pith reviewed 2026-05-08 14:38 UTC · model grok-4.3

classification 💻 cs.LG
keywords learning · noise · measuring · coupling · curiosity-driven · error · experiments · gradient-momentum

The pith

Gradient-Momentum Coupling (GMC) is proposed as a noise-robust alternative to prediction error for measuring learning progress in curiosity-driven reinforcement learning; it scores each sample by how strongly its gradient couples with the momentum from previous gradients.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In reinforcement learning, agents often rely on curiosity signals to decide what to explore. Common signals such as prediction error can be fooled by random noise, which is unpredictable but teaches the agent nothing. The paper proposes GMC, which takes the gradient a sample produces (the direction in which that sample would change the parameters) and multiplies it, parameter by parameter, by the momentum (a smoothed history of past gradients); the signal is the absolute value of this product, normalized per parameter. Because momentum averages out gradients that flip sign from step to step, the product is large only for samples whose gradients fall on parameters the model is consistently updating, and small for samples that merely cause oscillation. The reported experiments suggest this prioritizes tasks by learning speed rather than difficulty and gives more robust performance when observations contain noise.
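The computation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the abstract does not say what the per-parameter normalizer is, so the sketch assumes an Adam-style running average of squared gradients. The names `ema_update`, `gmc_score`, `second_moment`, `beta`, and `eps` are hypothetical.

```python
def ema_update(avg, value, beta=0.9):
    """One exponential-moving-average step, applied per parameter."""
    return [beta * a + (1.0 - beta) * v for a, v in zip(avg, value)]


def gmc_score(sample_grad, momentum, second_moment, eps=1e-8):
    """Mean over parameters of |g_i * m_i| / (v_i + eps).

    ASSUMPTION: normalizing by a running second moment v_i is a guess;
    the source only says the product is "per-parameter normalized".
    """
    terms = [
        abs(g * m) / (v + eps)
        for g, m, v in zip(sample_grad, momentum, second_moment)
    ]
    return sum(terms) / len(terms)


# Toy history with two parameters: parameter 0 always receives gradient +1,
# parameter 1 receives a gradient that flips sign on every step.
momentum = [0.0, 0.0]
second_moment = [0.0, 0.0]
for step in range(100):
    flip = 1.0 if step % 2 == 0 else -1.0
    grad = [1.0, flip]
    momentum = ema_update(momentum, grad)
    second_moment = ema_update(second_moment, [g * g for g in grad])

# A sample that only touches the consistently updated parameter ...
consistent_score = gmc_score([1.0, 0.0], momentum, second_moment)
# ... versus a sample that only touches the oscillating parameter.
oscillating_score = gmc_score([0.0, 1.0], momentum, second_moment)

print(consistent_score, oscillating_score)
```

In this toy setting both samples have gradients of the same magnitude, so a magnitude-based signal could not separate them, while the momentum-weighted score is substantially larger for the sample acting on the consistently updated parameter.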

Core claim

Experiments on MiniGrid suggest that replacing prediction error with GMC within existing curiosity-driven architectures can improve robustness to observation noise.

Load-bearing premise

That momentum's natural filtering of noise and oscillations reliably identifies samples contributing to ongoing parameter updates rather than merely reflecting optimization artifacts.

Original abstract

Measuring learning progress is essential for curiosity-driven exploration in reinforcement learning, but widely used signals such as prediction error often fail to distinguish meaningful, learnable patterns from random noise. This paper proposes Gradient-Momentum Coupling (GMC), a signal derived from optimization dynamics that quantifies how useful each sample's gradient is for ongoing learning by measuring its per-parameter normalized absolute product with the momentum from previous gradients. By leveraging momentum's natural filtering of noise and oscillations, GMC identifies samples that contribute to ongoing parameter updates. Controlled experiments demonstrate noise robustness and emergent curriculum learning, with the signal prioritizing tasks by learning speed rather than difficulty. Experiments on MiniGrid suggest that replacing prediction error with GMC within existing curiosity-driven architectures can improve robustness to observation noise.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; full derivation, assumptions, and any fitted components are not visible.

axioms (1)
  • domain assumption: Momentum from previous gradients naturally filters noise and oscillations in optimization trajectories.
    Invoked in abstract to justify why GMC identifies learnable patterns.
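The axiom can be probed with a small numerical check. The sketch below assumes momentum is an exponential moving average with a hypothetical coefficient `beta = 0.9` (the abstract does not give the paper's momentum form or hyperparameters) and feeds it three scalar gradient streams. It illustrates a general property of such averages: they act as low-pass filters, so step-to-step sign flips are strongly attenuated, while slow oscillations pass through almost unchanged.

```python
def ema_final_magnitude(grads, beta=0.9):
    """Magnitude of EMA momentum after consuming a scalar gradient stream."""
    m = 0.0
    for g in grads:
        m = beta * m + (1.0 - beta) * g
    return abs(m)


steps = 200
# Consistent learning signal: the gradient never changes sign.
constant = [1.0] * steps
# High-frequency oscillation: the sign flips on every step.
fast_flip = [1.0 if t % 2 == 0 else -1.0 for t in range(steps)]
# Low-frequency oscillation: the sign flips only every 50 steps.
slow_flip = [1.0 if (t // 50) % 2 == 0 else -1.0 for t in range(steps)]

constant_mag = ema_final_magnitude(constant)
fast_mag = ema_final_magnitude(fast_flip)
slow_mag = ema_final_magnitude(slow_flip)

print(constant_mag, fast_mag, slow_mag)
```

Under these assumptions the fast-flipping stream leaves almost no momentum, but the slow-flipping stream leaves nearly as much as the constant one. This is one concrete way the premise could fail: disturbances that vary slowly relative to the averaging window are not filtered out.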

pith-pipeline@v0.9.0 · 5417 in / 1097 out tokens · 36152 ms · 2026-05-08T14:38:31.758996+00:00 · methodology
