
arxiv: 2510.02945 · v3 · submitted 2025-10-03 · 💻 cs.LG · cs.AI

Ergodic Risk Measures: Towards a Risk-Aware Foundation for Continual Reinforcement Learning

Pith reviewed 2026-05-18 10:10 UTC · model grok-4.3

keywords: continual reinforcement learning · risk measures · ergodic risk measures · risk-aware RL · lifelong learning · long-run average · non-stationary environments

The pith

Classical risk measures from standard RL cannot be used directly for continual learning, so the paper defines ergodic risk measures that operate on long-run average behavior and are compatible with lifelong agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that classical risk measures, which work in non-continual settings by assessing risk on finite-horizon or stationary returns, lead to inconsistency when applied to continual RL agents that must handle ongoing adaptation and non-stationary environments. Building from this, the authors introduce ergodic risk measures as a new class that evaluates risk with respect to the agent's long-run average behavior. This matters for building risk-aware lifelong learners that optimize beyond expected performance while retaining past knowledge and adapting to new tasks. A case study with empirical results demonstrates that the new measures produce intuitive risk-aware behavior in continual settings.

Core claim

The authors establish that the classical theory of risk measures is incompatible with continual reinforcement learning because it does not align with the ergodic long-run average behavior required for lifelong agents. They extend risk measure theory by introducing ergodic risk measures, prove their compatibility with continual learning, and illustrate their use through a case study that includes empirical results showing practical advantages over risk-neutral approaches.

What carries the argument

Ergodic risk measures, a new class of risk measures defined directly on the ergodic (long-run average) behavior of the underlying process rather than on finite-horizon or stationary distributions.
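
To make the contrast with risk-neutral evaluation concrete, here is a minimal Python sketch. It assumes, purely for illustration, that risk on long-run behavior can be approximated by a lower-tail CVaR over the empirical per-step reward distribution of a long trajectory. That reading is an assumption of this sketch and is not the paper's formal definition, which this page does not reproduce. The function name `lower_tail_cvar`, the two reward streams, and the level `ALPHA = 0.25` are all hypothetical.

```python
import math


def lower_tail_cvar(samples, alpha):
    """Mean of the worst ceil(alpha * n) samples (rewards convention: low is bad)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(samples)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


# Hypothetical per-step reward streams for two policies (illustrative data only).
# "steady" always earns 1.0; "volatile" earns 2.0 but pays -8.0 every tenth step.
steady = [1.0] * 1000
volatile = [-8.0 if t % 10 == 0 else 2.0 for t in range(1000)]

ALPHA = 0.25  # assumed tail level

mean_steady = sum(steady) / len(steady)
mean_volatile = sum(volatile) / len(volatile)
cvar_steady = lower_tail_cvar(steady, ALPHA)
cvar_volatile = lower_tail_cvar(volatile, ALPHA)

print("average reward:", mean_steady, mean_volatile)
print("lower-tail CVaR:", cvar_steady, cvar_volatile)
```

Both streams have the same average reward, so a risk-neutral long-run objective cannot tell them apart, while the tail statistic separates them. Whether the paper's ergodic risk measures behave this way on a given task is something only the paper itself can confirm.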

If this is right

  • Risk-aware objectives become feasible for agents that must retain knowledge while adapting indefinitely.
  • Decision-making in lifelong settings can incorporate measures of variability or downside risk tied to long-run performance.
  • The framework supports development of algorithms that optimize risk beyond mean return in non-stationary environments.
  • Empirical validation in case studies shows the measures yield intuitive risk control during continual task sequences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could extend to other sequential decision problems with non-stationary data, such as online planning or adaptive control.
  • Estimation of ergodic quantities in practice may require new sampling or approximation methods suited to drifting environments.
  • Links to ergodic theory open the possibility of importing stability results from dynamical systems into risk-aware RL.

Load-bearing premise

That continual RL requires risk measures to be defined on ergodic long-run average behavior instead of finite-horizon or stationary quantities.

What would settle it

A demonstration that a classical risk measure can be applied to a continual RL task without producing inconsistency in the agent's long-run behavior, or an experiment where ergodic risk measures fail to control risk as claimed.

Original abstract

Continual reinforcement learning (continual RL) seeks to formalize the notions of lifelong learning and endless adaptation in RL. In particular, the aim of continual RL is to develop RL agents that can maintain a careful balance between retaining useful information and adapting to new situations. To date, continual RL has been explored almost exclusively through the lens of risk-neutral decision-making, in which the agent aims to optimize the expected long-run performance. In this work, we present the first formal theoretical treatment of continual RL through the lens of risk-aware decision-making, in which the behaviour of the agent is directed towards optimizing a measure of long-run performance beyond the mean. In particular, we show that the classical theory of risk measures, widely used as a theoretical foundation in non-continual risk-aware RL, is, in its current form, incompatible with continual learning. Then, building on this insight, we extend risk measure theory into the continual setting by introducing a new class of ergodic risk measures, and showing that it is compatible with continual learning. Finally, we provide a case study of continual risk-aware learning, along with empirical results, which show the intuitive appeal of ergodic risk measures in continual settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that classical risk measures from non-continual risk-aware RL are incompatible with continual reinforcement learning because they cannot be directly defined on ergodic long-run average behavior. It introduces a new class of ergodic risk measures that are compatible with continual learning by construction, provides a theoretical extension of risk measure theory, and includes a case study with empirical results illustrating their intuitive appeal for lifelong agents.

Significance. If the incompatibility result and the new measures hold, this would constitute the first formal theoretical treatment of risk-aware continual RL, bridging two previously separate areas and supplying a compatible foundation for risk-sensitive lifelong agents. The empirical case study adds practical grounding. Strengths include the explicit extension of risk measure axioms to the ergodic setting and the focus on long-run adaptation.

major comments (2)
  1. [Modeling premise / abstract and introduction] The central incompatibility claim for classical risk measures (stated in the abstract and developed in the modeling section) rests on the premise that any valid risk measure for continual RL must be defined directly on the ergodic occupation measure rather than on finite-horizon returns, stationary distributions, or time-inhomogeneous trajectories. If continual RL agents can optimize risk measures over rolling finite windows or non-ergodic policies while still satisfying lifelong adaptation, the incompatibility does not follow and the motivation for ergodic risk measures weakens. This modeling choice is load-bearing and requires explicit justification or counter-examples showing why alternative definitions violate continual RL requirements.
  2. [Definition of ergodic risk measures] The ergodic risk measures are introduced to satisfy the compatibility property with continual learning. While the construction demonstrates internal consistency, the central claim would be strengthened by an independent verification (e.g., comparison to an external benchmark risk measure or a falsifiable prediction) rather than relying solely on the definition. Without this, the result risks reducing to a tautology.
minor comments (1)
  1. [Abstract] The abstract asserts both incompatibility and compatibility results without equations, proof sketches, or details on the empirical protocols or data-exclusion criteria, which limits immediate assessment of the claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive referee report. We address each major comment below with point-by-point responses and indicate planned revisions to the manuscript.

Point-by-point responses
  1. Referee: [Modeling premise / abstract and introduction] The central incompatibility claim for classical risk measures (stated in the abstract and developed in the modeling section) rests on the premise that any valid risk measure for continual RL must be defined directly on the ergodic occupation measure rather than on finite-horizon returns, stationary distributions, or time-inhomogeneous trajectories. If continual RL agents can optimize risk measures over rolling finite windows or non-ergodic policies while still satisfying lifelong adaptation, the incompatibility does not follow and the motivation for ergodic risk measures weakens. This modeling choice is load-bearing and requires explicit justification or counter-examples showing why alternative definitions violate continual RL requirements.

    Authors: We agree that the modeling premise is central and benefits from expanded justification. In the revised version we will add a dedicated subsection in the modeling section that formally argues why risk measures for continual RL must operate on the ergodic occupation measure. Continual RL is defined by indefinite interaction without a terminal horizon; any risk assessment that relies on finite windows or time-inhomogeneous trajectories necessarily introduces a recency bias that violates the requirement of lifelong, non-forgetting adaptation. We will include two concrete counter-examples: (i) a rolling-window CVaR that produces inconsistent risk rankings across successive environment shifts, and (ii) a stationary-distribution risk measure that cannot be updated without resetting the agent’s internal state. These examples demonstrate that the alternatives fail to preserve the ergodic consistency demanded by continual RL, thereby reinforcing the motivation for ergodic risk measures. revision: yes

  2. Referee: [Definition of ergodic risk measures] The ergodic risk measures are introduced to satisfy the compatibility property with continual learning. While the construction demonstrates internal consistency, the central claim would be strengthened by an independent verification (e.g., comparison to an external benchmark risk measure or a falsifiable prediction) rather than relying solely on the definition. Without this, the result risks reducing to a tautology.

    Authors: We acknowledge the concern that an axiomatic construction alone could appear circular. The manuscript already contains a case study that empirically contrasts ergodic risk measures with classical ones on a lifelong non-stationary task, showing qualitatively different adaptation behavior. To provide the requested independent verification we will add, in the revised manuscript, (a) a direct numerical comparison of ergodic versus classical risk values on the same occupation measures and (b) a falsifiable prediction: ergodic risk measures will yield lower regret under repeated environment switches than any rolling-window baseline. These additions move beyond the definition while preserving the theoretical contribution. revision: partial
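
The rolling-window concern raised in the first response can be illustrated with a toy Python sketch. It is not the counter-example the simulated authors promise, and it does not implement the paper's ergodic risk measures; it only shows, on hypothetical data, how a windowed tail statistic can reverse its ranking of two policies across an environment shift while the same statistic over the full history does not. The streams, the helper names `phase` and `rolling_cvar`, the window length, and the tail level are all assumptions.

```python
import math


def lower_tail_cvar(samples, alpha):
    """Mean of the worst ceil(alpha * n) samples (rewards convention: low is bad)."""
    ordered = sorted(samples)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def phase(good, bad, period, length):
    """Reward stream paying `bad` on every `period`-th step and `good` otherwise."""
    return [bad if t % period == 0 else good for t in range(length)]


# Hypothetical two-phase environment: the shift happens at step 200.
# Policy A is safe before the shift and risky after it; policy B is the reverse.
stream_a = phase(1.0, 0.0, 4, 200) + phase(1.0, -4.0, 4, 200)
stream_b = phase(1.0, -4.0, 4, 200) + phase(1.0, 0.0, 4, 200)

WINDOW, ALPHA = 100, 0.25  # assumed window length and tail level


def rolling_cvar(stream, end):
    """CVaR over the last WINDOW rewards observed before step `end`."""
    return lower_tail_cvar(stream[end - WINDOW:end], ALPHA)


# Does the rolling window prefer A over B just before and long after the shift?
a_preferred_before = rolling_cvar(stream_a, 200) > rolling_cvar(stream_b, 200)
a_preferred_after = rolling_cvar(stream_a, 400) > rolling_cvar(stream_b, 400)

# The same statistic over the whole history, a crude stand-in for a long-run view.
full_a = lower_tail_cvar(stream_a, ALPHA)
full_b = lower_tail_cvar(stream_b, ALPHA)

print("rolling window prefers A:", a_preferred_before, a_preferred_after)
print("full-history CVaR:", full_a, full_b)
```

In this toy setup the rolling window prefers A before the shift and B after it, whereas the full-history values coincide. The example says nothing about whether such reversals are harmful in a real continual task, which is exactly the point the referee asks the authors to justify.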

Circularity Check

1 step flagged

Ergodic risk measures introduced specifically to satisfy compatibility with continual RL by construction

specific steps
  1. self-definitional [Abstract]
    "we show that the classical theory of risk measures... is, in its current form, incompatible with continual learning. Then, building on this insight, we extend risk measure theory into the continual setting by introducing a new class of ergodic risk measures, and showing that it is compatible with continual learning."

    The incompatibility is asserted under the premise that valid risk measures for continual RL must be defined on ergodic behavior. The new ergodic risk measures are then defined to meet exactly this requirement, so the subsequent demonstration of compatibility is true by the construction of the class rather than by independent derivation or external check.

full rationale

The paper first claims classical risk measures are incompatible with continual RL because they are not defined directly on ergodic long-run averages. It then introduces a new class of ergodic risk measures and shows they are compatible. This compatibility follows from the modeling premise and definition rather than an independent verification or external benchmark. The central derivation reduces to tailoring the new object to the chosen requirement, introducing moderate circularity risk without reducing the entire result to a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on standard axioms of coherent risk measures plus the modeling decision that continual learning requires ergodic (long-run) formulations; the paper introduces ergodic risk measures as a new entity without external falsifiable evidence beyond the theoretical construction itself.

axioms (1)
  • standard math · Risk measures satisfy monotonicity, convexity, and translation invariance as in classical risk-measure theory.
    Invoked when extending classical theory to the continual case.
invented entities (1)
  • ergodic risk measures · no independent evidence
    purpose: To provide a risk measure compatible with the non-stationary, lifelong structure of continual RL.
    Defined in the paper to resolve the claimed incompatibility of classical risk measures.
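
For reference, the three properties named in the axiom entry can be written out as follows. This is the textbook statement under one common sign convention, chosen here as an assumption; the paper may use a different convention (for example, losses rather than payoffs), in which case the inequalities and the sign of the shift flip accordingly.

```latex
% Assumed convention: X and Y are random payoffs (larger is better),
% m is a real constant, and \lambda \in [0, 1].
\begin{align*}
\text{Monotonicity:} \quad
  & X \le Y \ \text{a.s.} \implies \rho(X) \ge \rho(Y) \\
\text{Translation invariance:} \quad
  & \rho(X + m) = \rho(X) - m \\
\text{Convexity:} \quad
  & \rho\bigl(\lambda X + (1 - \lambda) Y\bigr)
    \le \lambda \rho(X) + (1 - \lambda) \rho(Y)
\end{align*}
```

In standard terminology these three properties together define a convex risk measure. A coherent risk measure additionally requires positive homogeneity, $\rho(cX) = c\,\rho(X)$ for $c \ge 0$.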

pith-pipeline@v0.9.0 · 5743 in / 1493 out tokens · 36942 ms · 2026-05-18T10:10:31.467307+00:00 · methodology

