Ergodic Risk Measures: Towards a Risk-Aware Foundation for Continual Reinforcement Learning
Pith reviewed 2026-05-18 10:10 UTC · model grok-4.3
The pith
Classical risk measures from standard RL cannot be used directly for continual learning, so the paper defines ergodic risk measures that operate on long-run average behavior and are compatible with lifelong agents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that the classical theory of risk measures is incompatible with continual reinforcement learning because it does not align with the ergodic long-run average behavior required for lifelong agents. They extend risk measure theory by introducing ergodic risk measures, prove their compatibility with continual learning, and illustrate their use through a case study that includes empirical results showing practical advantages over risk-neutral approaches.
What carries the argument
Ergodic risk measures, a new class of risk measures defined directly on the ergodic (long-run average) behavior of the underlying process rather than on finite-horizon or stationary distributions.
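To make the contrast between a mean-only objective and a risk-aware one concrete, here is a minimal sketch that summarizes a single long reward stream by its average and by an empirical lower-tail CVaR. This is not the paper's definition of an ergodic risk measure, which is not reproduced on this page. The function names `empirical_cvar` and `long_run_summary`, the tail level `alpha`, and both reward streams are assumptions made up for illustration.

```python
import math

def empirical_cvar(rewards, alpha):
    """Lower-tail CVaR: the mean of the worst alpha-fraction of rewards."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(rewards)  # ascending, so the worst rewards come first
    # Small tolerance guards against floating-point overshoot in alpha * n.
    k = max(1, math.ceil(alpha * len(ordered) - 1e-9))
    return sum(ordered[:k]) / k

def long_run_summary(rewards, alpha=0.1):
    """Return (average reward, tail CVaR) of one long reward stream."""
    average = sum(rewards) / len(rewards)
    return average, empirical_cvar(rewards, alpha)

# Hypothetical reward streams (illustrative data, not from the paper).
steady = [0.5] * 1000
spiky = [1.0] * 950 + [-9.0] * 50

steady_summary = long_run_summary(steady)
spiky_summary = long_run_summary(spiky)
print(steady_summary)
print(spiky_summary)
```

Both streams have the same average reward, so a risk-neutral objective cannot tell them apart, while the tail statistic separates them. A risk measure defined on long-run behavior is meant to capture this kind of difference.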
If this is right
- Risk-aware objectives become feasible for agents that must retain knowledge while adapting indefinitely.
- Decision-making in lifelong settings can incorporate measures of variability or downside risk tied to long-run performance.
- The framework supports development of algorithms that optimize risk beyond mean return in non-stationary environments.
- Empirical results from the paper's case study show the measures yield intuitive risk control during continual task sequences.
Where Pith is reading between the lines
- The approach could extend to other sequential decision problems with non-stationary data, such as online planning or adaptive control.
- Estimation of ergodic quantities in practice may require new sampling or approximation methods suited to drifting environments.
- Links to ergodic theory open the possibility of importing stability results from dynamical systems into risk-aware RL.
Load-bearing premise
That continual RL requires risk measures to be defined on ergodic long-run average behavior instead of finite-horizon or stationary quantities.
What would settle it
A demonstration that a classical risk measure can be applied to a continual RL task without producing inconsistency in the agent's long-run behavior, or an experiment where ergodic risk measures fail to control risk as claimed.
Original abstract
Continual reinforcement learning (continual RL) seeks to formalize the notions of lifelong learning and endless adaptation in RL. In particular, the aim of continual RL is to develop RL agents that can maintain a careful balance between retaining useful information and adapting to new situations. To date, continual RL has been explored almost exclusively through the lens of risk-neutral decision-making, in which the agent aims to optimize the expected long-run performance. In this work, we present the first formal theoretical treatment of continual RL through the lens of risk-aware decision-making, in which the behaviour of the agent is directed towards optimizing a measure of long-run performance beyond the mean. In particular, we show that the classical theory of risk measures, widely used as a theoretical foundation in non-continual risk-aware RL, is, in its current form, incompatible with continual learning. Then, building on this insight, we extend risk measure theory into the continual setting by introducing a new class of ergodic risk measures, and showing that it is compatible with continual learning. Finally, we provide a case study of continual risk-aware learning, along with empirical results, which show the intuitive appeal of ergodic risk measures in continual settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that classical risk measures from non-continual risk-aware RL are incompatible with continual reinforcement learning because they cannot be directly defined on ergodic long-run average behavior. It introduces a new class of ergodic risk measures that are compatible with continual learning by construction, provides a theoretical extension of risk measure theory, and includes a case study with empirical results illustrating their intuitive appeal for lifelong agents.
Significance. If the incompatibility result and the new measures hold, this would constitute the first formal theoretical treatment of risk-aware continual RL, bridging two previously separate areas and supplying a compatible foundation for risk-sensitive lifelong agents. The empirical case study adds practical grounding. Strengths include the explicit extension of risk measure axioms to the ergodic setting and the focus on long-run adaptation.
major comments (2)
- [Modeling premise / abstract and introduction] The central incompatibility claim for classical risk measures (stated in the abstract and developed in the modeling section) rests on the premise that any valid risk measure for continual RL must be defined directly on the ergodic occupation measure rather than on finite-horizon returns, stationary distributions, or time-inhomogeneous trajectories. If continual RL agents can optimize risk measures over rolling finite windows or non-ergodic policies while still satisfying lifelong adaptation, the incompatibility does not follow and the motivation for ergodic risk measures weakens. This modeling choice is load-bearing and requires explicit justification or counter-examples showing why alternative definitions violate continual RL requirements.
- [Definition of ergodic risk measures] The ergodic risk measures are introduced to satisfy the compatibility property with continual learning. While the construction demonstrates internal consistency, the central claim would be strengthened by an independent verification (e.g., comparison to an external benchmark risk measure or a falsifiable prediction) rather than relying solely on the definition. Without this, the result risks reducing to a tautology.
minor comments (1)
- [Abstract] The abstract asserts both incompatibility and compatibility results without equations, proof sketches, or details on the empirical protocols or data-exclusion criteria, which limits immediate assessment of the claims.
Simulated Author's Rebuttal
Thank you for the constructive referee report. We address each major comment below with point-by-point responses and indicate planned revisions to the manuscript.
- Referee: [Modeling premise / abstract and introduction] The central incompatibility claim for classical risk measures (stated in the abstract and developed in the modeling section) rests on the premise that any valid risk measure for continual RL must be defined directly on the ergodic occupation measure rather than on finite-horizon returns, stationary distributions, or time-inhomogeneous trajectories. If continual RL agents can optimize risk measures over rolling finite windows or non-ergodic policies while still satisfying lifelong adaptation, the incompatibility does not follow and the motivation for ergodic risk measures weakens. This modeling choice is load-bearing and requires explicit justification or counter-examples showing why alternative definitions violate continual RL requirements.
Authors: We agree that the modeling premise is central and benefits from expanded justification. In the revised version we will add a dedicated subsection in the modeling section that formally argues why risk measures for continual RL must operate on the ergodic occupation measure. Continual RL is defined by indefinite interaction without a terminal horizon; any risk assessment that relies on finite windows or time-inhomogeneous trajectories necessarily introduces a recency bias that violates the requirement of lifelong, non-forgetting adaptation. We will include two concrete counter-examples: (i) a rolling-window CVaR that produces inconsistent risk rankings across successive environment shifts, and (ii) a stationary-distribution risk measure that cannot be updated without resetting the agent’s internal state. These examples demonstrate that the alternatives fail to preserve the ergodic consistency demanded by continual RL, thereby reinforcing the motivation for ergodic risk measures.
revision: yes
- Referee: [Definition of ergodic risk measures] The ergodic risk measures are introduced to satisfy the compatibility property with continual learning. While the construction demonstrates internal consistency, the central claim would be strengthened by an independent verification (e.g., comparison to an external benchmark risk measure or a falsifiable prediction) rather than relying solely on the definition. Without this, the result risks reducing to a tautology.
Authors: We acknowledge the concern that an axiomatic construction alone could appear circular. The manuscript already contains a case study that empirically contrasts ergodic risk measures with classical ones on a lifelong non-stationary task, showing qualitatively different adaptation behavior. To provide the requested independent verification we will add, in the revised manuscript, (a) a direct numerical comparison of ergodic versus classical risk values on the same occupation measures and (b) a falsifiable prediction: ergodic risk measures will yield lower regret under repeated environment switches than any rolling-window baseline. These additions move beyond the definition while preserving the theoretical contribution.
revision: partial
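The first response mentions a rolling-window CVaR and the recency bias it introduces, without giving details. The following sketch is a hypothetical illustration of that recency effect, not the authors' counter-example. The regime names, reward values, window length, and tail level are all assumptions chosen so the arithmetic is easy to follow.

```python
import math

def empirical_cvar(rewards, alpha):
    """Lower-tail CVaR: the mean of the worst alpha-fraction of rewards."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(rewards)  # ascending, so the worst rewards come first
    # Small tolerance guards against floating-point overshoot in alpha * n.
    k = max(1, math.ceil(alpha * len(ordered) - 1e-9))
    return sum(ordered[:k]) / k

def rolling_cvar(rewards, t, window, alpha):
    """CVaR of the `window` most recent rewards observed up to step t."""
    recent = rewards[max(0, t - window):t]
    return empirical_cvar(recent, alpha)

# Hypothetical stream with two environment shifts: calm, volatile, calm.
calm = [1.0] * 200
volatile = [-5.0 if step % 10 == 0 else 1.0 for step in range(200)]
stream = calm + volatile + calm

ALPHA, WINDOW = 0.1, 100
# Rolling estimate read off at the end of each regime.
rolling = [rolling_cvar(stream, t, WINDOW, ALPHA) for t in (200, 400, 600)]
# Estimate over the whole history, which keeps the volatile regime in view.
full_history = empirical_cvar(stream, ALPHA)
print(rolling)
print(full_history)
```

In this toy stream the rolling estimate returns to its calm-regime value once the volatile regime leaves the window, while the full-history estimate still reflects the losses. Whether this matches the inconsistency the authors plan to demonstrate cannot be determined from the rebuttal text alone.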
Circularity Check
Ergodic risk measures introduced specifically to satisfy compatibility with continual RL by construction
specific steps
- self-definitional, [Abstract]: "we show that the classical theory of risk measures... is, in its current form, incompatible with continual learning. Then, building on this insight, we extend risk measure theory into the continual setting by introducing a new class of ergodic risk measures, and showing that it is compatible with continual learning."
The incompatibility is asserted under the premise that valid risk measures for continual RL must be defined on ergodic behavior. The new ergodic risk measures are then defined to meet exactly this requirement, so the subsequent demonstration of compatibility is true by the construction of the class rather than by independent derivation or external check.
full rationale
The paper first claims classical risk measures are incompatible with continual RL because they are not defined directly on ergodic long-run averages. It then introduces a new class of ergodic risk measures and shows they are compatible. This compatibility follows from the modeling premise and definition rather than an independent verification or external benchmark. The central derivation reduces to tailoring the new object to the chosen requirement, introducing moderate circularity risk without reducing the entire result to a tautology.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Risk measures satisfy monotonicity, convexity, and translation invariance, as in classical risk-measure theory.
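For reference, the three properties named in this axiom can be written out as follows. This is the standard textbook form for a convex risk measure $\rho$ on random payoffs, under the assumed convention that a lower value of $\rho$ is better; the paper may use a different sign convention or notation.

```latex
% Convention assumed here: X, Y are random payoffs and lower \rho is better.
\begin{align*}
\text{Monotonicity:} \quad & X \le Y \ \text{a.s.} \implies \rho(X) \ge \rho(Y), \\
\text{Translation invariance:} \quad & \rho(X + m) = \rho(X) - m \quad \text{for all } m \in \mathbb{R}, \\
\text{Convexity:} \quad & \rho(\lambda X + (1-\lambda) Y) \le \lambda \rho(X) + (1-\lambda) \rho(Y) \quad \text{for all } \lambda \in [0, 1].
\end{align*}
```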
invented entities (1)
- ergodic risk measures: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: Lemmas 4.5 and 4.6: static and nested risk measures violate the Feasibility or Plasticity axioms.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.