Safe Continual Reinforcement Learning under Nonstationarity via Adaptive Safety Constraints

Timofey Tomashevskiy

arxiv: 2605.18842 · v1 · pith:YF4JPHQGnew · submitted 2026-05-13 · 💻 cs.LG

Pith reviewed 2026-05-20 21:31 UTC · model grok-4.3

classification 💻 cs.LG

keywords: safe reinforcement learning, continual learning, nonstationarity, adaptive constraints, distribution shift, safety violations, driving environments

The pith

Adaptive safety constraints reduce violations under distribution shift while keeping task performance competitive.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes LILAC+, a framework for safe continual reinforcement learning that adapts safety constraints as environments change over time. Standard methods assume fixed constraints or stable conditions, and those assumptions can become inadequate when conditions shift. LILAC+ combines three mechanisms: context-based adjustments driven by inferred and predicted environmental context, constraints that tighten when change outpaces safe adaptation, and conversion of cumulative safety budgets into immediate state-level limits. Experiments in simulated driving tasks under stationary, seen nonstationary, and unseen nonstationary conditions show fewer safety violations under distribution shift than unconstrained or fixed-constraint baselines. This matters for applications like autonomous driving, where weather, traffic, or road conditions vary continuously.

Core claim

LILAC+ integrates three mechanisms for proactive and reactive safety adaptation in nonstationary continual RL: context-based safety constraints that adjust requirements using inferred and predicted environmental context, adaptation-speed constraints that tighten when the rate of change exceeds the agent's safe adaptation ability, and budget-to-state safety enforcement that turns cumulative safety requirements into local state-level control constraints enforceable at decision time. In simulated driving environments, these mechanisms substantially reduce safety violations under distribution shift while maintaining competitive task performance compared with unconstrained and fixed-constraint RL.

What carries the argument

The LILAC+ framework with its three adaptive safety mechanisms: context-based constraints, adaptation-speed constraints, and budget-to-state enforcement.
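As a rough illustration of the first mechanism, the sketch below scales a cost threshold by the worse of an inferred and a predicted context risk. The page does not give the paper's actual rule, so everything here is an assumption made for illustration: the `ContextEstimate` container, the `context_based_threshold` function, the linear scaling, and the `min_fraction` floor are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContextEstimate:
    """Hypothetical container; both fields are risk scores in [0, 1]."""
    current_risk: float    # inferred from recent observations
    predicted_risk: float  # forecast for the upcoming context


def context_based_threshold(base_threshold: float,
                            ctx: ContextEstimate,
                            min_fraction: float = 0.2) -> float:
    """Shrink the allowed cost as the worse of current/predicted risk grows.

    Assumed rule (not from the paper): linear interpolation between the
    base threshold (risk 0) and min_fraction * base threshold (risk 1).
    """
    risk = max(ctx.current_risk, ctx.predicted_risk)
    risk = min(max(risk, 0.0), 1.0)  # clamp to [0, 1]
    scale = 1.0 - (1.0 - min_fraction) * risk
    return base_threshold * scale
```

Taking the maximum of the two risk scores is one way to make the constraint proactive: a benign present does not relax the limit if the forecast is bad.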

If this is right

Safety mechanisms can respond to predicted future conditions rather than only current observations.
Tighter constraints activate automatically when environmental change accelerates.
Cumulative safety budgets translate directly into per-step control limits.
The same framework handles both seen and unseen shifts in simulation.
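The budget-to-per-step translation in the list above could look something like the following. This is a minimal sketch under a stated assumption: the remaining budget is split evenly over the remaining steps, which is the simplest possible allocation and not necessarily the one the paper uses. The function names and the fallback behaviour are hypothetical.

```python
from typing import Dict, List


def per_step_limit(total_budget: float, cost_spent: float,
                   steps_remaining: int) -> float:
    """Even split of the unspent budget over the remaining steps (assumed rule)."""
    if steps_remaining <= 0:
        return 0.0
    remaining = max(total_budget - cost_spent, 0.0)
    return remaining / steps_remaining


def filter_actions(action_costs: Dict[str, float], limit: float) -> List[str]:
    """Keep actions whose estimated one-step cost fits under the limit.

    If nothing fits, fall back to the single cheapest action so the agent
    always has something to execute (an assumption of this sketch).
    """
    allowed = [a for a, cost in action_costs.items() if cost <= limit]
    if not allowed:
        allowed = [min(action_costs, key=action_costs.get)]
    return allowed
```

For example, with a budget of 10, 4 already spent, and 3 steps left, the limit is 2.0, so an action with estimated cost 3.0 would be filtered out at decision time.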

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may generalize to other safety-critical continual RL settings such as robotics under varying loads or sensor degradation.
Poor context prediction would likely erode the reported safety gains, suggesting a need for robust inference modules.
Real-world validation beyond driving simulators would test whether the adaptive rules transfer when shifts arise from unmodeled factors.

Load-bearing premise

Inferred and predicted environmental context is accurate enough to usefully adjust safety requirements, and the simulated nonstationary driving tasks represent the distribution shifts that matter in practice.

What would settle it

A nonstationary driving simulation in which context inference is made inaccurate or unavailable, resulting in no reduction or an increase in safety violations compared with fixed-constraint baselines.
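A stress test of this kind might be prototyped along the following lines. This is a toy harness, not the paper's driving environment: the uniform risk distribution, the violation rule (the estimate understates the true risk by more than a tolerance), and all names are assumptions chosen only to show how noise and bias could be injected into a context estimate.

```python
import random


def noisy_context(true_context: float, noise_std: float, bias: float,
                  rng: random.Random) -> float:
    """Corrupt a context value in [0, 1] with additive bias and Gaussian noise."""
    estimate = true_context + bias + rng.gauss(0.0, noise_std)
    return min(max(estimate, 0.0), 1.0)


def toy_violation_rate(noise_std: float, bias: float, episodes: int = 1000,
                       tolerance: float = 0.1, seed: int = 0) -> float:
    """Fraction of toy episodes where the estimate understates the true risk.

    Assumed violation rule: an episode counts as a violation when the
    estimated risk falls more than `tolerance` below the true risk.
    """
    rng = random.Random(seed)
    violations = 0
    for _ in range(episodes):
        true_risk = rng.random()
        estimate = noisy_context(true_risk, noise_std, bias, rng)
        if estimate < true_risk - tolerance:
            violations += 1
    return violations / episodes


if __name__ == "__main__":
    # Sweep noise levels; the printed values are toy numbers, not paper results.
    for std in (0.0, 0.1, 0.3):
        print(std, toy_violation_rate(noise_std=std, bias=0.0))
```

In a real ablation the inner loop would be replaced by rollouts of the trained agent, with the corrupted estimate fed to the constraint mechanisms in place of the clean one.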

Figures

Figures reproduced from arXiv: 2605.18842 by Timofey Tomashevskiy.

**Figure 2.** Experimental evidence for adaptive safety constraints. LILAC+ reduces violations in ... (image: figures/full_fig_p008_2.png)

Original abstract

Safe reinforcement learning in nonstationary environments requires safety mechanisms that adapt as environmental conditions change. Standard safe reinforcement learning methods often assume fixed constraints or stable environmental conditions, which can become inadequate under distribution shift. We propose LILAC+, a framework for safe continual reinforcement learning under nonstationarity that combines three adaptive safety mechanisms: context-based safety constraints, adaptation-speed constraints, and budget-to-state safety enforcement. Context-based constraints adjust safety requirements using inferred and predicted environmental context. Adaptation-speed constraints tighten safety requirements when the rate of environmental change exceeds the agent's ability to adapt safely. Budget-to-state enforcement converts cumulative safety requirements into local state-level control constraints that can be enforced at decision time. Together, these mechanisms provide a unified approach for proactive and reactive safety adaptation in continual reinforcement learning. We evaluate the framework in simulated driving environments under stationary, seen nonstationary, and unseen nonstationary conditions. The results show that adaptive safety constraints substantially reduce safety violations under distribution shift while maintaining competitive task performance compared with unconstrained and fixed-constraint baselines. These findings suggest that safe continual reinforcement learning requires adaptive constraint mechanisms that respond not only to current state information but also to predicted environmental context, adaptation demand, and remaining safety budget.
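The adaptation-speed mechanism described in the abstract can be sketched using the ratio that appears in the paper passage quoted further down this page, ρ_t = v_req_t / v_cap_t. Only the ratio comes from that passage. Reading v_req as the adaptation speed the environment demands and v_cap as the speed the agent can safely deliver is an interpretation, and the tightening rule (divide the threshold by ρ_t once it exceeds 1) is an assumption made for illustration.

```python
def adaptation_ratio(v_req: float, v_cap: float, eps: float = 1e-8) -> float:
    """rho_t = v_req / v_cap, guarded against a zero capacity."""
    return v_req / max(v_cap, eps)


def tightened_threshold(base_threshold: float, v_req: float,
                        v_cap: float) -> float:
    """Leave the threshold alone while rho <= 1, shrink it when rho > 1.

    Assumed rule (not from the paper): threshold / rho, so a demand twice
    the safe capacity halves the allowed cost.
    """
    rho = adaptation_ratio(v_req, v_cap)
    if rho <= 1.0:
        return base_threshold
    return base_threshold / rho
```

With a base threshold of 10, a demand of 2.0 against a capacity of 1.0 gives ρ = 2 and a tightened threshold of 5, while any demand at or below capacity leaves the threshold at 10.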

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LILAC+ combines context prediction, change-rate tightening, and budget mapping into one adaptive safety setup for continual RL, but the experiments stay too high-level to judge the gains.

The main takeaway is that this paper puts forward LILAC+, a framework that tries to keep safety constraints useful when the environment keeps shifting. It layers three pieces: inferring and predicting context to loosen or tighten rules, tightening constraints when change outpaces the agent's ability to adapt safely, and turning the remaining safety budget into per-state limits that can be checked at each step. That combination is the concrete new element, even though the individual ideas have roots in earlier safe RL and nonstationary work.

The abstract makes a clear case that fixed constraints either get violated or become too conservative once distribution shift starts, and the three mechanisms are meant to handle both prediction and reaction without waiting for violations to pile up. The driving simulations under stationary, seen-shift, and unseen-shift conditions are a reasonable testbed for the claim, and unconstrained and fixed-constraint RL are the right baselines to compare against. The results are described as cutting violations while holding task performance steady, which would be useful if it holds up in the numbers.

The soft spot is that the writeup gives almost no quantitative detail: no violation counts, no performance deltas, no error bars, and no mention of how the context predictor was trained or tested. That leaves the central claim hard to evaluate. The stress-test concern also lands: if context inference is off by even a moderate amount, the adjustments can either under-protect or over-constrain, and the paper does not appear to run ablations with noisy or biased predictors. The simulated shifts may also be cleaner than the messier drifts that show up in real driving data.

This work is aimed at people already working on safe continual RL who need concrete mechanisms rather than high-level warnings. A reader who wants to extend adaptive constraint ideas would find usable building blocks here. I would send it to peer review. The problem it targets is real for deployment, the approach is a fresh assembly of existing parts, and the thinking is straightforward enough that referees can give targeted feedback on the missing robustness checks and numbers.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes LILAC+, a framework for safe continual reinforcement learning under nonstationarity. It integrates three adaptive safety mechanisms: context-based constraints that adjust requirements using inferred and predicted environmental context, adaptation-speed constraints that tighten safety when environmental change rate exceeds safe adaptation capacity, and budget-to-state enforcement that maps cumulative safety budgets into enforceable local state constraints. The approach is evaluated in simulated driving environments under stationary, seen-nonstationary, and unseen-nonstationary conditions, claiming substantial reductions in safety violations under distribution shift while preserving competitive task performance relative to unconstrained and fixed-constraint baselines.

Significance. If the results prove robust, the work addresses a practically relevant gap in safe RL by moving beyond fixed constraints to mechanisms that respond to predicted context, adaptation demand, and remaining budget. This unified proactive-reactive design is a conceptual strength for continual learning settings. The evaluation across multiple nonstationarity regimes is a positive step toward more realistic testing. However, the central empirical claim depends on accurate context inference, and the absence of targeted robustness checks limits the strength of the contribution.

major comments (2)

[Evaluation] Evaluation section: the central claim that adaptive constraints substantially reduce safety violations under distribution shift rests on accurate context inference and prediction, yet no ablations are reported that inject noise or bias into the context estimator to measure degradation in performance or safety; without this, it is unclear whether the reported gains hold when inference error is non-negligible, as the skeptic concern notes.
[Abstract] Abstract and results summary: the positive outcomes under seen and unseen nonstationary conditions are stated without accompanying quantitative values, error bars, baseline specifics, or statistical tests, which prevents assessment of effect size and undermines verification of the claim that task performance remains competitive.

minor comments (2)

Define the acronym LILAC+ at its first appearance and clarify whether the '+' denotes an extension of a prior method.
[Results] Results figures should include error bars and indicate whether differences from baselines are statistically significant.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the empirical support and clarity of the results.

Referee: [Evaluation] Evaluation section: the central claim that adaptive constraints substantially reduce safety violations under distribution shift rests on accurate context inference and prediction, yet no ablations are reported that inject noise or bias into the context estimator to measure degradation in performance or safety; without this, it is unclear whether the reported gains hold when inference error is non-negligible, as the skeptic concern notes.

Authors: We agree that robustness to imperfect context inference is critical for the central claim. The current evaluation assumes a context estimator operating at the accuracy levels described in Section 4.2. To directly address this concern, we will add new ablation experiments that inject controlled noise and bias into the context predictions and report the resulting changes in safety violations and task performance across the nonstationary regimes.
revision: yes

Referee: [Abstract] Abstract and results summary: the positive outcomes under seen and unseen nonstationary conditions are stated without accompanying quantitative values, error bars, baseline specifics, or statistical tests, which prevents assessment of effect size and undermines verification of the claim that task performance remains competitive.

Authors: We acknowledge that the abstract and high-level summary currently omit specific numbers. In the revision we will update both the abstract and the results summary to include quantitative effect sizes (e.g., mean reduction in safety violations with standard errors), explicit baseline names, and references to the statistical tests reported in the main evaluation tables.
revision: yes
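For readers who want to see what "mean reduction in safety violations with standard errors" would involve in practice, here is a small sketch. The per-seed counts are placeholder numbers invented for this example and are not results from the paper. The helper name `mean_and_stderr` is likewise hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_stderr(values: Sequence[float]) -> Tuple[float, float]:
    """Sample mean and standard error of the mean (sample stdev / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.fmean(values)
    stderr = statistics.stdev(values) / math.sqrt(n)
    return mean, stderr


if __name__ == "__main__":
    # PLACEHOLDER per-seed violation counts, made up for illustration only.
    placeholder_fixed = [12.0, 15.0, 11.0, 14.0, 13.0]
    placeholder_adaptive = [5.0, 7.0, 4.0, 6.0, 3.0]
    for name, runs in (("fixed", placeholder_fixed),
                       ("adaptive", placeholder_adaptive)):
        mean, se = mean_and_stderr(runs)
        print(f"{name}: {mean:.2f} +/- {se:.2f} (n={len(runs)})")
```

Reporting the number of seeds alongside the standard error lets a reader judge how much weight the interval can bear.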

Circularity Check

0 steps flagged

No significant circularity; methods defined independently and evaluated on separate simulated tasks

The paper introduces LILAC+ with three explicitly defined adaptive mechanisms (context-based constraints, adaptation-speed constraints, budget-to-state enforcement) and evaluates them on distinct simulated driving scenarios under stationary, seen-nonstationary, and unseen-nonstationary conditions. No step reduces a claimed prediction or uniqueness result to a fitted parameter or self-citation by construction; the central performance claims rest on empirical comparisons against unconstrained and fixed-constraint baselines rather than on any internal redefinition or load-bearing self-reference. The evaluation protocol uses held-out distribution shifts, satisfying the criterion for independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the assumption that context can be reliably inferred and predicted and that the three constraint types can be combined without introducing new instabilities; no free parameters or invented physical entities are mentioned.

axioms (1)

domain assumption: Environmental context can be inferred and predicted from observations with sufficient accuracy to adjust safety constraints usefully.
Central to the context-based safety constraints described in the abstract.

invented entities (1)

LILAC+ framework (no independent evidence)
purpose: Unified adaptive safety mechanism for continual RL under nonstationarity
Newly proposed combination of three constraint types.

pith-pipeline@v0.9.0 · 5740 in / 1354 out tokens · 42022 ms · 2026-05-20T21:31:23.965630+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: LILAC+ constructs three adaptive constraint mechanisms: context-based (CB) constraints, adaptation-speed (AS) constraints, and soft-to-hard (SH) constraints... $\rho_t = v^{\mathrm{req}}_t / v^{\mathrm{cap}}_t$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

134 extracted references · 134 canonical work pages · 6 internal anchors

[1]

doi:10.1109/ms.2018.4321239 , number =

David Lorge Parnas , title =. doi:10.1109/ms.2018.4321239 , number =

work page doi:10.1109/ms.2018.4321239 2018
[2]

1999 , publisher =

Constrained Markov decision processes , author =. 1999 , publisher =

work page 1999
[3]

2018 , publisher =

Reinforcement learning: An introduction , author =. 2018 , publisher =

work page 2018
[4]

arXiv preprint arXiv:2004.07584 , year =

Reinforcement learning for safety-critical control under model uncertainty, using control lyapunov functions and control barrier functions , author =. arXiv preprint arXiv:2004.07584 , year =

work page arXiv 2004
[5]

2016 IEEE 55th Conference on Decision and Control (CDC) , pages =

Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes , author =. 2016 IEEE 55th Conference on Decision and Control (CDC) , pages =. 2016 , organization =

work page 2016
[6]

arXiv preprint arXiv:2205.10330 , year =

A review of safe reinforcement learning: Methods, theory and applications , author =. arXiv preprint arXiv:2205.10330 , year =

work page arXiv
[7]

arXiv preprint arXiv:2006.10701 , year =

Deep reinforcement learning amidst lifelong non-stationarity , author =. arXiv preprint arXiv:2006.10701 , year =

work page arXiv 2006
[8]

International Conference on Machine Learning (ICML) , year =

Constrained Policy Optimization , author =. International Conference on Machine Learning (ICML) , year =

work page
[9]

Risk-Sensitive and Robust Decision-Making: a

Chow, Yinlam and Ghavamzadeh, Mohammad and Janson, Lucas and Pavone, Marco , booktitle =. Risk-Sensitive and Robust Decision-Making: a

work page
[10]

A uniform estimate for general quaternionic Calabi problem (with appendix by Daniel Barlet)

Policy Gradient for Coherent Risk Measures , author =. arXiv preprint arXiv:1502.02267 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[11]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Safe Model-based Reinforcement Learning with Stability Guarantees , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

work page
[12]

1998 , publisher =

Reinforcement learning: An introduction , author =. 1998 , publisher =

work page 1998
[13]

2015 European Control Conference (ECC) , pages =

Safe and robust learning control with Gaussian processes , author =. 2015 European Control Conference (ECC) , pages =. 2015 , organization =

work page 2015
[14]

International Conference on Machine Learning , pages =

Robust multi-objective bayesian optimization under input noise , author =. International Conference on Machine Learning , pages =. 2022 , organization =

work page 2022
[15]

Neural networks , volume =

Continual lifelong learning with neural networks: A review , author =. Neural networks , volume =. 2019 , publisher =

work page 2019
[16]

Automated machine learning: methods, systems, challenges , pages =

Meta-learning , author =. Automated machine learning: methods, systems, challenges , pages =. 2019 , publisher =

work page 2019
[17]

International conference on machine learning , pages =

Pac-inspired option discovery in lifelong reinforcement learning , author =. International conference on machine learning , pages =. 2014 , organization =

work page 2014
[18]

International conference on machine learning , pages =

Policy and value transfer in lifelong reinforcement learning , author =. International conference on machine learning , pages =. 2018 , organization =

work page 2018
[19]

Proceedings of the AAAI Conference on Artificial Intelligence , volume =

Lifelong learning with a changing action set , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume =

work page
[20]

Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments

Continuous adaptation via meta-learning in nonstationary and competitive environments , author =. arXiv preprint arXiv:1710.03641 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[21]

Proceedings of the AAAI Conference on Artificial Intelligence , volume =

Provably efficient primal-dual reinforcement learning for cmdps with non-stationary objectives and constraints , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume =

work page
[22]

Advances in Neural Information Processing Systems , volume =

Towards safe policy improvement for non-stationary MDPs , author =. Advances in Neural Information Processing Systems , volume =

work page
[23]

arXiv preprint arXiv:2003.00660 , year =

Upper confidence primal-dual optimization: Stochastically constrained markov decision processes with adversarial losses and unknown transitions , author =. arXiv preprint arXiv:2003.00660 , year =

work page arXiv 2003
[24]

International Conference on Machine Learning , pages =

Optimizing for the future in non-stationary mdps , author =. International Conference on Machine Learning , pages =. 2020 , organization =

work page 2020
[25]

International Conference on Machine Learning , pages =

Safe policy search for lifelong reinforcement learning with sublinear regret , author =. International Conference on Machine Learning , pages =. 2015 , organization =

work page 2015
[26]

International Conference on Artificial Intelligence and Statistics , pages =

Provably efficient model-free algorithms for non-stationary cmdps , author =. International Conference on Artificial Intelligence and Statistics , pages =. 2023 , organization =

work page 2023
[27]

2021 IEEE International Conference on Robotics and Automation (ICRA) , pages =

Context-aware safe reinforcement learning for non-stationary environments , author =. 2021 IEEE International Conference on Robotics and Automation (ICRA) , pages =. 2021 , organization =

work page 2021
[28]

2022 IEEE 61st Conference on Decision and Control (CDC) , pages =

Finite-time complexity of online primal-dual natural actor-critic algorithm for constrained Markov decision processes , author =. 2022 IEEE 61st Conference on Decision and Control (CDC) , pages =. 2022 , organization =

work page 2022
[29]

arXiv preprint arXiv:2405.16601 , year =

A CMDP-within-online framework for meta-safe reinforcement learning , author =. arXiv preprint arXiv:2405.16601 , year =

work page arXiv
[30]

arXiv preprint arXiv:2111.00552 , year =

Policy optimization for constrained mdps with provable fast global convergence , author =. arXiv preprint arXiv:2111.00552 , year =

work page arXiv
[31]

Machine Learning , volume =

All-time safety and sample-efficient meta update for online safe meta reinforcement learning under Markov task transition , author =. Machine Learning , volume =. 2025 , publisher =

work page 2025
[32]

IEEE Journal of Selected Topics in Signal Processing , volume =

Online convex optimization in dynamic environments , author =. IEEE Journal of Selected Topics in Signal Processing , volume =. 2015 , publisher =

work page 2015
[33]

Proceedings of the AAAI conference on artificial intelligence , volume =

Safe online convex optimization with unknown linear safety constraints , author =. Proceedings of the AAAI conference on artificial intelligence , volume =

work page
[34]

IEEE Transactions on Cybernetics , volume =

Adaptive safe reinforcement learning with full-state constraints and constrained adaptation for autonomous vehicles , author =. IEEE Transactions on Cybernetics , volume =. 2023 , publisher =

work page 2023
[35]

International Conference on Machine Learning , pages =

Enforcing hard constraints with soft barriers: Safe reinforcement learning in unknown stochastic environments , author =. International Conference on Machine Learning , pages =. 2023 , organization =

work page 2023
[36]

IEEE Transactions on Pattern Analysis and Machine Intelligence , year =

A review of safe reinforcement learning: Methods, theories and applications , author =. IEEE Transactions on Pattern Analysis and Machine Intelligence , year =

work page
[37]

The 30th international joint conference on artificial intelligence (ijcai) , year =

Policy learning with constraints in model-free reinforcement learning: A survey , author =. The 30th international joint conference on artificial intelligence (ijcai) , year =

work page
[38]

arXiv preprint arXiv:2402.02025 , year =

A survey of constraint formulations in safe reinforcement learning , author =. arXiv preprint arXiv:2402.02025 , year =

work page arXiv
[39]

ACM Computing Surveys (CSUR) , volume =

A survey of reinforcement learning algorithms for dynamically varying environments , author =. ACM Computing Surveys (CSUR) , volume =. 2021 , publisher =

work page 2021
[40]

Journal of Artificial Intelligence Research , volume =

Towards continual reinforcement learning: A review and perspectives , author =. Journal of Artificial Intelligence Research , volume =

work page
[41]

Machine Learning , volume =

A taxonomy for similarity metrics between markov decision processes , author =. Machine Learning , volume =. 2022 , publisher =

work page 2022
[42]

Proceedings of the AAAI conference on artificial intelligence , volume =

Safe reinforcement learning via shielding under partial observability , author =. Proceedings of the AAAI conference on artificial intelligence , volume =

work page
[43]

Machine learning , volume =

Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics , author =. Machine learning , volume =. 2023 , publisher =

work page 2023
[44]

2022 , school =

Reinforcement Learning for Non-stationary problems , author =. 2022 , school =

work page 2022
[45]

Learning to reinforcement learn

Learning to reinforcement learn , author =. arXiv preprint arXiv:1611.05763 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[46]

System Modeling and Optimization: Proceedings of the 10th IFIP Conference New York City, USA, August 31--September 4, 1981 , pages =

The Bayesian approach to global optimization , author =. System Modeling and Optimization: Proceedings of the 10th IFIP Conference New York City, USA, August 31--September 4, 1981 , pages =. 2005 , organization =

work page 1981
[47]

Foundations and Trends

Bayesian reinforcement learning: A survey , author =. Foundations and Trends. 2015 , publisher =

work page 2015
[48]

Journal of Global Optimization , volume =

Bayesian heuristic approach to global optimization and examples , author =. Journal of Global Optimization , volume =. 2002 , publisher =

work page 2002
[49]

2021 , publisher =

Constrained Markov decision processes , author =. 2021 , publisher =

work page 2021
[50]

IEEE Transactions on Automatic Control , volume =

Risk-constrained Markov decision processes , author =. IEEE Transactions on Automatic Control , volume =. 2014 , publisher =

work page 2014
[51]

International Conference on Machine Learning , pages =

Safe reinforcement learning in constrained markov decision processes , author =. International Conference on Machine Learning , pages =. 2020 , organization =

work page 2020
[52]

International Workshop on the Foundations of Trustworthy AI Integrating Learning, Optimization and Reasoning , pages =

Safe learning and optimization techniques: Towards a survey of the state of the art , author =. International Workshop on the Foundations of Trustworthy AI Integrating Learning, Optimization and Reasoning , pages =. 2020 , organization =

work page 2020
[53]

IEEE transactions on pattern analysis and machine intelligence , volume =

Meta-learning in neural networks: A survey , author =. IEEE transactions on pattern analysis and machine intelligence , volume =. 2021 , publisher =

work page 2021
[54]

Annual Review of Control, Robotics, and Autonomous Systems , volume =

Safe learning in robotics: From learning-based control to safe reinforcement learning , author =. Annual Review of Control, Robotics, and Autonomous Systems , volume =. 2022 , publisher =

work page 2022
[55]

Journal of mathematics and mechanics , pages =

A Markovian decision process , author =. Journal of mathematics and mechanics , pages =. 1957 , publisher =

work page 1957
[56]

Journal of mathematical analysis and applications , volume =

Optimal control of Markov processes with incomplete state information I , author =. Journal of mathematical analysis and applications , volume =. 1965 , publisher =

work page 1965
[57]

IJCAI: proceedings of the conference , volume =

Hidden parameter markov decision processes: A semiparametric regression approach for discovering latent task parametrizations , author =. IJCAI: proceedings of the conference , volume =

work page
[58]

Sequence learning: paradigms, algorithms, and applications , pages =

Hidden-mode markov decision processes for nonstationary sequential decision making , author =. Sequence learning: paradigms, algorithms, and applications , pages =. 2001 , publisher =

work page 2001
[59]

Artificial intelligence , volume =

Planning and acting in partially observable stochastic domains , author =. Artificial intelligence , volume =. 1998 , publisher =

work page 1998
[60]

Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems , pages =

Improving reinforcement learning with context detection , author =. Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems , pages =

work page
[61]

Handbooks in operations research and management science , volume =

Markov decision processes , author =. Handbooks in operations research and management science , volume =. 1990 , publisher =

work page 1990
[62]

Advances in neural information processing systems , volume =

Bayes-adaptive pomdps , author =. Advances in neural information processing systems , volume =

work page
[63]

2002 , publisher =

Optimal Learning: Computational procedures for Bayes-adaptive Markov decision processes , author =. 2002 , publisher =

work page 2002
[64]

2019 , school =

Safe exploration in reinforcement learning: Theory and applications in robotics , author =. 2019 , school =

work page 2019
[65]

International workshop on hybrid systems: Computation and control , pages =

Safety verification of hybrid systems using barrier certificates , author =. International workshop on hybrid systems: Computation and control , pages =. 2004 , organization =

work page 2004
[66]

Mitigating Distribution Shifts: Uncertainty-Aware Offline-to-Online Reinforcement Learning , author =

work page
[67]

2019 American Control Conference (ACC) , pages =

Safety-aware reinforcement learning framework with an actor-critic-barrier structure , author =. 2019 American Control Conference (ACC) , pages =. 2019 , organization =

work page 2019
[68]

IEEE Transactions on robotics , volume =

Barrier-certified adaptive reinforcement learning with applications to brushbot navigation , author =. IEEE Transactions on robotics , volume =. 2019 , publisher =

work page 2019
[69]

International conference on machine learning , pages =

Model-agnostic meta-learning for fast adaptation of deep networks , author =. International conference on machine learning , pages =. 2017 , organization =

work page 2017
[70]

International conference on machine learning , pages =

Online meta-learning , author =. International conference on machine learning , pages =. 2019 , organization =

work page 2019
[71]

Foundations and Trends

Introduction to online convex optimization , author =. Foundations and Trends. 2016 , publisher =

work page 2016
[72]

Advances in Neural Information Processing Systems , volume =

Meta-reinforcement learning with universal policy adaptation: Provable near-optimality under all-task optimum comparator , author =. Advances in Neural Information Processing Systems , volume =

work page
[73]

Memory efficient online meta learning. International Conference on Machine Learning, 2021.
[74]

Adaptive gradient-based meta-learning methods. Advances in Neural Information Processing Systems.
[75]

CRPO: A new approach for safe reinforcement learning with convergence guarantee. International Conference on Machine Learning, 2021.
[76]

Efficient off-policy meta-reinforcement learning via probabilistic context variables. International Conference on Machine Learning, 2019.
[77]

Improved Robustness and Safety for Pre-Adaptation of Meta Reinforcement Learning with Prior Regularization. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022.
[78]

6.2.2.3 Softmax units for multinoulli output distributions. Deep Learning, 2016.
[79]

Towards safe reinforcement learning via constraining conditional value-at-risk. arXiv preprint arXiv:2206.04436.
[80]

A constrained POMDP formulation and algorithmic solution for radar resource management in multi-target tracking. Journal of Advances in Information Fusion, 2021.

