arxiv: 2605.18842 · v1 · pith:YF4JPHQGnew · submitted 2026-05-13 · 💻 cs.LG

Safe Continual Reinforcement Learning under Nonstationarity via Adaptive Safety Constraints

Pith reviewed 2026-05-20 21:31 UTC · model grok-4.3

classification 💻 cs.LG
keywords safe reinforcement learning · continual learning · nonstationarity · adaptive constraints · distribution shift · safety violations · driving environments

The pith

Adaptive safety constraints reduce violations under distribution shift while keeping task performance competitive.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes LILAC+, a framework for safe continual reinforcement learning that adapts safety constraints as environments change over time. Standard methods assume fixed constraints or stable conditions, an assumption that leads to more violations when conditions shift. LILAC+ combines three mechanisms: it adjusts constraints using inferred and predicted environmental context, tightens them when change outpaces safe adaptation, and converts overall safety budgets into immediate state-level limits. Experiments in simulated driving tasks under stationary, seen nonstationary, and unseen nonstationary conditions show fewer safety violations than unconstrained or fixed-constraint baselines. This matters for applications like autonomous driving, where weather, traffic, or road conditions vary continuously.

Core claim

LILAC+ integrates three mechanisms for proactive and reactive safety adaptation in nonstationary continual RL: context-based safety constraints that adjust requirements using inferred and predicted environmental context, adaptation-speed constraints that tighten when the rate of change exceeds the agent's safe adaptation ability, and budget-to-state safety enforcement that turns cumulative safety requirements into local state-level control constraints enforceable at decision time. In simulated driving environments, these mechanisms substantially reduce safety violations under distribution shift while maintaining competitive task performance compared with unconstrained and fixed-constraint RL baselines.
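
The first two mechanisms can be pictured with a minimal Python sketch. This summary does not give the paper's actual update rules, so everything here is an illustrative assumption: the function names, the linear scaling by a predicted risk score in [0, 1], the ratio-based speed factor, and the multiplicative way the two are combined.

```python
def context_adjusted_limit(base_limit: float, predicted_risk: float) -> float:
    """Shrink the nominal cost limit as predicted context risk grows.

    predicted_risk is a hypothetical score in [0, 1]; the linear scaling
    is an assumption for illustration, not the paper's rule.
    """
    risk = min(max(predicted_risk, 0.0), 1.0)  # clamp to [0, 1]
    return base_limit * (1.0 - risk)


def adaptation_speed_factor(change_rate: float, safe_rate: float) -> float:
    """Return 1.0 while the environment changes no faster than the agent
    can safely adapt, and a factor below 1.0 once it does (assumed rule)."""
    if change_rate <= safe_rate:
        return 1.0
    return safe_rate / change_rate


def effective_limit(base_limit: float, predicted_risk: float,
                    change_rate: float, safe_rate: float) -> float:
    """Combine both adjustments multiplicatively (an assumed composition)."""
    limit = context_adjusted_limit(base_limit, predicted_risk)
    return limit * adaptation_speed_factor(change_rate, safe_rate)
```

Under these assumptions the limit is left untouched in a benign, slowly changing context and shrinks as either the predicted risk or the rate of change rises, which is the qualitative behavior the core claim describes.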

What carries the argument

The LILAC+ framework with its three adaptive safety mechanisms: context-based constraints, adaptation-speed constraints, and budget-to-state enforcement.

If this is right

  • Safety mechanisms can respond to predicted future conditions rather than only current observations.
  • Tighter constraints activate automatically when environmental change accelerates.
  • Cumulative safety budgets translate directly into per-step control limits.
  • The same framework handles both seen and unseen shifts in simulation.
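
The budget-to-state idea in the third bullet can be sketched as follows. This is a hedged illustration, not the paper's enforcement rule: it assumes the unspent budget is spread evenly over the remaining steps, and `per_step_limit`, `safe_actions`, and the toy cost model are hypothetical names introduced here.

```python
def per_step_limit(total_budget: float, spent: float, steps_left: int) -> float:
    """Spread the unspent cumulative budget evenly over the remaining steps.

    The even split is an assumption; other allocation rules are possible.
    """
    if steps_left <= 0:
        return 0.0
    return max(total_budget - spent, 0.0) / steps_left


def safe_actions(candidates, step_cost, limit):
    """Keep only actions whose estimated one-step cost fits under the limit."""
    return [a for a in candidates if step_cost(a) <= limit]


# Toy usage: candidate speeds, with a made-up cost model of speed / 10.
limit = per_step_limit(total_budget=10.0, spent=4.0, steps_left=3)
allowed = safe_actions([10, 20, 30], lambda speed: speed / 10.0, limit)
```

In this toy run the per-step limit is 2.0, so the fastest candidate is filtered out at decision time while the two slower ones remain available.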

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may generalize to other safety-critical continual RL settings such as robotics under varying loads or sensor degradation.
  • Poor context prediction would likely erode the reported safety gains, suggesting a need for robust inference modules.
  • Real-world validation beyond driving simulators would test whether the adaptive rules transfer when shifts arise from unmodeled factors.

Load-bearing premise

Inferred and predicted environmental context is accurate enough to usefully adjust safety requirements, and the simulated nonstationary driving tasks represent the distribution shifts that matter in practice.

What would settle it

A nonstationary driving simulation in which context inference is made inaccurate or unavailable, resulting in no reduction or an increase in safety violations compared with fixed-constraint baselines.

Figures

Figures reproduced from arXiv: 2605.18842 by Timofey Tomashevskiy.

Figure 1: Overview of LILAC+. A shared predictive context pipeline estimates the current context …
Figure 2: Experimental evidence for adaptive safety constraints. LILAC+ reduces violations in …
Original abstract

Safe reinforcement learning in nonstationary environments requires safety mechanisms that adapt as environmental conditions change. Standard safe reinforcement learning methods often assume fixed constraints or stable environmental conditions, which can become inadequate under distribution shift. We propose LILAC+, a framework for safe continual reinforcement learning under nonstationarity that combines three adaptive safety mechanisms: context-based safety constraints, adaptation-speed constraints, and budget-to-state safety enforcement. Context-based constraints adjust safety requirements using inferred and predicted environmental context. Adaptation-speed constraints tighten safety requirements when the rate of environmental change exceeds the agent's ability to adapt safely. Budget-to-state enforcement converts cumulative safety requirements into local state-level control constraints that can be enforced at decision time. Together, these mechanisms provide a unified approach for proactive and reactive safety adaptation in continual reinforcement learning. We evaluate the framework in simulated driving environments under stationary, seen nonstationary, and unseen nonstationary conditions. The results show that adaptive safety constraints substantially reduce safety violations under distribution shift while maintaining competitive task performance compared with unconstrained and fixed-constraint baselines. These findings suggest that safe continual reinforcement learning requires adaptive constraint mechanisms that respond not only to current state information but also to predicted environmental context, adaptation demand, and remaining safety budget.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes LILAC+, a framework for safe continual reinforcement learning under nonstationarity. It integrates three adaptive safety mechanisms: context-based constraints that adjust requirements using inferred and predicted environmental context, adaptation-speed constraints that tighten safety when environmental change rate exceeds safe adaptation capacity, and budget-to-state enforcement that maps cumulative safety budgets into enforceable local state constraints. The approach is evaluated in simulated driving environments under stationary, seen-nonstationary, and unseen-nonstationary conditions, claiming substantial reductions in safety violations under distribution shift while preserving competitive task performance relative to unconstrained and fixed-constraint baselines.

Significance. If the results prove robust, the work addresses a practically relevant gap in safe RL by moving beyond fixed constraints to mechanisms that respond to predicted context, adaptation demand, and remaining budget. This unified proactive-reactive design is a conceptual strength for continual learning settings. The evaluation across multiple nonstationarity regimes is a positive step toward more realistic testing. However, the central empirical claim depends on accurate context inference, and the absence of targeted robustness checks limits the strength of the contribution.

major comments (2)
  1. [Evaluation] Evaluation section: the central claim that adaptive constraints substantially reduce safety violations under distribution shift rests on accurate context inference and prediction, yet no ablations are reported that inject noise or bias into the context estimator to measure degradation in performance or safety; without this, it is unclear whether the reported gains hold when inference error is non-negligible, as the skeptic concern notes.
  2. [Abstract] Abstract and results summary: the positive outcomes under seen and unseen nonstationary conditions are stated without accompanying quantitative values, error bars, baseline specifics, or statistical tests, which prevents assessment of effect size and undermines verification of the claim that task performance remains competitive.
minor comments (2)
  1. Define the acronym LILAC+ at its first appearance and clarify whether the '+' denotes an extension of a prior method.
  2. [Results] Results figures should include error bars and indicate whether differences from baselines are statistically significant.
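
The ablation requested in major comment 1 could be set up along these lines. This is a minimal sketch under stated assumptions: the context is treated as a plain list of floats, the corruption is additive Gaussian noise plus a constant bias, and `perturb_context` and `ablation_grid` are hypothetical helpers, not part of the paper's code.

```python
import random


def perturb_context(context, noise_std=0.0, bias=0.0, rng=None):
    """Return a corrupted copy of a context estimate.

    Adds a constant bias and zero-mean Gaussian noise to every component.
    Pass a seeded random.Random for reproducible ablation runs.
    """
    rng = rng or random.Random()
    return [c + bias + rng.gauss(0.0, noise_std) for c in context]


def ablation_grid(noise_levels, bias_levels):
    """Enumerate every (noise_std, bias) setting to evaluate."""
    return [(n, b) for n in noise_levels for b in bias_levels]
```

Each grid setting would be applied to the estimator's output before the safety constraints are computed, and safety violations and task return would then be recorded per setting to show how quickly the gains degrade.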

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the empirical support and clarity of the results.

  1. Referee: [Evaluation] Evaluation section: the central claim that adaptive constraints substantially reduce safety violations under distribution shift rests on accurate context inference and prediction, yet no ablations are reported that inject noise or bias into the context estimator to measure degradation in performance or safety; without this, it is unclear whether the reported gains hold when inference error is non-negligible, as the skeptic concern notes.

    Authors: We agree that robustness to imperfect context inference is critical for the central claim. The current evaluation assumes a context estimator operating at the accuracy levels described in Section 4.2. To directly address this concern, we will add new ablation experiments that inject controlled noise and bias into the context predictions and report the resulting changes in safety violations and task performance across the nonstationary regimes. revision: yes

  2. Referee: [Abstract] Abstract and results summary: the positive outcomes under seen and unseen nonstationary conditions are stated without accompanying quantitative values, error bars, baseline specifics, or statistical tests, which prevents assessment of effect size and undermines verification of the claim that task performance remains competitive.

    Authors: We acknowledge that the abstract and high-level summary currently omit specific numbers. In the revision we will update both the abstract and the results summary to include quantitative effect sizes (e.g., mean reduction in safety violations with standard errors), explicit baseline names, and references to the statistical tests reported in the main evaluation tables. revision: yes
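
The effect sizes promised in the second response could be computed as in the sketch below. It assumes per-seed violation counts are available for a baseline and for LILAC+ and that runs are paired by seed; the helper names are hypothetical and the paper's actual statistical tests are not specified in this summary.

```python
import math
import statistics


def mean_and_sem(values):
    """Return the sample mean and standard error of the mean.

    Requires at least two values, since the sample standard deviation
    is undefined for a single observation.
    """
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def paired_reduction(baseline, method):
    """Mean per-seed reduction in violations (baseline minus method),
    with its standard error. Assumes both lists are aligned by seed."""
    diffs = [b - m for b, m in zip(baseline, method)]
    return mean_and_sem(diffs)
```

Reporting the paired difference rather than two separate means keeps seed-to-seed variation from inflating the error bar, which is one reasonable way to make the "competitive performance" claim checkable.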

Circularity Check

0 steps flagged

No significant circularity; methods defined independently and evaluated on separate simulated tasks

The paper introduces LILAC+ with three explicitly defined adaptive mechanisms (context-based constraints, adaptation-speed constraints, budget-to-state enforcement) and evaluates them on distinct simulated driving scenarios under stationary, seen-nonstationary, and unseen-nonstationary conditions. No step reduces a claimed prediction or uniqueness result to a fitted parameter or self-citation by construction; the central performance claims rest on empirical comparisons against unconstrained and fixed-constraint baselines rather than on any internal redefinition or load-bearing self-reference. The evaluation protocol uses held-out distribution shifts, satisfying the criterion for independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the assumption that context can be reliably inferred and predicted and that the three constraint types can be combined without introducing new instabilities; no free parameters or invented physical entities are mentioned.

axioms (1)
  • domain assumption: Environmental context can be inferred and predicted from observations with sufficient accuracy to adjust safety constraints usefully.
    Central to the context-based safety constraints described in the abstract.
invented entities (1)
  • LILAC+ framework (no independent evidence)
    purpose: Unified adaptive safety mechanism for continual RL under nonstationarity
    Newly proposed combination of three constraint types.

pith-pipeline@v0.9.0 · 5740 in / 1354 out tokens · 42022 ms · 2026-05-20T21:31:23.965630+00:00 · methodology
