pith. machine review for the scientific record.

arxiv: 2604.09452 · v1 · submitted 2026-04-10 · 💻 cs.LG · cs.AI

Recognition: no theorem link

SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning

Pith reviewed 2026-05-10 17:47 UTC · model grok-4.3

classification: 💻 cs.LG · cs.AI
keywords: safe reinforcement learning · policy updates · Rashomon set · continual learning · safety guarantees · policy projection

The pith

Projecting reinforcement learning policy updates onto a certified safe region preserves formal safety guarantees during adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes that any reinforcement learning update rule can be made safe by projecting its changes onto the Rashomon set, a region of policy parameters already certified to satisfy safety constraints on demonstration data. This projection step is claimed to deliver a priori provable guarantees that previous safety properties survive even when the policy is adapted to new tasks or changing dynamics. The approach targets continual reinforcement learning settings where environments are non-stationary and safety must hold on past tasks while performance improves on current ones. A reader would care because most existing methods either lack formal proofs or check safety only after the fact, leaving open the risk of catastrophic forgetting of safety rules.

Core claim

The central claim is that one can provide formal, provable guarantees for arbitrary RL algorithms used to update a policy by projecting their updates onto the Rashomon set: a region in policy parameter space certified to meet safety constraints within the demonstration data distribution.

What carries the argument

The Rashomon set: a certified region in policy parameter space that contains all policies meeting the safety constraints on the demonstration distribution. Because any external update can be projected back inside this region, safety is retained by set membership, regardless of the update rule.
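
To make the mechanism concrete, here is a minimal sketch of the projected-update loop, assuming the certified region is an axis-aligned box in parameter space; the box bounds, step size, and gradient oracle are hypothetical stand-ins, not the paper's construction.

# Sketch: arbitrary update step followed by projection onto a certified box.
# The box [theta_lo, theta_hi] stands in for the Rashomon set; the paper may
# represent the certified region differently.
import numpy as np

def project_onto_box(theta, theta_lo, theta_hi):
    # Euclidean projection onto an axis-aligned box is a coordinate-wise clip.
    return np.clip(theta, theta_lo, theta_hi)

def safe_update(theta, grad, lr, theta_lo, theta_hi):
    # Any RL update rule produces a candidate; projection restores membership.
    candidate = theta + lr * grad
    return project_onto_box(candidate, theta_lo, theta_hi)

# Toy usage: no matter how the gradients behave, the iterate never leaves
# the certified region, which is exactly the safety-by-membership argument.
rng = np.random.default_rng(0)
theta = np.zeros(4)
theta_lo, theta_hi = -0.5 * np.ones(4), 0.5 * np.ones(4)
for _ in range(100):
    grad = rng.normal(size=4)  # stand-in for a policy-gradient estimate
    theta = safe_update(theta, grad, 0.1, theta_lo, theta_hi)
assert np.all(theta_lo <= theta) and np.all(theta <= theta_hi)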

If this is right

  • Safety on the source task remains deterministically guaranteed after adaptation in grid-world navigation tasks.
  • Regularisation baselines suffer catastrophic forgetting of safety constraints while the projection method does not (a toy contrast follows this list).
  • The method works for any underlying RL update algorithm without modifying the algorithm itself.
  • A priori certification replaces post-hoc verification for continual policy updates.
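
A toy contrast of the two update styles, assuming an EWC-style quadratic penalty as the regularisation baseline and a box-shaped certified region; both are illustrative stand-ins, not the paper's exact baselines.

# Sketch: a soft penalty pulls the update toward old parameters but cannot
# prevent leaving the certified region under sustained pressure from a new
# objective; a hard projection enforces membership exactly.
import numpy as np

def regularised_update(theta, grad, theta_old, lr, lam):
    # Illustrative EWC-style quadratic penalty toward the old parameters.
    return theta + lr * (grad - lam * (theta - theta_old))

def projected_update(theta, grad, lr, lo, hi):
    return np.clip(theta + lr * grad, lo, hi)

theta_r = theta_p = theta_old = np.zeros(2)
lo, hi = -0.5 * np.ones(2), 0.5 * np.ones(2)
for _ in range(200):
    grad = np.ones(2)  # persistent pull toward a new objective
    theta_r = regularised_update(theta_r, grad, theta_old, 0.1, lam=1.0)
    theta_p = projected_update(theta_p, grad, 0.1, lo, hi)
print(np.all((lo <= theta_r) & (theta_r <= hi)))  # False: penalty drifted out
print(np.all((lo <= theta_p) & (theta_p <= hi)))  # True: membership enforced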

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same projection idea could be applied to other constraint types beyond safety, such as performance bounds or fairness criteria.
  • If the Rashomon set can be approximated efficiently in high-dimensional deep networks, the technique might scale to robotic control tasks with changing goals.
  • Combining the set with online monitoring could detect when the projection step becomes too restrictive and trigger a re-certification.

Load-bearing premise

The Rashomon set identified from demonstration data remains safe when the policy is deployed in the actual environment and when further updates occur.

What would settle it

A concrete counter-example would be an adapted policy that is projected onto the Rashomon set yet still violates a safety constraint when executed in the target environment with altered dynamics.
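
One hypothetical shape for that test, written against a Gymnasium-style environment API (the paper's experiments cite Gymnasium); the environment, policy, and violation predicate are illustrative stand-ins.

# Hypothetical falsification harness: run a projected (nominally certified)
# policy under altered dynamics and count safety violations. A single
# violation would be the concrete counter-example described above.
def count_violations(policy, env, episodes, is_violation):
    violations = 0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action = policy(obs)
            obs, _, terminated, truncated, info = env.step(action)
            if is_violation(obs, info):  # e.g. agent stepped onto a hole
                violations += 1
            done = terminated or truncated
    return violations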

Figures

Figures reproduced from arXiv: 2604.09452 by Francesco Belardinelli (Imperial College London), Maksim Anisimov (Imperial College London), Matthew Wicker (Imperial College London).

Figure 1: Overview of the proposed method. The unsafety …
Figure 2: Safe demonstration dataset construction in Frozen Lake …
Figure 3: Examples of experiment environments: Frozen Lake …
Figure 4: Scalability analysis across diagonal Frozen Lake …
Figure 5: Relationship between the safety surrogate and the …
Figure 6: Environment frames at initial states.
Figure 7: Logit bounds illustrate the following guarantee of …
Figure 8: Neural policy probabilities for the worst-case logit …
Original abstract

Safety guarantees are a prerequisite to the deployment of reinforcement learning (RL) agents in safety-critical tasks. Often, deployment environments exhibit non-stationary dynamics or are subject to changing performance goals, requiring updates to the learned policy. This leads to a fundamental challenge: how to update an RL policy while preserving its safety properties on previously encountered tasks? The majority of current approaches either do not provide formal guarantees or verify policy safety only a posteriori. We propose a novel a priori approach to safe policy updates in continual RL by introducing the Rashomon set: a region in policy parameter space certified to meet safety constraints within the demonstration data distribution. We then show that one can provide formal, provable guarantees for arbitrary RL algorithms used to update a policy by projecting their updates onto the Rashomon set. Empirically, we validate this approach across grid-world navigation environments (Frozen Lake and Poisoned Apple) where we guarantee an a priori provably deterministic safety on the source task during downstream adaptation. In contrast, we observe that regularisation-based baselines experience catastrophic forgetting of safety constraints while our approach enables strong adaptation with provable guarantees that safety is preserved.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, a circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper introduces the Rashomon set as the region of policy parameters satisfying safety constraints evaluated under the demonstration data distribution. It claims that projecting arbitrary RL algorithm updates onto this set yields formal, provable a priori safety guarantees for policy updates in continual RL, even under non-stationary dynamics or changing goals. Empirically, the method is tested on two grid-world tasks (Frozen Lake, Poisoned Apple), where it preserves deterministic safety on the source task during adaptation while regularization baselines suffer catastrophic forgetting.

Significance. If the central formal claim could be made rigorous, the projection-based approach would be a notable contribution to safe continual RL: it decouples the choice of update algorithm from safety certification and provides guarantees without post-hoc verification. The Rashomon-set construction is a clean idea that could generalize if the distributional assumptions are addressed. Currently, however, the guarantees are confined to the fixed demonstration distribution, limiting applicability to the non-stationary settings the paper targets.

major comments (1)
  1. Abstract and the definition of the Rashomon set: the central claim states that projection onto the Rashomon set supplies 'formal, provable guarantees' for updates in continual RL with non-stationary dynamics. Yet the set is explicitly defined by safety constraints evaluated only under the demonstration data distribution, and no derivation, theorem, or argument is supplied showing that this certification transfers to deployment dynamics, altered state-visitation distributions, or subsequent updates. This distributional-equivalence assumption is load-bearing for the transfer of safety guarantees and is not established.
minor comments (3)
  1. Empirical section: results are reported on only two simple grid-world environments, with no variance reported across random seeds, no ablation on Rashomon-set computation or projection accuracy, and no evaluation under explicit non-stationarity. These details would strengthen the practical assessment but are not required for the formal claim.
  2. Notation and presentation: the manuscript would benefit from an explicit statement of all assumptions required for the projection operator to preserve safety (e.g., convexity of the set, exactness of the projection) and from a clear distinction between safety on the source distribution versus safety under deployment dynamics (a small illustration of the convexity point follows this list).
  3. Missing details: no error analysis or approximation bounds are given for how the Rashomon set is identified or represented in practice, especially for deep policies.
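
On minor comment 2, a one-dimensional sketch of why convexity is load-bearing for a well-defined projection operator: projection onto a convex set is unique, while projection onto a non-convex set, such as a union of intervals, can be ambiguous. The sets below are illustrative, not the paper's.

# Sketch for minor comment 2: uniqueness of projection depends on convexity.
import numpy as np

def project_interval(x, lo, hi):
    # Convex set [lo, hi]: the nearest point is unique (a simple clip).
    return min(max(x, lo), hi)

def project_union(x, intervals):
    # Non-convex union of intervals: nearest points can tie, so "the"
    # projection is not well defined without an arbitrary tie-break.
    candidates = [project_interval(x, lo, hi) for lo, hi in intervals]
    return candidates[int(np.argmin([abs(x - c) for c in candidates]))]

print(project_interval(0.0, 1.0, 2.0))         # 1.0, unique
print(project_union(0.0, [(-2, -1), (1, 2)]))  # -1.0, but +1.0 is equally close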

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful review and for identifying the need to clarify the precise scope of our safety guarantees. We address the major comment below by explaining the construction and by committing to revisions that remove any ambiguity.

Point-by-point responses
  1. Referee: Abstract and the definition of the Rashomon set: the central claim states that projection onto the Rashomon set supplies 'formal, provable guarantees' for updates in continual RL with non-stationary dynamics. Yet the set is explicitly defined by safety constraints evaluated only under the demonstration data distribution, and no derivation, theorem, or argument is supplied showing that this certification transfers to deployment dynamics, altered state-visitation distributions, or subsequent updates. This distributional-equivalence assumption is load-bearing for the transfer of safety guarantees and is not established.

    Authors: We agree that the Rashomon set is defined exclusively by safety constraints evaluated under the fixed demonstration data distribution of the source task. The projection step therefore supplies a formal guarantee only with respect to that distribution: any policy that remains inside the set satisfies the source-task safety constraints by construction, regardless of the update rule or the dynamics of future tasks. We do not claim, and do not provide a theorem for, transfer of these guarantees to new deployment dynamics, altered state-visitation distributions, or safety on subsequent tasks. The intended use case is continual RL in which an agent may adapt to new goals or environments while provably retaining safety on previously encountered source tasks. We will revise the abstract, introduction, and the statement of the main result to state this scope explicitly and to eliminate any phrasing that could be read as promising safety transfer beyond the source distribution.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

Full rationale

The paper defines the Rashomon set externally from safety constraints evaluated on the demonstration data distribution, then defines projection as the operation that maps any update into that set. The resulting guarantee that the projected policy remains safe on the source distribution follows directly from set membership but does not reduce the identification or certification of the set itself to the projection step. No equations or claims in the provided text equate a fitted parameter to a prediction, invoke self-citations as load-bearing uniqueness theorems, or rename known results. The central construction therefore retains independent content in how the set is obtained and applied to arbitrary updates.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the existence of a computable Rashomon set derived from safety constraints and demonstration data; no free parameters are explicitly fitted in the abstract description.

axioms (1)
  • domain assumption: Safety constraints can be certified as holding for all policies within a region of parameter space based solely on the demonstration data distribution.
    This underpins the definition of the Rashomon set and the projection step.
invented entities (1)
  • Rashomon set · no independent evidence
    purpose: Certified region in policy parameter space that meets safety constraints on source task data.
    Newly introduced construct enabling the projection-based guarantees.
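
A toy illustration of the ledger's domain assumption, assuming linear logits and interval arithmetic over a small parameter box; the paper's certification machinery for deep policies is more involved (its bibliography includes interval-bound-propagation methods), and every name below is hypothetical.

# Sketch: certify that EVERY policy in a parameter box [w_lo, w_hi] keeps a
# (toy) safety logit above a threshold on each demonstration state, so the
# whole box, not a single policy, is certified from the data alone.
import numpy as np

def worst_case_logit(state, w_lo, w_hi):
    # Lower bound on w @ state over the box, by interval arithmetic:
    # pick the box endpoint that minimises each coordinate's contribution.
    return np.where(state >= 0, w_lo * state, w_hi * state).sum()

def certify_box(demo_states, w_lo, w_hi, threshold=0.0):
    # The box is certified only if the worst-case logit clears the
    # threshold on every demonstration state.
    return all(worst_case_logit(s, w_lo, w_hi) > threshold for s in demo_states)

demo_states = [np.array([1.0, 0.5]), np.array([0.8, -0.2])]
w_lo, w_hi = np.array([0.5, 0.1]), np.array([1.0, 0.4])
print(certify_box(demo_states, w_lo, w_hi))  # True: the whole box is certified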

pith-pipeline@v0.9.0 · 5512 in / 1170 out tokens · 35101 ms · 2026-05-10T17:47:59.812824+00:00 · methodology

