A review of safe reinforcement learning: Methods, theory and applications

Gu, S · 2022 · arXiv 2205.10330

11 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Data-Driven Synthesis of Probabilistic Controlled Invariant Sets for Linear MDPs

eess.SY · 2026-04-03 · unverdicted · novelty 7.0

Data-driven regularized least squares with self-normalized bounds and lattice abstraction yields certified (N, ε)-PCIS for linear MDPs via conservative backward recursion.

The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification

cs.LG · 2025-10-01 · unverdicted · novelty 7.0

A no-regret procedure for safe online logistic classification that meets a target error rate with high probability using only O(sqrt(T)) excess tests over an oracle.

Constrained Decoding for Safe Robot Navigation Foundation Models

cs.RO · 2025-09-01 · unverdicted · novelty 7.0

SafeDec uses constrained decoding to ensure autoregressive robot navigation foundation models generate actions that provably satisfy STL safety specifications under assumed dynamics.

From Cumulative Constraints to Adaptive Runtime Safety Control for Nonstationary Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

CPSS projects cumulative safety constraints into time-varying per-state thresholds for online action shielding in nonstationary RL, providing per-state guarantees and cumulative bounds.

TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

cs.CR · 2026-04-30 · unverdicted · novelty 6.0

TwinGate deploys a stateful dual-encoder system with asymmetric contrastive learning to detect decompositional jailbreaks in untraceable LLM traffic at high recall and low false-positive rate with negligible latency.

Learning Control Policies to Provably Satisfy Hard Affine Constraints for Black-Box Hybrid Dynamical Systems

cs.RO · 2026-04-24 · unverdicted · novelty 6.0

The authors introduce affine repulsive RL policies that provably satisfy hard affine state constraints for black-box hybrid dynamical systems with affine reset maps; the guarantee rests on derived sufficient closed-loop safety conditions, and the policies are tested on pendulum and juggler examples.

Safe Continual Reinforcement Learning under Nonstationarity via Adaptive Safety Constraints

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

LILAC+ combines context-based, adaptation-speed, and budget-to-state safety constraints to reduce violations in continual RL under nonstationary conditions, demonstrated in simulated driving tasks.

Safe and Policy-Compliant Multi-Agent Orchestration for Enterprise AI

cs.AI · 2026-04-19 · unverdicted · novelty 5.0

CAMCO enforces policy constraints on multi-agent AI at deployment time via convex projection, risk-weighted Lagrangian shaping, and bounded-convergence negotiation, yielding zero violations and 92-97% utility in tested enterprise scenarios.

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

cs.RO · 2025-03-05 · unverdicted · novelty 5.0

SafeVLA applies constrained reinforcement learning via CMDP min-max optimization to VLAs, cutting safety violation costs by 83.58% while preserving task success on long-horizon mobile manipulation tasks.

Addressing Moral Uncertainty using Large Language Models for Ethical Decision-Making

cs.CY · 2025-02-17 · unverdicted · novelty 5.0

A reinforcement learning model is ethically fine-tuned using feedback from LLMs that embody five moral principles, aggregated via Belief Jensen-Shannon Divergence and Dempster-Shafer Theory.

A Review On Safe Reinforcement Learning Using Lyapunov and Barrier Functions

eess.SY · 2025-08-12 · unverdicted · novelty 2.0

A literature review of safe RL using Lyapunov and barrier functions that identifies a shift toward model-free methods since 2017, well-defined open problems for each approach class, and high-dimensional scalability as the main barrier.

citing papers explorer

Showing 11 of 11 citing papers.

Data-Driven Synthesis of Probabilistic Controlled Invariant Sets for Linear MDPs · eess.SY · 2026-04-03 · unverdicted · none · ref 3
The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification · cs.LG · 2025-10-01 · unverdicted · none · ref 23
Constrained Decoding for Safe Robot Navigation Foundation Models · cs.RO · 2025-09-01 · unverdicted · none · ref 32
From Cumulative Constraints to Adaptive Runtime Safety Control for Nonstationary Reinforcement Learning · cs.LG · 2026-05-13 · unverdicted · none · ref 6
TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning · cs.CR · 2026-04-30 · unverdicted · none · ref 8
Learning Control Policies to Provably Satisfy Hard Affine Constraints for Black-Box Hybrid Dynamical Systems · cs.RO · 2026-04-24 · unverdicted · none · ref 26
Safe Continual Reinforcement Learning under Nonstationarity via Adaptive Safety Constraints · cs.LG · 2026-05-13 · unverdicted · none · ref 6
Safe and Policy-Compliant Multi-Agent Orchestration for Enterprise AI · cs.AI · 2026-04-19 · unverdicted · none · ref 3
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning · cs.RO · 2025-03-05 · unverdicted · none · ref 56
Addressing Moral Uncertainty using Large Language Models for Ethical Decision-Making · cs.CY · 2025-02-17 · unverdicted · none · ref 39
A Review On Safe Reinforcement Learning Using Lyapunov and Barrier Functions · eess.SY · 2025-08-12 · unverdicted · none · ref 42
