Iterative reachability estimation for safe reinforcement learning · 2023

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Bellman Value Decomposition for Task Logic in Safe Optimal Control

cs.RO · 2026-02-23 · unverdicted · novelty 7.0

Bellman values for temporal logic tasks decompose into a graph of reach-avoid, avoid, and reach-avoid-loop equations solved by embedding the graph in a two-layer neural net (VDPPO) for safe high-dimensional control.

Constraint-Aware Reinforcement Learning via Adaptive Action Scaling

cs.RO · 2025-10-13 · unverdicted · novelty 6.0

A separate regulator module adaptively scales actions in RL to reduce constraint violations while preserving exploration, yielding up to 126x fewer violations and over 10x higher returns on Safety Gym tasks.

citing papers explorer

Showing 2 of 2 citing papers.

Bellman Value Decomposition for Task Logic in Safe Optimal Control · cs.RO · 2026-02-23 · unverdicted · none · ref 65
Constraint-Aware Reinforcement Learning via Adaptive Action Scaling · cs.RO · 2025-10-13 · unverdicted · none · ref 25
