Addressing function approximation error in actor-critic methods
9 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 9
citing papers explorer
-
Safety-Constrained Reinforcement Learning with Post-Training Reachability Verification for Robot Navigation
CVaR-constrained TD3 policies for robot navigation show larger safety margins and higher post-training reachability verification rates than average-cost baselines across simulated scenarios and real-robot tests.
-
Self-Predictive Representation for Autonomous UAV Object-Goal Navigation
AmelPredSto, a stochastic self-predictive representation model, outperforms other state representation learning approaches when combined with actor-critic RL for object-goal navigation in UAVs.
-
PriPG-RL: Privileged Planner-Guided Reinforcement Learning for Partially Observable Systems with Anytime-Feasible MPC
PriPG-RL trains RL policies for POMDPs by distilling knowledge from a privileged anytime-feasible MPC planner into a P2P-SAC policy, improving sample efficiency and performance in partially observable robotic navigation.
-
Safety-critical Control Under Partial Observability: Reach-Avoid POMDP meets Belief Space Control
A modular belief-space controller using learned Belief Control Lyapunov Functions for information gathering and conformal-prediction Belief Control Barrier Functions for safety reduces reach-avoid POMDP synthesis to fast quadratic programs.
-
Constraint-Aware Reinforcement Learning via Adaptive Action Scaling
A separate regulator module adaptively scales actions in RL to reduce constraint violations while preserving exploration, yielding up to 126x fewer violations and over 10x higher returns on Safety Gym tasks.
-
Zero-Shot, Safe and Time-Efficient UAV Navigation via Potential-Based Reward Shaping, Control Lyapunov and Barrier Functions
PBRS-augmented RL trained in simple settings transfers zero-shot to complex UAV environments when wrapped with a CLF-CBF-QP safety filter, yielding shorter missions and formal safety guarantees.
-
On Safer Reinforcement Learning for Sedation and Analgesia in Intensive Care
Offline RL for ICU sedation shows that adding 30-day mortality to the objective yields policies whose clinician agreement correlates negatively with mortality, unlike pain-only versions.
-
Morphology-Aware Graph Reinforcement Learning for Tensegrity Robot Locomotion
A GNN-augmented SAC policy that encodes tensegrity topology as a graph improves sample efficiency and enables zero-shot sim-to-real locomotion on a 3-bar tensegrity robot.
-
LLM-Guided Task- and Affordance-Level Exploration in Reinforcement Learning
LLM-TALE steers RL exploration using LLM-generated plans at task and affordance levels with online suboptimality correction, improving sample efficiency and success rates on pick-and-place tasks without human supervision.