Bridging Hamilton-Jacobi safety analysis and reinforcement learning
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.RO (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Value Functions for Temporal Logic: Optimal Policies and Safety Filters
  Non-Markovian policies from decomposed temporal logic value functions are proven optimal for nested Until, Globally, and Globally-Until specifications and extend Q-function safety filters to complex tasks.
- Autopilot-Preserving Residual Q-Learning with HJB-Inspired Finite-Action Risk Filtering for Fixed-Wing UAV Command Supervision
  An autopilot-preserving residual Q-learning supervisor with HJB-inspired finite-action risk filtering reduces mean RMS path-tracking error from 338.617 m to 44.809 m (86.77% reduction) in fixed simulation benchmarks.