3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Regret-Optimal Control for Finite-State Systems
  A nested dynamic program using the Regret-Bellman operator computes regret-optimal policies that interpolate between MDP and robust controllers for finite-state systems.
- Contraction-Aligned Analysis of Soft Bellman Residual Minimization with Weighted Lp-Norm for Markov Decision Problem
  Soft Bellman residual minimization with weighted Lp-norm aligns the objective with Bellman contraction as p increases and yields performance error bounds.
- Learning Adaptive Parameter Policies for Nonlinear Bayesian Filtering
  Reinforcement learning is used to learn adaptive policies for selecting parameters in nonlinear Bayesian filters, improving estimate quality and consistency in experiments with the unscented Kalman filter and stochastic integration filter.