For any ρ>0 there exists a ρ-TV-stable RL algorithm for tabular MDPs supporting exact unlearning at expected cost ρ√(ln T) of retraining from scratch, with regret O(H²√(SAT)+H³S²A+H^{2.5}S²A/ρ) and matching lower bound Ω(H√(SAT)+SAH/ρ).
arXiv preprint arXiv:2503.14347
2 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
A T-estimation-based procedure for adaptive density estimation and optimal control in offline contextual MDPs without stationarity, providing oracle risk bounds under two loss functions and finite-sample cost guarantees.
citing papers explorer
- Exact Unlearning in Reinforcement Learning
  For any ρ>0 there exists a ρ-TV-stable RL algorithm for tabular MDPs supporting exact unlearning at expected cost ρ√(ln T) of retraining from scratch, with regret O(H²√(SAT)+H³S²A+H^{2.5}S²A/ρ) and matching lower bound Ω(H√(SAT)+SAH/ρ).
- Adaptive Estimation and Optimal Control in Offline Contextual MDPs without Stationarity
  A T-estimation-based procedure for adaptive density estimation and optimal control in offline contextual MDPs without stationarity, providing oracle risk bounds under two loss functions and finite-sample cost guarantees.