A general framework for sequential decision-making under adaptivity constraints
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
The work establishes the first DP regret bound of order O(K^{3/5}) for model-free online RL under general function approximation and the first coverability-based regret bound for batched non-private RL.
A quantile-of-means ensemble method achieves minimax optimal variance-dependent regret bounds for finite-horizon MDPs without count-based uncertainty estimates.
citing papers explorer
- Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification
  Introduces KL misspecification for bandits and RL under function approximation and proves explicit KL-regret bounds for regression-based Gibbs algorithms that recover the realizable case.
- Towards Differentially Private Reinforcement Learning with General Function Approximation
  The work establishes the first DP regret bound of order O(K^{3/5}) for model-free online RL under general function approximation and the first coverability-based regret bound for batched non-private RL.
- Quantile of Means: A Bonus-Free Ensemble Method for Minimax Optimal Reinforcement Learning
  A quantile-of-means ensemble method achieves minimax optimal variance-dependent regret bounds for finite-horizon MDPs without count-based uncertainty estimates.