A general framework for sequential decision-making under adaptivity constraints

The role of coverage in online reinforcement learning · 2023 · arXiv 2210.04157

3 Pith papers cite this work.

representative citing papers

Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

Introduces KL misspecification for bandits and RL under function approximation and proves explicit KL-regret bounds for regression-based Gibbs algorithms that recover the realizable case.

Towards Differentially Private Reinforcement Learning with General Function Approximation

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

The work establishes the first DP regret bound of order O(K^{3/5}) for model-free online RL under general function approximation and the first coverability-based regret bound for batched non-private RL.

Quantile of Means: A Bonus-Free Ensemble Method for Minimax Optimal Reinforcement Learning

cs.LG · 2026-06-18 · unverdicted · novelty 6.0

A quantile-of-means ensemble method achieves minimax optimal variance-dependent regret bounds for finite-horizon MDPs without count-based uncertainty estimates.

citing papers explorer

Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification · cs.LG · 2026-06-04 · unverdicted · none · ref 19
Introduces KL misspecification for bandits and RL under function approximation and proves explicit KL-regret bounds for regression-based Gibbs algorithms that recover the realizable case.
Towards Differentially Private Reinforcement Learning with General Function Approximation · cs.LG · 2026-05-07 · unverdicted · none · ref 13
The work establishes the first DP regret bound of order O(K^{3/5}) for model-free online RL under general function approximation and the first coverability-based regret bound for batched non-private RL.
Quantile of Means: A Bonus-Free Ensemble Method for Minimax Optimal Reinforcement Learning · cs.LG · 2026-06-18 · unverdicted · none · ref 37
A quantile-of-means ensemble method achieves minimax optimal variance-dependent regret bounds for finite-horizon MDPs without count-based uncertainty estimates.
