Adaptivity in linear bandits for ε-best arm identification gives only logarithmic improvements on hypercube, ℓ2 ball, m-sets and multi-task settings but polynomial-factor gains on a specially constructed action set, enabled by an adaptive O(d log(1/δ)/ε²) ℓ2-norm estimator.
Conference on Learning Theory
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG 4
years: 2026 4
verdicts: UNVERDICTED 4
roles: method 1
polarities: use method 1
representative citing papers
A new primal-dual algorithm for adversarial linear CMDPs achieves the first sublinear regret and constraint violation bounds of order K^{3/4} using weighted LogSumExp softmax policies with periodic mixing and regularized dual updates.
The paper establishes the first Õ(ε⁻¹) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general function approximation settings.
POOL is a new RL algorithm that adds privacy protection in continuous spaces with one-sided feedback and achieves sample complexity matching known non-private lower bounds.
citing papers explorer
- On the Power of Adaptivity for ε-Best Arm Identification in Linear Bandits
  Adaptivity in linear bandits for ε-best arm identification gives only logarithmic improvements on hypercube, ℓ2 ball, m-sets and multi-task settings but polynomial-factor gains on a specially constructed action set, enabled by an adaptive O(d log(1/δ)/ε²) ℓ2-norm estimator.
- Primal-Dual Policy Optimization for Linear CMDPs with Adversarial Losses
  A new primal-dual algorithm for adversarial linear CMDPs achieves the first sublinear regret and constraint violation bounds of order K^{3/4} using weighted LogSumExp softmax policies with periodic mixing and regularized dual updates.
- Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability
  The paper establishes the first Õ(ε⁻¹) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general function approximation settings.
- Privacy Preserving Reinforcement Learning with One-Sided Feedback
  POOL is a new RL algorithm that adds privacy protection in continuous spaces with one-sided feedback and achieves sample complexity matching known non-private lower bounds.