BLCE-G and BLCE achieve minimax-optimal regret for linear contextual bandits with only O(log log T) parameter updates and reduced computational cost by avoiding near G-optimal design.
arXiv preprint arXiv:2311.08376
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
A quantile-of-means ensemble method achieves minimax optimal variance-dependent regret bounds for finite-horizon MDPs without count-based uncertainty estimates.
citing papers explorer
- Practical and Optimal Algorithm for Linear Contextual Bandits with Rare Parameter Updates
BLCE-G and BLCE achieve minimax-optimal regret for linear contextual bandits with only O(log log T) parameter updates and reduced computational cost by avoiding near G-optimal design.