Neural contextual bandits with deep representation and shallow exploration. arXiv preprint arXiv:2012.01780.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration
  Variance-aware neural dueling bandit algorithms achieve sublinear regret of order O(d sqrt(sum sigma_t^2) + sqrt(d T)) for wide networks on nonlinear utilities.
- Neural Exploitation and Exploration of Contextual Bandits
  EE-Net is a contextual bandit algorithm that pairs an exploitation neural net with a separate exploration neural net and proves an instance-dependent Õ(√T) regret bound while beating linear and neural baselines on real data.