Machine Learning 47(2-3), 235–256 (2002)
2 Pith papers cite this work, both from 2019 and both as yet unverdicted. Polarity classification is still being indexed.
citing papers explorer
- Stochastic One-Sided Full-Information Bandit
  An elimination algorithm for stochastic one-sided full-information bandits achieves O(sqrt(T log(TK))) distribution-independent regret along with a gap-dependent bound, which the authors claim is the best theoretical result.
- Multilevel Monte-Carlo for Solving POMDPs Online
  MLPP integrates multilevel Monte Carlo into MCTS to accelerate online POMDP solving for problems with complex dynamics; experiments indicate that it outperforms prior solvers on torque control, navigation, and grasping tasks.