arxiv: 1810.00737 · v1 · pith:CUQPWMVRnew · submitted 2018-10-01 · 💻 cs.LG · stat.ML

Risk-Averse Stochastic Convex Bandit

classification 💻 cs.LG stat.ML
keywords bandit, convex, problem, algorithm, first, online, risk-averse, achieves

Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first one is a descent-type algorithm which is easy to implement. The second algorithm, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge, this is the first attempt to address risk-aversion in the online convex bandit problem.
