Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging

Anirban Bhattacharya; Ping-Chun Hsieh; P. R. Kumar; Xi Liu

arxiv: 1810.12418 · v4 · pith:VUM4YHI2new · submitted 2018-10-29 · 💻 cs.LG · stat.ML

Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging

Ping-Chun Hsieh , Xi Liu , Anirban Bhattacharya , P. R. Kumar This is my paper

classification 💻 cs.LG stat.ML

keywords renegingoutcomeallowsapplicationsbanditsheteroscedastichr-ucblevel

0 comments

read the original abstract

Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection. In these applications, a "reneging" phenomenon, where participants may disengage from future interactions after observing an unsatisfiable outcome, is rather prevalent. To address the above issue, this paper proposes a model of heteroscedastic linear bandits with reneging, which allows each participant to have a distinct "satisfaction level," with any interaction outcome falling short of that level resulting in that participant reneging. Moreover, it allows the variance of the outcome to be context-dependent. Based on this model, we develop a UCB-type policy, namely HR-UCB, and prove that it achieves $\mathcal{O}\big(\sqrt{{T}(\log({T}))^{3}}\big)$ regret. Finally, we validate the performance of HR-UCB via simulations.

This paper has not been read by Pith yet.

Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging

discussion (0)