
arxiv: 1805.02971 · v2 · pith:SZYYRFEVnew · submitted 2018-05-08 · 💻 cs.LG · cs.AI · stat.ML

Multinomial Logit Bandit with Linear Utility Functions

classification 💻 cs.LG · cs.AI · stat.ML
keywords model, algorithm, candidate, item, linear, logit, multinomial, utility

The multinomial logit bandit is a sequential subset selection problem that arises in many applications. In each round, the player selects a $K$-cardinality subset from $N$ candidate items and receives a reward governed by a {\it multinomial logit} (MNL) choice model, which accounts for both item utility and the substitution property among items. The player's objective is to dynamically learn the parameters of the MNL model and maximize cumulative reward over a finite horizon $T$. This problem faces the exploration-exploitation dilemma, and its combinatorial nature makes it non-trivial. In recent years, several algorithms have been developed that exploit specific characteristics of the MNL model, but all of them estimate the parameters of the MNL model separately and incur a regret no better than $\tilde{O}\big(\sqrt{NT}\big)$, which is undesirable when the candidate set size $N$ is large. In this paper, we consider the {\it linear utility} MNL choice model, whose item utilities are represented as linear functions of $d$-dimensional item features, and propose an algorithm, named {\bf LUMB}, to exploit the underlying structure. We prove that the proposed algorithm achieves $\tilde{O}\big(dK\sqrt{T}\big)$ regret, which is independent of the candidate set size. Experiments show the superiority of the proposed algorithm.
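
To make the setting concrete, the following is a minimal sketch of a linear-utility MNL choice model and the subset selection it induces. The abstract does not state the choice probabilities explicitly, so the sketch assumes the standard MNL form with an outside (no-choice) option of utility zero: item $i$ in subset $S$ is chosen with probability $\exp(x_i^\top\theta) / \big(1 + \sum_{j \in S} \exp(x_j^\top\theta)\big)$. The function names, the per-item `rewards` vector, and the brute-force search are illustrative assumptions, not the LUMB algorithm from the paper.

```python
import math
from itertools import combinations


def mnl_choice_probs(features, theta, subset):
    """Choice probabilities for each item in `subset` under an assumed MNL model.

    Utilities are linear: u_i = <features[i], theta>. The key None holds the
    probability of the assumed outside (no-choice) option, whose utility is 0.
    """
    weights = {
        i: math.exp(sum(x * t for x, t in zip(features[i], theta)))
        for i in subset
    }
    denom = 1.0 + sum(weights.values())  # 1.0 = exp(0) for the outside option
    probs = {i: w / denom for i, w in weights.items()}
    probs[None] = 1.0 / denom
    return probs


def expected_reward(features, theta, rewards, subset):
    """Expected reward of offering `subset` when item i pays rewards[i] if chosen."""
    probs = mnl_choice_probs(features, theta, subset)
    return sum(rewards[i] * probs[i] for i in subset)


def best_subset(features, theta, rewards, k):
    """Exhaustive search over all K-cardinality subsets (only feasible for small N)."""
    n = len(features)
    return max(
        combinations(range(n), k),
        key=lambda s: expected_reward(features, theta, rewards, s),
    )


if __name__ == "__main__":
    # Hypothetical toy data: N = 3 items with d = 2 features each.
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    theta = [0.5, -0.2]
    rewards = [1.0, 2.0, 3.0]
    print(best_subset(features, theta, rewards, k=2))
```

In this sketch the unknown quantity is the single $d$-dimensional vector `theta` rather than one parameter per item, which is the structure the abstract says the linear-utility model exposes.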


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Optimal Online and Offline Algorithms for Contextual MNL with Applications to Assortment and Pricing

    math.OC 2026-04 unverdicted novelty 6.0

    New algorithms for joint contextual MNL assortment and pricing deliver improved online regret bounds of order $W\sqrt{dT\log N}/L_0$ and local suboptimality guarantees offline.