Alternating Linear Bandits for Online Matrix-Factorization Recommendation
We consider the problem of online collaborative filtering, where items are recommended to users over time. At each time step, a user (selected by the environment) consumes an item (selected by the agent) and provides a rating of that item. In this paper, we propose a novel algorithm for online matrix-factorization recommendation that combines linear bandits and alternating least squares. In this formulation, the bandit feedback is equal to the difference between the ratings of the best and selected items. We evaluate the performance of the proposed algorithm over time using both cumulative regret and average cumulative NDCG. Simulation results on three synthetic datasets and three real-world datasets for online collaborative filtering indicate that the proposed algorithm outperforms two state-of-the-art online algorithms.
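The abstract names the ingredients (a linear bandit, alternating least squares, cumulative regret, average cumulative NDCG) but gives no update rules, so the following is only a rough sketch of how such a loop might look, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the UCB-style exploration bonus, the ridge regularizer `lam`, the synthetic low-rank rating matrix, and the helper names `ndcg`, `ridge`, and `run`. The sketch also uses the noisy rating as the learning signal and tracks the gap between the best and selected ratings as regret.

```python
import numpy as np

def ndcg(true_ratings, scores, k=5):
    """NDCG@k of the ranking induced by `scores`, judged by `true_ratings`."""
    k = min(k, len(true_ratings))
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    top = np.argsort(-scores)[:k]               # items ranked by predicted score
    ideal = np.sort(true_ratings)[::-1][:k]     # best achievable ordering
    idcg = float(ideal @ discounts)
    return float(true_ratings[top] @ discounts) / idcg if idcg > 0 else 0.0

def ridge(X, y, lam):
    """Ridge regression; returns the Gram matrix A and the solution of A w = X^T y."""
    A = lam * np.eye(X.shape[1]) + X.T @ X
    return A, np.linalg.solve(A, X.T @ y)

def run(n_users=20, n_items=30, d=3, steps=500, alpha=0.5, lam=1.0,
        noise=0.05, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed ground truth: nonnegative rank-d ratings (hypothetical data).
    R = rng.uniform(0, 1, (n_users, d)) @ rng.uniform(0, 1, (n_items, d)).T
    U = np.zeros((n_users, d))                  # estimated user factors
    V = rng.normal(0.0, 0.1, (n_items, d))      # estimated item factors
    by_user = [([], []) for _ in range(n_users)]   # (items, ratings) per user
    by_item = [([], []) for _ in range(n_items)]   # (users, ratings) per item
    cum_regret, ndcg_sum = 0.0, 0.0
    regret_curve, avg_ndcg_curve = [], []
    for t in range(steps):
        u = int(rng.integers(n_users))          # environment selects the user
        # Linear-bandit step: item factors act as contexts for user u.
        X = V[np.array(by_user[u][0], dtype=int)]
        A, U[u] = ridge(X, np.array(by_user[u][1], dtype=float), lam)
        A_inv = np.linalg.inv(A)
        scores = V @ U[u]
        width = np.sqrt(np.maximum(np.einsum("ij,jk,ik->i", V, A_inv, V), 0.0))
        item = int(np.argmax(scores + alpha * width))   # agent selects the item
        rating = R[u, item] + noise * rng.normal()      # noisy observed rating
        # Metrics: regret is the gap between the best and the selected rating.
        cum_regret += R[u].max() - R[u, item]
        ndcg_sum += ndcg(R[u], scores)
        regret_curve.append(float(cum_regret))
        avg_ndcg_curve.append(ndcg_sum / (t + 1))
        by_user[u][0].append(item); by_user[u][1].append(rating)
        by_item[item][0].append(u); by_item[item][1].append(rating)
        # Alternating least squares: refit the user, then the consumed item.
        _, U[u] = ridge(V[np.array(by_user[u][0])], np.array(by_user[u][1]), lam)
        _, V[item] = ridge(U[np.array(by_item[item][0])],
                           np.array(by_item[item][1]), lam)
    return regret_curve, avg_ndcg_curve
```

Calling `run()` returns the cumulative-regret curve and the running average of NDCG@5, one value per time step. Refitting only the active user and the consumed item keeps each step cheap; a full ALS sweep over all factors would be a heavier alternative.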
Forward citations
Cited by 1 Pith paper
- The Bandit's Blind Spot: The Critical Role of User State Representation in Recommender Systems
Variations in user state embeddings for CMAB recommenders can improve performance more than changing the bandit algorithm, with no embedding or aggregation strategy dominating across datasets.