Deep Reinforcement Learning for List-wise Recommendations

Xiangyu Zhao, Liang Zhang, Long Xia, Zhuoye Ding, Dawei Yin, Jiliang Tang · 2017 · cs.LG · arXiv 1801.00209

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Recommender systems play a crucial role in mitigating information overload by suggesting personalized items or services to users. The vast majority of traditional recommender systems treat recommendation as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system that continuously improves its strategies during interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies by recommending items through trial and error and receiving reinforcement from users' feedback on those items. In particular, we introduce an online user-agent interaction environment simulator, which can pre-train and evaluate model parameters offline before the model is applied online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and the agent, and develop a novel approach to incorporate them into the proposed framework, LIRD, for list-wise recommendations. Experimental results on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.
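
The MDP framing in the abstract can be made concrete with a toy sketch. This is not the LIRD framework or the paper's simulator. The class name `ToyUserSimulator`, the state definition (recent positively received items), the reward values, and the hard-coded user feedback are all assumptions chosen for illustration. It only shows the shape of the interaction: the state summarises past feedback, the action is a whole list of items, and the reward is aggregated over that list.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical reward per feedback type (an assumption, not taken from the page).
REWARD = {"skip": 0.0, "click": 1.0, "order": 5.0}


class ToyUserSimulator:
    """Stand-in for a user-agent environment simulator.

    Feedback per item is fixed and known here; a real simulator would
    estimate it from logged user interactions.
    """

    def __init__(self, feedback_by_item: Dict[int, str], history_len: int = 3):
        self.feedback_by_item = feedback_by_item
        self.history: Deque[int] = deque(maxlen=history_len)

    def state(self) -> Tuple[int, ...]:
        # State: the most recent items the user responded to positively.
        return tuple(self.history)

    def step(self, item_list: List[int]) -> Tuple[Tuple[int, ...], float]:
        # Action: a list of items. Reward: feedback summed over the list.
        total = 0.0
        for item in item_list:
            feedback = self.feedback_by_item.get(item, "skip")
            total += REWARD[feedback]
            if feedback != "skip":
                self.history.append(item)
        return self.state(), total


def discounted_return(rewards: List[float], gamma: float) -> float:
    # Quantity an RL agent tries to maximise: sum over t of gamma^t * r_t.
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


env = ToyUserSimulator({1: "click", 2: "order", 3: "skip", 4: "click"})
state_1, r_1 = env.step([1, 3])  # one click, one skip
state_2, r_2 = env.step([2, 4])  # one order, one click
episode_return = discounted_return([r_1, r_2], gamma=0.5)
```

In this sketch a learned policy would replace the hand-picked lists passed to `step`, and training against the simulator would stand in for the offline pre-training the abstract mentions.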

representative citing papers

Beyond Static Best-of-N: Bayesian List-wise Alignment for LLM-based Recommendation

cs.IR · 2026-05-06 · conditional · novelty 7.0

BLADE uses Bayesian list-wise alignment with dynamic estimation to create a self-evolving target that overcomes limitations of static references in LLM-based recommendation, yielding sustained gains in ranking and complex metrics.

Combinatorial Keyword Recommendations for Sponsored Search with Deep Reinforcement Learning

cs.IR · 2019-07-18 · unverdicted · novelty 5.0

A modified pointer network trained with actor-critic DRL and Equal Size K-Means clustering is applied to combinatorial keyword recommendation in sponsored search, reporting offline and online gains.

Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce

cs.LG · 2025-12-13 · unverdicted · novelty 4.0

Reinforcement learning policies for time-constrained slate recommendations improve engagement over contextual bandits in e-commerce settings.

citing papers explorer

Showing 3 of 3 citing papers.

Beyond Static Best-of-N: Bayesian List-wise Alignment for LLM-based Recommendation

cs.IR · 2026-05-06 · conditional · none · ref 46

Combinatorial Keyword Recommendations for Sponsored Search with Deep Reinforcement Learning

cs.IR · 2019-07-18 · unverdicted · none · ref 13 · internal anchor

Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce

cs.LG · 2025-12-13 · unverdicted · none · ref 17 · internal anchor
