Fast Slate Policy Optimization: Going Beyond Plackett-Luce

David Rohde; Nicolas Chopin; Otmane Sakhi

arxiv: 2308.01566 · v2 · pith:EWZ3AALWnew · submitted 2023-08-03 · 💻 cs.LG · cs.IR · stat.ML

classification 💻 cs.LG · cs.IR · stat.ML

keywords systems · action · decision · large · learning · optimization · policy · class

abstract

An increasingly important building block of large-scale machine learning systems is returning slates: ordered lists of items given a query. Applications of this technology include search, information retrieval, and recommender systems. When the action space is large, decision systems are restricted to a particular structure so that online queries can be completed quickly. This paper addresses the optimization of these large-scale decision systems given an arbitrary reward function. We cast this learning problem in a policy optimization framework and propose a new class of policies, born from a novel relaxation of decision functions. This results in a simple yet efficient learning algorithm that scales to massive action spaces. We compare our method to the commonly adopted Plackett-Luce policy class and demonstrate the effectiveness of our approach on problems with action space sizes on the order of millions.
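
For readers unfamiliar with the Plackett-Luce policy class named in the abstract as the baseline, the following is a minimal sketch of how such a policy samples an ordered slate and scores it. It is not the method proposed in the paper. The function names `sample_plackett_luce_slate` and `plackett_luce_log_prob`, and the plain list of per-item scores, are assumptions made for illustration only.

```python
import math
import random


def sample_plackett_luce_slate(scores, slate_size, rng):
    """Draw an ordered slate by sampling items one at a time without replacement.

    At each position, an item is chosen with probability proportional to
    exp(score) among the items not yet placed in the slate.
    """
    remaining = list(range(len(scores)))
    slate = []
    for _ in range(slate_size):
        # Subtract the max score before exponentiating for numerical stability.
        max_score = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - max_score) for i in remaining]
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        slate.append(chosen)
        remaining.remove(chosen)
    return slate


def plackett_luce_log_prob(scores, slate):
    """Log-probability of an ordered slate under the Plackett-Luce distribution."""
    remaining = set(range(len(scores)))
    log_prob = 0.0
    for item in slate:
        # Log-normaliser (log-sum-exp) over the items still available.
        max_score = max(scores[i] for i in remaining)
        log_norm = max_score + math.log(
            sum(math.exp(scores[i] - max_score) for i in remaining)
        )
        log_prob += scores[item] - log_norm
        remaining.remove(item)
    return log_prob


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical scores for a five-item catalogue.
    example_scores = [2.0, 0.5, 1.0, -1.0, 0.0]
    slate = sample_plackett_luce_slate(example_scores, slate_size=3, rng=rng)
    print(slate, plackett_luce_log_prob(example_scores, slate))
```

In this naive form, each slate position requires a pass over all remaining items, so the cost of one sample grows with the catalogue size times the slate length.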

This paper has not been read by Pith yet.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Credit-assigned Policy Gradient for Early Stage Retrieval in Two-stage Ranking
cs.IR · 2026-05 · unverdicted · novelty 6.0

CA-PG (Credit-assigned Policy Gradient) reduces variance in Plackett-Luce training for early stage retrieval (ESR) by computing gradients on marginal item-inclusion probabilities rather than on joint candidate-set probabilities.