Reinforcement learning from comparisons: Three alternatives is enough, two is not

Benoit Laslier; Jean-Francois Laslier

arxiv: 1301.5734 · v1 · pith:BOE4V5FBnew · submitted 2013-01-24 · 🧮 math.OC · cs.LG · math.PR



keywords: alternatives · comparisons · reinforcement · random · three · when · alternative · always


The paper deals with the problem of finding the best alternatives on the basis of pairwise comparisons when these comparisons need not be transitive. In this setting, we study a reinforcement urn model. We prove convergence to the optimal solution when reinforcement of a winning alternative occurs each time after considering three random alternatives. The simpler process, which reinforces the winner of a random pair, does not always converge: it may cycle.
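
To make the two processes in the abstract concrete, here is a minimal simulation sketch in Python. It is an illustration under stated assumptions, not the paper's exact model. The abstract does not spell out the update rule, so the following are all hypothetical choices: the cyclic three-alternative tournament `BEATS`, drawing with replacement in proportion to ball counts, adding exactly one ball per reinforcement, and, in the three-draw process, reinforcing an alternative only when it beats both of the other two draws.

```python
import random

# Hypothetical non-transitive tournament (rock-paper-scissors style):
# BEATS[i][j] is True when alternative i beats alternative j.
BEATS = [
    [False, True, False],
    [False, False, True],
    [True, False, False],
]


def draw(urn, k, rng):
    """Draw k alternatives with replacement, proportionally to ball counts."""
    return rng.choices(range(len(urn)), weights=urn, k=k)


def step_pair(urn, beats, rng):
    """Two-draw process: add one ball for the winner of a random pair.

    Nothing happens when the same alternative is drawn twice.
    """
    a, b = draw(urn, 2, rng)
    if beats[a][b]:
        urn[a] += 1
    elif beats[b][a]:
        urn[b] += 1


def step_triple(urn, beats, rng):
    """Three-draw process (assumed rule, not taken from the paper):
    add one ball for a drawn alternative that beats both other draws."""
    sample = draw(urn, 3, rng)
    for i, x in enumerate(sample):
        others = sample[:i] + sample[i + 1:]
        if all(beats[x][y] for y in others):
            urn[x] += 1
            return


def simulate(step, beats, n_steps, seed=0):
    """Run one process from a uniform urn and return the final shares."""
    rng = random.Random(seed)
    urn = [1] * len(beats)
    for _ in range(n_steps):
        step(urn, beats, rng)
    total = sum(urn)
    return [count / total for count in urn]


if __name__ == "__main__":
    print("pair  :", simulate(step_pair, BEATS, 10_000))
    print("triple:", simulate(step_triple, BEATS, 10_000))
```

Running both processes for increasing `n_steps` and several seeds is one way to watch how the urn shares evolve in each case. Any conclusion about convergence or cycling should be taken from the paper's proofs rather than from this sketch.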
