Learning Permutations with Sinkhorn Policy Gradient

Learning permutations with sinkhorn policy gradient , author= · 2018 · cs.LG · arXiv 1805.07010

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

Many problems at the intersection of combinatorics and computer science require solving for a permutation that optimally matches, ranks, or sorts some data. These problems usually have a task-specific, often non-differentiable objective function that data-driven algorithms can use as a learning signal. In this paper, we propose the Sinkhorn Policy Gradient (SPG) algorithm for learning policies on permutation matrices. The actor-critic neural network architecture we introduce for SPG uniquely decouples representation learning of the state space from the highly-structured action space of permutations with a temperature-controlled Sinkhorn layer. The Sinkhorn layer produces continuous relaxations of permutation matrices so that the actor-critic architecture can be trained end-to-end. Our empirical results show that agents trained with SPG can perform competitively on sorting, the Euclidean TSP, and matching tasks. We also observe that SPG is significantly more data efficient at the matching task than the baseline methods, which indicates that SPG is conducive to learning representations that are useful for reasoning about permutations.

representative citing papers

Learning Unbiased Permutations via Flow Matching

cs.LG · 2026-05-16 · unverdicted · novelty 7.0

PermFlow applies conditional flow matching on the affine subspace of doubly stochastic matrices with a closed-form tangent projector and nearest-target coupling to capture multimodal permutation distributions.

Omni-scale Learning-based Sequential Decision Framework for Order Fulfillment of Tote-handling Robotic Systems

cs.RO · 2026-05-09 · unverdicted · novelty 7.0

OLSF-TRS is a generalized sequential decision framework using structured combinatorial optimization and multi-agent reinforcement learning for order-tote-robot coordination in tote-handling robotic systems, with near-optimal performance on small scales and 8-30%+ improvements over heuristics onlarge

citing papers explorer

Showing 2 of 2 citing papers.

Learning Unbiased Permutations via Flow Matching cs.LG · 2026-05-16 · unverdicted · none · ref 7 · internal anchor
PermFlow applies conditional flow matching on the affine subspace of doubly stochastic matrices with a closed-form tangent projector and nearest-target coupling to capture multimodal permutation distributions.
Omni-scale Learning-based Sequential Decision Framework for Order Fulfillment of Tote-handling Robotic Systems cs.RO · 2026-05-09 · unverdicted · none · ref 97
OLSF-TRS is a generalized sequential decision framework using structured combinatorial optimization and multi-agent reinforcement learning for order-tote-robot coordination in tote-handling robotic systems, with near-optimal performance on small scales and 8-30%+ improvements over heuristics onlarge

Learning Permutations with Sinkhorn Policy Gradient

fields

years

verdicts

representative citing papers

citing papers explorer