Differentiable Dynamic Programming for Structured Prediction and Attention

Arthur Mensch; Mathieu Blondel

arXiv: 1802.03676 · v2 · pith:SMXKV6AYnew · submitted 2018-02-11 · stat.ML · cs.LG


classification: stat.ML · cs.LG

keywords: structured, dynamic, prediction, programming, algorithm, algorithms, attention, combinatorial


Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows us to relax both the optimal value and the solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
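The smoothing idea in the abstract can be illustrated with a minimal sketch. It assumes one specific regularizer, the negative entropy scaled by a temperature `gamma`, for which the smoothed max is the log-sum-exp and its gradient is the softmax; other strongly convex regularizers would give different operators. The function names (`smoothed_max`, `smoothed_max_grad`, `smoothed_min`, `smoothed_dtw_value`) and the toy cost matrix are hypothetical, chosen for illustration only, and the code is not taken from the paper.

```python
import math


def smoothed_max(x, gamma=1.0):
    """Entropy-regularized max: gamma * log(sum_i exp(x_i / gamma)).

    Assumption: negative-entropy regularizer with temperature gamma.
    """
    m = max(x)  # shift by the max for numerical stability
    return m + gamma * math.log(sum(math.exp((xi - m) / gamma) for xi in x))


def smoothed_max_grad(x, gamma=1.0):
    """Gradient of smoothed_max: softmax(x / gamma), a point on the simplex."""
    m = max(x)
    w = [math.exp((xi - m) / gamma) for xi in x]
    z = sum(w)
    return [wi / z for wi in w]


def smoothed_min(x, gamma=1.0):
    """Smoothed min, obtained from the smoothed max by negation."""
    return -smoothed_max([-xi for xi in x], gamma)


def smoothed_dtw_value(cost, gamma=1.0):
    """Value of a DTW-style recursion with min replaced by smoothed_min.

    cost[i][j] is the cost of aligning element i of one series with
    element j of the other (a list of equal-length lists).
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # R[i][j] holds the smoothed cost of aligning the first i and j elements.
    R = [[inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = [R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]]
            R[i][j] = cost[i - 1][j - 1] + smoothed_min(prev, gamma)
    return R[n][m]


if __name__ == "__main__":
    toy_cost = [[0.0, 1.0], [1.0, 0.0]]  # hypothetical 2x2 cost matrix
    print(smoothed_max([1.0, 2.0, 3.0], gamma=1.0))
    print(smoothed_max_grad([1.0, 2.0, 3.0], gamma=1.0))
    print(smoothed_dtw_value(toy_cost, gamma=1.0))
```

As `gamma` shrinks toward zero, `smoothed_max` approaches the ordinary max and the recursion approaches the unsmoothed DP value; larger `gamma` gives a smoother, easier-to-differentiate operator at the price of a looser relaxation.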


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Regularized Large Neighborhood Search
cs.LG · 2026-06 · unverdicted · novelty 7.0

RLNS regularizes LNS to perform block Gibbs sampling under entropy, interpolating between pseudolikelihood and exact MLE for differentiable combinatorial optimization.