Order Matters: Sequence to sequence for sets

Manjunath Kudlur; Oriol Vinyals; Samy Bengio

arxiv: 1511.06391 · v4 · pith:6KMBPTECnew · submitted 2015-11-19 · stat.ML · cs.CL · cs.LG

classification: stat.ML, cs.CL, cs.LG

keywords: sequences, framework, input, joint, model, probability, seq2seq, sequence

Abstract

Sequences have become first class citizens in supervised learning thanks to the resurgence of recurrent neural networks. Many complex tasks that require mapping from or to a sequence of observations can now be formulated with the sequence-to-sequence (seq2seq) framework which employs the chain rule to efficiently represent the joint probability of sequences. In many cases, however, variable sized inputs and/or outputs might not be naturally expressed as sequences. For instance, it is not clear how to input a set of numbers into a model where the task is to sort them; similarly, we do not know how to organize outputs when they correspond to random variables and the task is to model their unknown joint probability. In this paper, we first show using various examples that the order in which we organize input and/or output data matters significantly when learning an underlying model. We then discuss an extension of the seq2seq framework that goes beyond sequences and handles input sets in a principled way. In addition, we propose a loss which, by searching over possible orders during training, deals with the lack of structure of output sets. We show empirical evidence of our claims regarding ordering, and on the modifications to the seq2seq framework on benchmark language modeling and parsing tasks, as well as two artificial tasks -- sorting numbers and estimating the joint probability of unknown graphical models.
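Two ideas in the abstract can be illustrated with a small sketch: the chain rule gives the same joint probability under any variable ordering, and a loss can search over orderings to pick the one a model scores highest. The code below is only an illustration under stated assumptions, not the paper's implementation. The toy distribution `JOINT`, the helpers `marginal`, `chain_rule_log_prob`, and `best_order`, and the scoring function `toy_order_score` are all hypothetical names invented here, and the brute-force enumeration of permutations stands in for whatever search the paper actually uses.

```python
import itertools
import math

# Toy joint distribution over three binary variables.
# The probabilities are made up for illustration and sum to 1.
JOINT = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.15, (0, 1, 1): 0.10,
    (1, 0, 0): 0.05, (1, 0, 1): 0.20, (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}

def marginal(assignment):
    """P(assignment), where assignment maps variable index -> value."""
    return sum(p for x, p in JOINT.items()
               if all(x[i] == v for i, v in assignment.items()))

def chain_rule_log_prob(x, order):
    """log P(x) as a sum of log conditionals taken in the given order."""
    seen = {}
    total = 0.0
    for i in order:
        before = marginal(seen)  # probability of the variables fixed so far
        seen[i] = x[i]
        total += math.log(marginal(seen) / before)
    return total

def toy_order_score(x, order):
    """Hypothetical stand-in for a model score: penalize adjacent descents."""
    seq = [x[i] for i in order]
    return -float(sum(a > b for a, b in zip(seq, seq[1:])))

def best_order(score_fn, x):
    """Brute-force search for the ordering of positions that scores highest."""
    return max(itertools.permutations(range(len(x))),
               key=lambda order: score_fn(x, order))

if __name__ == "__main__":
    x = (1, 0, 1)
    for order in itertools.permutations(range(3)):
        print(order, round(chain_rule_log_prob(x, order), 6))
    print(best_order(toy_order_score, (3, 1, 2)))
```

With the exact distribution every ordering yields the same log-probability, so any effect of ordering comes from the learned model rather than from the factorization itself. Enumerating all permutations grows factorially with the number of elements, so this exhaustive search is only workable for very small sets.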

Forward citations

Cited by 11 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Density estimation using Real NVP
cs.LG 2016-05 accept novelty 8.0

Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.

Adaptive Computation Time for Recurrent Neural Networks
cs.NE 2016-03 accept novelty 8.0

ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.

ArrowFlow: Hierarchical Machine Learning in the Space of Permutations
cs.LG 2026-04 unverdicted novelty 7.0

ArrowFlow demonstrates that competitive classification is possible using a hierarchical architecture of ranking filters in permutation space, with a single polynomial-degree parameter controlling robustness-accuracy t...

The product structure of MPS-under-permutations
quant-ph 2024-10 unverdicted novelty 7.0

TI MPS with permutational symmetry (entanglement similar across bipartitions) are shown to be trivial (product states or few superpositions); extends to generic MPS and states like W and Dicke approximately.

Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph
cs.LG 2026-05 unverdicted novelty 6.0

GraphDPO generalizes pairwise DPO to a graph-structured Plackett-Luce objective over DAGs induced by rollout rankings, enforcing transitivity with linear complexity and recovering DPO as a special case.

Learning to Theorize the World from Observation
cs.LG 2026-05 unverdicted novelty 6.0

NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.

Learning from Historical Activations in Graph Neural Networks
cs.LG 2026-01 unverdicted novelty 6.0

HISTOGRAPH applies unified layer-wise attention followed by node-wise attention over historical GNN activations to improve graph classification, especially in deep models.

Modal Decomposition and Identification for a Population of Structures Using Physics-Informed Graph Neural Networks and Transformers
cs.CE 2025-05 unverdicted novelty 6.0

A physics-informed GNN-transformer model performs unsupervised modal decomposition and identification for populations of structures from sparse dynamic measurements.

Attention-Based Deep Reinforcement Learning for Qubit Allocation in Modular Quantum Architectures
quant-ph 2024-06 unverdicted novelty 6.0

An attention-based DRL agent with Transformer encoder and GNN learns heuristics for qubit-to-core allocation in multi-core quantum systems to minimize state transfers and online compilation time.

Uncertainty-aware Generative Learning Path Recommendation with Cognition-Adaptive Diffusion
cs.IR 2026-04 unverdicted novelty 5.0

U-GLAD models learner uncertainty with Gaussian LSTMs and uses cognition-adaptive diffusion to generate goal-aligned learning path recommendations that outperform baselines on public datasets.

Explaining the Explainers in Graph Neural Networks: a Comparative Study
cs.LG 2022-10 unverdicted novelty 5.0

Benchmark study of ten GNN explainers on eight architectures and six datasets that isolates usable components and issues practical recommendations.