pith. machine review for the scientific record.

arXiv: 1511.06391 · v4 · submitted 2015-11-19 · stat.ML · cs.CL · cs.LG

Recognition: unknown

Order Matters: Sequence to sequence for sets

Authors on Pith: no claims yet
classification: stat.ML · cs.CL · cs.LG
keywords: sequences, framework, input, joint, model, probability, seq2seq, sequence

Sequences have become first-class citizens in supervised learning thanks to the resurgence of recurrent neural networks. Many complex tasks that require mapping from or to a sequence of observations can now be formulated with the sequence-to-sequence (seq2seq) framework, which employs the chain rule to efficiently represent the joint probability of sequences. In many cases, however, variable-sized inputs and/or outputs might not be naturally expressed as sequences. For instance, it is not clear how to input a set of numbers into a model where the task is to sort them; similarly, we do not know how to organize outputs when they correspond to random variables and the task is to model their unknown joint probability. In this paper, we first show using various examples that the order in which we organize input and/or output data matters significantly when learning an underlying model. We then discuss an extension of the seq2seq framework that goes beyond sequences and handles input sets in a principled way. In addition, we propose a loss which, by searching over possible orders during training, deals with the lack of structure of output sets. We show empirical evidence for our claims regarding ordering, and for the modifications to the seq2seq framework, on benchmark language modeling and parsing tasks, as well as on two artificial tasks: sorting numbers and estimating the joint probability of unknown graphical models.
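
The two ideas the abstract names, the chain-rule factorization of a sequence's joint probability and a loss that searches over output orders, can be illustrated with a small sketch. Everything below is an assumption made for illustration: `toy_cond_prob` is a hypothetical, unnormalized stand-in for a trained model's conditionals, and the exhaustive search over permutations is only workable for tiny sets. The abstract does not say how the paper performs its search, so this should not be read as the authors' procedure.

```python
import itertools
import math


def sequence_log_prob(seq, cond_prob):
    """Chain rule: log p(y_1..y_n) = sum over t of log p(y_t | y_1..y_{t-1})."""
    total = 0.0
    for t, y in enumerate(seq):
        total += math.log(cond_prob(y, tuple(seq[:t])))
    return total


def best_order_loss(targets, cond_prob):
    """Negative log-likelihood of the highest-scoring ordering of `targets`.

    Brute force over all n! orderings; illustrative only.
    """
    best = max(
        itertools.permutations(targets),
        key=lambda order: sequence_log_prob(order, cond_prob),
    )
    return -sequence_log_prob(best, cond_prob), best


def toy_cond_prob(y, prefix):
    """Hypothetical toy conditional that favors ascending continuations.

    Not a normalized distribution; it only stands in for a model's scores.
    """
    if not prefix:
        return 0.5
    return 0.9 if y > prefix[-1] else 0.1


loss, order = best_order_loss([3, 1, 2], toy_cond_prob)
print(order)  # (1, 2, 3)
```

Under this toy model the ascending order scores highest, so the loss is computed against that ordering rather than an arbitrary one. The point of the sketch is only that the same set of outputs can receive very different likelihoods depending on the order in which it is presented.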

This paper has not been read by Pith yet.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Density estimation using Real NVP

    cs.LG 2016-05 accept novelty 8.0

    Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.

  2. Adaptive Computation Time for Recurrent Neural Networks

    cs.NE 2016-03 accept novelty 8.0

    ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.

  3. ArrowFlow: Hierarchical Machine Learning in the Space of Permutations

    cs.LG 2026-04 unverdicted novelty 7.0

    ArrowFlow demonstrates that competitive classification is possible using a hierarchical architecture of ranking filters in permutation space, with a single polynomial-degree parameter controlling robustness-accuracy t...

  4. Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph

    cs.LG 2026-05 unverdicted novelty 6.0

    GraphDPO generalizes pairwise DPO to a graph-structured Plackett-Luce objective over DAGs induced by rollout rankings, enforcing transitivity with linear complexity and recovering DPO as a special case.

  5. Learning to Theorize the World from Observation

    cs.LG 2026-05 unverdicted novelty 6.0

    NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.

  6. Uncertainty-aware Generative Learning Path Recommendation with Cognition-Adaptive Diffusion

    cs.IR 2026-04 unverdicted novelty 5.0

    U-GLAD models learner uncertainty with Gaussian LSTMs and uses cognition-adaptive diffusion to generate goal-aligned learning path recommendations that outperform baselines on public datasets.