Order Matters: Sequence to sequence for sets
Sequences have become first class citizens in supervised learning thanks to the resurgence of recurrent neural networks. Many complex tasks that require mapping from or to a sequence of observations can now be formulated with the sequence-to-sequence (seq2seq) framework which employs the chain rule to efficiently represent the joint probability of sequences. In many cases, however, variable sized inputs and/or outputs might not be naturally expressed as sequences. For instance, it is not clear how to input a set of numbers into a model where the task is to sort them; similarly, we do not know how to organize outputs when they correspond to random variables and the task is to model their unknown joint probability. In this paper, we first show using various examples that the order in which we organize input and/or output data matters significantly when learning an underlying model. We then discuss an extension of the seq2seq framework that goes beyond sequences and handles input sets in a principled way. In addition, we propose a loss which, by searching over possible orders during training, deals with the lack of structure of output sets. We show empirical evidence of our claims regarding ordering, and on the modifications to the seq2seq framework on benchmark language modeling and parsing tasks, as well as two artificial tasks -- sorting numbers and estimating the joint probability of unknown graphical models.
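The abstract's two technical ingredients, the chain-rule factorization of a sequence probability and a loss that searches over output orders, can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify how the search is carried out, so exhaustive enumeration is used here as a stand-in (it costs n! and only suits tiny sets). The names `sequence_nll`, `best_order_nll`, and `toy_step_log_prob` are hypothetical, and the toy scorer is a hand-written stand-in for a learned model.

```python
import itertools
import math


def sequence_nll(order, step_log_prob):
    """Chain-rule negative log-likelihood of emitting `order` left to right.

    -log p(y_1..y_n) = -sum_t log p(y_t | y_1..y_{t-1})
    """
    nll = 0.0
    for t, item in enumerate(order):
        nll -= step_log_prob(item, order[:t])
    return nll


def best_order_nll(target_set, step_log_prob):
    """Score every ordering of the target set and return (lowest NLL, ordering).

    Assumption: brute-force enumeration, used only for illustration.
    """
    return min(
        (sequence_nll(order, step_log_prob), order)
        for order in itertools.permutations(sorted(target_set))
    )


def toy_step_log_prob(item, prefix):
    """Hypothetical scorer that favors emitting items in ascending order."""
    if not prefix or item > prefix[-1]:
        return math.log(0.9)
    return math.log(0.1)


loss, order = best_order_nll({3, 1, 2}, toy_step_log_prob)
print(order, round(loss, 4))
```

With this toy scorer the ascending ordering has the lowest loss, which mirrors the abstract's point that some orderings of the same set are easier to model than others.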
Forward citations
Cited by 11 Pith papers
- Density estimation using Real NVP
  Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.
- Adaptive Computation Time for Recurrent Neural Networks
  ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.
- ArrowFlow: Hierarchical Machine Learning in the Space of Permutations
  ArrowFlow demonstrates that competitive classification is possible using a hierarchical architecture of ranking filters in permutation space, with a single polynomial-degree parameter controlling robustness-accuracy t...
- The product structure of MPS-under-permutations
  TI MPS with permutational symmetry (entanglement similar across bipartitions) are shown to be trivial (product states or few superpositions); extends to generic MPS and states like W and Dicke approximately.
- Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph
  GraphDPO generalizes pairwise DPO to a graph-structured Plackett-Luce objective over DAGs induced by rollout rankings, enforcing transitivity with linear complexity and recovering DPO as a special case.
- Learning to Theorize the World from Observation
  NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.
- Learning from Historical Activations in Graph Neural Networks
  HISTOGRAPH applies unified layer-wise attention followed by node-wise attention over historical GNN activations to improve graph classification, especially in deep models.
- Modal Decomposition and Identification for a Population of Structures Using Physics-Informed Graph Neural Networks and Transformers
  A physics-informed GNN-transformer model performs unsupervised modal decomposition and identification for populations of structures from sparse dynamic measurements.
- Attention-Based Deep Reinforcement Learning for Qubit Allocation in Modular Quantum Architectures
  An attention-based DRL agent with Transformer encoder and GNN learns heuristics for qubit-to-core allocation in multi-core quantum systems to minimize state transfers and online compilation time.
- Uncertainty-aware Generative Learning Path Recommendation with Cognition-Adaptive Diffusion
  U-GLAD models learner uncertainty with Gaussian LSTMs and uses cognition-adaptive diffusion to generate goal-aligned learning path recommendations that outperform baselines on public datasets.
- Explaining the Explainers in Graph Neural Networks: a Comparative Study
  Benchmark study of ten GNN explainers on eight architectures and six datasets that isolates usable components and issues practical recommendations.