Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.
arXiv preprint arXiv:1802.06901 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
abstract
We propose a conditional non-autoregressive neural sequence model based on iterative refinement. The proposed model is designed based on the principles of latent variable models and denoising autoencoders, and is generally applicable to any sequence generation task. We extensively evaluate the proposed model on machine translation (En-De and En-Ro) and image caption generation, and observe that it significantly speeds up decoding while maintaining the generation quality comparable to the autoregressive counterpart.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Reinforce-NAT and FS-decoder retrieve target sequential information for non-autoregressive translation, yielding higher BLEU than baseline NAT while preserving fast decoding and approaching autoregressive quality.
Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.
The paper proposes CDCD, a continuous-time and continuous-space diffusion framework for categorical data, and reports results on language modeling tasks.
A survey synthesizing challenges, system architectures, model optimizations, deployment methods, and resource management techniques for large language model inference at the network edge.
citing papers explorer
-
Flow Map Language Models: One-step Language Modeling via Continuous Denoising
Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.
-
Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation
Reinforce-NAT and FS-decoder retrieve target sequential information for non-autoregressive translation, yielding higher BLEU than baseline NAT while preserving fast decoding and approaching autoregressive quality.
-
One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models
Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.
-
Continuous diffusion for categorical data
The paper proposes CDCD, a continuous-time and continuous-space diffusion framework for categorical data, and reports results on language modeling tasks.
-
Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities
A survey synthesizing challenges, system architectures, model optimizations, deployment methods, and resource management techniques for large language model inference at the network edge.