pith. machine review for the scientific record.

arxiv: 1711.02281 · v2 · submitted 2017-11-07 · 💻 cs.CL · cs.LG

Recognition: unknown

Non-Autoregressive Neural Machine Translation

keywords: autoregressive, BLEU, fertilities, inference, machine, model, neural, non-autoregressive

Existing approaches to neural machine translation condition each output word on previously generated outputs. We introduce a model that avoids this autoregressive property and produces its outputs in parallel, allowing an order of magnitude lower latency during inference. Through knowledge distillation, the use of input token fertilities as a latent variable, and policy gradient fine-tuning, we achieve this at a cost of as little as 2.0 BLEU points relative to the autoregressive Transformer network used as a teacher. We demonstrate substantial cumulative improvements associated with each of the three aspects of our training strategy, and validate our approach on IWSLT 2016 English-German and two WMT language pairs. By sampling fertilities in parallel at inference time, our non-autoregressive model achieves near-state-of-the-art performance of 29.8 BLEU on WMT 2016 English-Romanian.
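The abstract names input token fertilities as the latent variable but does not spell out how they are used. The following is a minimal sketch, assuming each fertility is a non-negative integer giving how many times its source token is copied into the decoder input, so that the output length is fixed before any target word is predicted and all positions can be decoded in parallel. The function name `copy_by_fertility` and the example sentence and fertility values are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Sequence


def copy_by_fertility(source_tokens: Sequence[str],
                      fertilities: Sequence[int]) -> List[str]:
    """Repeat each source token `fertility` times (hypothetical helper).

    A fertility of 0 drops the token; the resulting list length equals
    the sum of the fertilities, i.e. the assumed target length.
    """
    if len(source_tokens) != len(fertilities):
        raise ValueError("need exactly one fertility per source token")
    if any(f < 0 for f in fertilities):
        raise ValueError("fertilities must be non-negative")

    decoder_inputs: List[str] = []
    for token, fertility in zip(source_tokens, fertilities):
        decoder_inputs.extend([token] * fertility)
    return decoder_inputs


# Illustrative values only: in the model, fertilities would be predicted
# or sampled, not hand-written.
source = ["we", "totally", "accept", "it", "."]
fertilities = [1, 1, 2, 0, 1]
decoder_inputs = copy_by_fertility(source, fertilities)
print(decoder_inputs)  # ['we', 'totally', 'accept', 'accept', '.']
```

Because the decoder input is fully determined by the source and the fertilities, no output position has to wait for a previously generated word. This is the property the abstract contrasts with autoregressive decoding.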

This paper has not been read by Pith yet.


Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation

    cs.HC 2026-05 unverdicted novelty 7.0

    HapticLDM is the first latent diffusion model that generates vibrotactile signals directly from text, using dynamic text curation and global denoising to improve realism and semantic alignment over autoregressive baselines.

  2. PlayGen-MoG: Framework for Diverse Multi-Agent Play Generation via Mixture-of-Gaussians Trajectory Prediction

    cs.CV 2026-04 unverdicted novelty 7.0

    PlayGen-MoG uses a shared Mixture-of-Gaussians head across agents plus relative attention to generate diverse coordinated plays from a single static formation, achieving 1.68 yard ADE and 3.98 yard FDE with full mixtu...

  3. BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion

    cs.CL 2026-05 unverdicted novelty 6.0

    BitLM replaces per-token softmax with bitwise continuous diffusion inside causal blocks to generate multiple tokens in parallel while preserving autoregressive structure.

  4. Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts

    cs.CL 2026-05 conditional novelty 6.0

    Reasoning language models extract answers from sparse, order-shuffled chain-of-thought traces with little accuracy loss.