On Long-Tailed Phenomena in Neural Machine Translation

Vikas Raunak, Siddharth Dalmia, Vivek Gupta, Florian Metze

arXiv: 2010.04924 · v1 · pith:GC556WP7 · submitted 2020-10-10 · cs.CL · cs.AI · cs.LG

keywords: generation, long-tailed, machine, phenomena, translation, loss, low-frequency, neural

Abstract

State-of-the-art Neural Machine Translation (NMT) models struggle to generate low-frequency tokens, and addressing this remains a major challenge. The analysis of long-tailed phenomena in structured prediction tasks is further complicated by the search performed during inference. In this work, we quantitatively characterize such long-tailed phenomena at two levels of abstraction: token classification and sequence generation. We propose a new loss function, the Anti-Focal loss, which better adapts model training to the structural dependencies of conditional text generation by incorporating the inductive biases of beam search into the training process. We show the efficacy of the proposed technique on several Machine Translation (MT) datasets, demonstrating significant gains over cross-entropy across different language pairs, especially in the generation of low-frequency words. We have released the code to reproduce our results.
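
The abstract does not state the Anti-Focal loss formula, so the following is only a minimal sketch under an explicit assumption: that the per-token loss takes the form -(1 + p_t)^γ · log p_t, where p_t is the model probability of the reference token and γ ≥ 0. Under this assumed form, γ = 0 recovers ordinary cross-entropy, and the factor is the mirror image of the focal loss factor (1 - p_t)^γ. All function names are hypothetical, and the released code should be checked for the actual definition.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a single vocabulary distribution."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def anti_focal_token_loss(logits: Sequence[float], target: int, gamma: float = 1.0) -> float:
    """ASSUMED Anti-Focal form: -(1 + p_t)^gamma * log p_t (gamma = 0 gives cross-entropy)."""
    log_p = log_softmax(logits)[target]
    p_t = math.exp(log_p)
    return -((1.0 + p_t) ** gamma) * log_p


def focal_token_loss(logits: Sequence[float], target: int, gamma: float = 1.0) -> float:
    """Standard focal loss factor for comparison: -(1 - p_t)^gamma * log p_t."""
    log_p = log_softmax(logits)[target]
    p_t = math.exp(log_p)
    return -((1.0 - p_t) ** gamma) * log_p


def sequence_loss(step_logits: Sequence[Sequence[float]], targets: Sequence[int],
                  gamma: float = 1.0) -> float:
    """Sum the assumed Anti-Focal loss over decoder positions (teacher forcing)."""
    return sum(anti_focal_token_loss(l, t, gamma) for l, t in zip(step_logits, targets))


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]  # toy 3-word vocabulary, reference token id 0
    ce = -log_softmax(logits)[0]
    print("cross-entropy:", ce)
    print("anti-focal (gamma=2):", anti_focal_token_loss(logits, 0, gamma=2.0))
    print("focal (gamma=2):", focal_token_loss(logits, 0, gamma=2.0))
```

Because (1 + p_t)^γ ≥ 1 and grows with p_t, this assumed form never weights a token below its cross-entropy value, whereas the focal factor (1 - p_t)^γ ≤ 1 down-weights tokens the model already predicts confidently.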

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Static to Interactive: Adapting Visual in-Context Learners for User-Driven Tasks
cs.CV 2026-04 unverdicted novelty 6.0

Encoding user interactions into visual in-context example pairs turns static models into controllable systems that improve IoU, PSNR, and LPIPS on guided tasks without retraining.