Beam Search Strategies for Neural Machine Translation

Markus Freitag; Yaser Al-Onaizan

arxiv: 1702.01806 · v2 · pith:KJFVD6GSnew · submitted 2017-02-06 · 💻 cs.CL

classification 💻 cs.CL

keywords beam, search, translation, best, candidates, decoder, neural, performance

The basic concept in Neural Machine Translation (NMT) is to train a large neural network that maximizes translation performance on a given parallel corpus. NMT then uses a simple left-to-right beam-search decoder to generate new translations that approximately maximize the trained conditional probability. The current beam search strategy generates the target sentence word by word from left to right while keeping a fixed number of active candidates at each time step. First, this simple search is less adaptive, as it also expands candidates whose scores are much worse than the current best. Second, it does not expand hypotheses that are not among the best-scoring candidates, even if their scores are close to the best one. The latter can be avoided by increasing the beam size until no further performance improvement is observed. While this can improve performance, it has the drawback of slower decoding. In this paper, we concentrate on speeding up the decoder by applying a more flexible beam search strategy whose candidate size may vary at each time step depending on the candidate scores. We speed up the original decoder by up to 43% for the two language pairs German-English and Chinese-English without losing any translation quality.
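The idea of a beam whose size varies with the candidate scores can be illustrated with a small sketch. The abstract does not specify the exact pruning rule, so the relative threshold used here (drop candidates whose probability is below a fraction of the best candidate's probability) is an assumption made for illustration, and the names `prune_beam`, `beam_size`, and `rel_threshold` are hypothetical rather than taken from the paper.

```python
import math


def prune_beam(candidates, beam_size, rel_threshold):
    """Illustrative score-dependent pruning (an assumption, not the paper's exact rule).

    candidates: list of (tokens, log_prob) pairs for the current time step.
    beam_size: upper bound on the number of active candidates.
    rel_threshold: fraction in (0, 1]; candidates whose probability is below
        rel_threshold * best_probability are dropped.
    """
    if not candidates:
        return []
    # Standard fixed-size beam: rank by score and keep the top beam_size.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    best_log_prob = ranked[0][1]
    # p >= rel_threshold * p_best, expressed in log space.
    cutoff = best_log_prob + math.log(rel_threshold)
    # The surviving beam may be smaller than beam_size at this step.
    return [c for c in ranked if c[1] >= cutoff]


# Toy hypotheses with made-up probabilities.
candidates = [
    (["the", "cat"], math.log(0.50)),
    (["a", "cat"], math.log(0.30)),
    (["the", "dog"], math.log(0.04)),
    (["one", "cat"], math.log(0.01)),
]
kept = prune_beam(candidates, beam_size=3, rel_threshold=0.5)
print([tokens for tokens, _ in kept])
```

With these toy scores, a fixed beam of size 3 would expand the third hypothesis even though it is far worse than the best one; the threshold removes it, so fewer candidates are expanded at that step. This is the kind of saving the abstract attributes to a candidate set that adapts to the scores.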

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Chronos: Learning the Language of Time Series
cs.LG 2024-03 conditional novelty 7.0

Chronos pretrains transformer models on tokenized time series to deliver strong zero-shot forecasting across diverse domains.

APCD: Adaptive Path-Contrastive Decoding for Reliable Large Language Model Generation
cs.CL 2026-05 unverdicted novelty 6.0

APCD reduces LLM hallucinations by expanding decoding paths adaptively when entropy signals uncertainty and by contrasting divergent paths to control their interaction.

TreeRanker: Fast and Model-agnostic Ranking System for Code Suggestions in IDEs
cs.SE 2025-08 unverdicted novelty 6.0

TreeRanker ranks static code completions by organizing candidates in a prefix tree and collecting token scores via a single greedy language-model decoding pass.

Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
cs.LG 2023-10 conditional novelty 6.0

LURE reduces object hallucination in LVLMs by 23% via post-hoc revision informed by co-occurrence, uncertainty, and text position analysis.

APCD: Adaptive Path-Contrastive Decoding for Reliable Large Language Model Generation
cs.CL 2026-05 unverdicted novelty 5.0

APCD adaptively branches LLM decoding paths based on token entropy and contrasts divergent paths to improve factual accuracy while preserving efficiency.

AdaDec: A Uncertainty-Guided Lookahead Decoding Framework for LLM-Based Code Generation
cs.SE 2025-06 unverdicted novelty 5.0

AdaDec improves Pass@1 accuracy of LLM code generation by up to 20.9% over greedy decoding by triggering lookahead reranking only at high-uncertainty steps on HumanEval+, MBPP+, and DevEval.

TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
cs.LG 2025-05 unverdicted novelty 5.0

TokUR estimates token-level uncertainty via low-rank weight perturbations in LLMs, aggregates signals to correlate with correctness, and uses them to improve reasoning performance on math tasks.