“PEGRL: Improving Machine Translation by Post-Editing Guided Reinforcement Learning.” arXiv preprint arXiv:2602.03352 (2026)

Shen, Yunzhi, et al. · 2026 · cs.CL · arXiv 2602.03352

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Reinforcement learning (RL) has shown strong promise for LLM-based machine translation, with recent methods such as GRPO demonstrating notable gains; nevertheless, translation-oriented RL remains challenged by noisy learning signals arising from Monte Carlo return estimation, as well as a large trajectory space that favors global exploration over fine-grained local optimization. We introduce PEGRL, a two-stage RL framework that uses post-editing as an auxiliary task to stabilize training and guide overall optimization. At each iteration, translation outputs are sampled to construct post-editing inputs, allowing return estimation in the post-editing stage to benefit from conditioning on the current translation behavior, while jointly supporting both global exploration and fine-grained local optimization. A task-specific weighting scheme further balances the contributions of translation and post-editing objectives, yielding a biased yet more sample-efficient estimator. Experiments on English→Finnish, English→Turkish, and English↔Chinese show consistent gains over RL baselines, and for English→Turkish, performance on COMET-KIWI is comparable to advanced LLM-based systems (DeepSeek-V3.2). Our code and a set of representative pretrained models are publicly available at https://github.com/NJUNLP/peg-rl and https://huggingface.co/collections/DGME/pegrl.

representative citing papers

Bayesian Rate Inference for Sequence Motif Dynamics in Systems of Reactive Nucleic Acids

physics.bio-ph · 2026-04-28 · unverdicted · novelty 5.0

A Bayesian inference framework is presented to infer parameters of motif rate equations from ligation count data generated by strand reactor simulations in reactive nucleic acid systems.

Backtranslation Augmented Direct Preference Optimization for Neural Machine Translation

cs.CL · 2026-04-28 · unverdicted · novelty 4.0

DPO with backtranslation post-training raises English-to-German COMET from 0.703 to 0.747 on gemma3-1b.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Bayesian Rate Inference for Sequence Motif Dynamics in Systems of Reactive Nucleic Acids
physics.bio-ph · 2026-04-28 · unverdicted · none · ref 9 · internal anchor
Backtranslation Augmented Direct Preference Optimization for Neural Machine Translation
cs.CL · 2026-04-28 · unverdicted · none · ref 9 · internal anchor
