A Bayesian inference framework is presented to infer parameters of motif rate equations from ligation count data generated by strand reactor simulations in reactive nucleic acid systems.
“PEGRL: Improving Machine Translation by Post-Editing Guided Reinforcement Learning.” arXiv preprint arXiv:2602.03352 (2026).
2 Pith papers cite this work. Polarity classification is still indexing.
abstract
Reinforcement learning (RL) has shown strong promise for LLM-based machine translation, with recent methods such as GRPO demonstrating notable gains; nevertheless, translation-oriented RL remains challenged by noisy learning signals arising from Monte Carlo return estimation, as well as a large trajectory space that favors global exploration over fine-grained local optimization. We introduce PEGRL, a two-stage RL framework that uses post-editing as an auxiliary task to stabilize training and guide overall optimization. At each iteration, translation outputs are sampled to construct post-editing inputs, allowing return estimation in the post-editing stage to benefit from conditioning on the current translation behavior, while jointly supporting both global exploration and fine-grained local optimization. A task-specific weighting scheme further balances the contributions of translation and post-editing objectives, yielding a biased yet more sample-efficient estimator. Experiments on English→Finnish, English→Turkish, and English↔Chinese show consistent gains over RL baselines, and for English→Turkish, performance on COMET-KIWI is comparable to advanced LLM-based systems (DeepSeek-V3.2). Our code and a set of representative pretrained models are publicly available at https://github.com/NJUNLP/peg-rl and https://huggingface.co/collections/DGME/pegrl.
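As a rough illustration of the weighting idea in the abstract, the sketch below normalizes rewards within a sampled group (the GRPO-style group-relative advantage) and then scales the translation and post-editing advantages by separate task weights. The function names, the weights `w_mt` and `w_pe`, and the way the two stages are combined are assumptions made for illustration; the abstract does not specify the actual scheme, so this should not be read as the authors' implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Center and scale rewards within one sampled group (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def weighted_two_stage_advantages(mt_rewards, pe_rewards, w_mt=1.0, w_pe=0.5):
    """Hypothetical task weighting: scale each stage's advantages separately.

    mt_rewards: rewards of translations sampled for one source sentence.
    pe_rewards: rewards of post-edits sampled for one of those translations.
    w_mt, w_pe: illustrative task weights (assumed values, not from the paper).
    """
    mt_adv = [w_mt * a for a in group_relative_advantages(mt_rewards)]
    pe_adv = [w_pe * a for a in group_relative_advantages(pe_rewards)]
    return mt_adv, pe_adv


if __name__ == "__main__":
    # Made-up reward values, used only to show the call shape.
    mt_adv, pe_adv = weighted_two_stage_advantages(
        [0.62, 0.70, 0.55, 0.68], [0.71, 0.74, 0.69, 0.72]
    )
    print(mt_adv)
    print(pe_adv)
```

In this sketch the post-editing group is conditioned on a translation that was itself sampled from the current policy, which mirrors the abstract's description of constructing post-editing inputs from sampled outputs; how the real method forms groups and sets weights may differ.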
years
2026 2
verdicts
UNVERDICTED 2
representative citing papers
DPO with backtranslation post-training raises English-to-German COMET from 0.703 to 0.747 on gemma3-1b.