MENT benchmark plus RATE agentic evaluator raise combined system- and segment-level correlation with human judgments by at least 3.2 points over prior MT metrics and LLM judges.
https://arxiv.org/abs/2509.05209
4 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background 1
Citation polarities: background 1
citing papers explorer
- Beyond Literal Mapping: Benchmarking and Improving Non-Literal Translation Evaluation
  MENT benchmark plus RATE agentic evaluator raise combined system- and segment-level correlation with human judgments by at least 3.2 points over prior MT metrics and LLM judges.
- NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning
  NeoAMT trains an RL agent with a Wiktionary toolkit, novel reward, and adaptive rollouts to translate sentences containing neologisms in 16 languages and 75 directions.
- When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems
  A Bayesian calibration technique lets production teams compare replacement LLMs reliably using limited human feedback on correctness, refusal, and style.
- Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation
  A literature survey that organizes prompting, fine-tuning, preference optimization, and context-aware techniques for LLM-based machine translation with emphasis on low-resource languages.