DeepTrans: Deep Reasoning Translation via Reinforcement Learning

Jiaan Wang, Fandong Meng, Jie Zhou · 2026 · DOI 10.1162/tacl.a.65

3 Pith papers cite this work.

representative citing papers

Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models

cs.LG · 2026-06-11 · unverdicted · novelty 7.0

LLM chain-of-thought crosses a commitment boundary early; subsequent steps are epiphenomenal, enabling an early exit that shortens traces by 55% with negligible performance change.

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

RL with a chrF reward trains LLMs to make better use of in-context linguistic knowledge for zero-shot translation of unseen languages, outperforming in-context learning (ICL) and supervised fine-tuning (SFT).

When Languages Disagree: Self-Evolving Multilingual LLM Judges

cs.CL · 2026-06-06 · unverdicted · novelty 6.0

SEMJ is a self-evolving multilingual LLM judge that turns cross-lingual inconsistency into iterative self-reflection, outperforming voting and reflection baselines on accuracy and consistency.

citing papers explorer

Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models cs.LG · 2026-06-11 · unverdicted · none · ref 78
Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation cs.CL · 2026-06-04 · unverdicted · none · ref 45
When Languages Disagree: Self-Evolving Multilingual LLM Judges cs.CL · 2026-06-06 · unverdicted · none · ref 78