GRPO reinforcement learning on the new PolyFact dataset outperforms SFT and CPT for cross-lingual factual consistency in Qwen-2.5-7B and OLMo-2-7B by reducing language specialization in MLP and attention layers.
A Glimpse into Babel: An Analysis of Multilinguality in
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Improving Cross-Lingual Factual Recall via Consistency-Driven Reinforcement Learning
GRPO reinforcement learning on the new PolyFact dataset outperforms SFT and CPT for cross-lingual factual consistency in Qwen-2.5-7B and OLMo-2-7B by reducing language specialization in MLP and attention layers.