COPSD improves mathematical reasoning in low-resource languages by having LLMs self-distill from their own high-resource English behavior via token-level divergence on rollouts with privileged crosslingual context.
When Models Reason in Your Language: Controlling Thinking Language Comes at the Cost of Accuracy
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
fields
cs.CL 3years
2026 3representative citing papers
LANG combines language-adaptive hint guidance, progressive decay, and difficulty-tailored learning horizons in RL to boost non-English reasoning performance while preserving language consistency.
citing papers explorer
-
Crosslingual On-Policy Self-Distillation for Multilingual Reasoning
COPSD improves mathematical reasoning in low-resource languages by having LLMs self-distill from their own high-resource English behavior via token-level divergence on rollouts with privileged crosslingual context.
-
LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance
LANG combines language-adaptive hint guidance, progressive decay, and difficulty-tailored learning horizons in RL to boost non-English reasoning performance while preserving language consistency.
- Macro: Enhancing Multilingual Counterfactual Explanations through Alignment-as-Preference Optimization