Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
9 Pith papers cite this work.
fields: cs.CL 9
years: 2026 9
verdicts: UNVERDICTED 9
citing papers explorer
- Soft Token Alignment for Cross-Lingual Reasoning
  SOLAR aligns soft-token probability mixtures across languages in embedding space during SFT and raises multilingual reasoning accuracy by up to 17.7 points over the base model.
- Enhancing Multilingual Reasoning via Steerable Model Merging
  ST-Merge uses gated cross-attention to adaptively weight source models during merging, outperforming baselines on multilingual reasoning tasks across 21 languages.
- Learning When to Translate for Multilingual Reasoning
  Luar is a reinforcement learning method enabling reasoning language models to decide when to invoke English translation for improved multilingual reasoning.
- Cross-lingual Self-Consistency for Multilingual Reasoning with Language Models
  Unsupervised RL enforces cross-lingual self-consistency to improve multilingual math reasoning by up to 21.7% on MGSM without gold answers or parallel data, with generalization to unseen languages.
- Macro: Enhancing Multilingual Counterfactual Explanations through Alignment-as-Preference Optimization
  Macro uses DPO on composite preference pairs to raise validity of multilingual self-generated counterfactual explanations by 12.55% on average over chain-of-thought while preserving minimality.
- Crosslingual On-Policy Self-Distillation for Multilingual Reasoning
  COPSD improves mathematical reasoning in low-resource languages by having LLMs self-distill from their own high-resource English behavior via token-level divergence on rollouts with privileged crosslingual context.
- CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations
  CroCo applies English-reward-ranked self-generations for contrastive preference tuning that improves two LLMs on structured and open-ended tasks across 14 languages without language-specific annotations.
- LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance
  LANG combines language-adaptive hint guidance, progressive decay, and difficulty-tailored learning horizons in RL to boost non-English reasoning performance while preserving language consistency.
- Language as a Latent Variable for Reasoning Optimization
  Treating language as a latent variable via polyGRPO RL improves Qwen2.5-7B-Instruct by 6.72% on English reasoning benchmarks and 6.89% on multilingual ones, with cross-task gains on commonsense reasoning from math-only training.