arXiv preprint arXiv:2503.10497

MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation · arXiv 2503.10497

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

Audited olympiad corpus and Physics-R1 recipe improve 8B VLM by up to 18 points on held-out physics problems while exposing contamination in prior evals.

Is Biomedical Specialization Still Worth It? Insights from Domain-Adaptive Language Modelling with a New French Health Corpus

cs.CL · 2026-04-08 · unverdicted · novelty 7.0

Domain-adaptive pre-training on a new French health corpus yields limited gains and risks general capability loss unless followed by model merging, which can even boost specialized performance.

COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.

LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance

cs.CL · 2026-05-21 · unverdicted · novelty 5.0

LANG combines language-adaptive hint guidance, progressive decay, and difficulty-tailored learning horizons in RL to boost non-English reasoning performance while preserving language consistency.

Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?

cs.CL · 2025-10-31 · unverdicted · novelty 5.0

Multilingual reasoning gaps in RLMs arise primarily from language understanding failures that can be detected and mitigated by selectively translating inputs to English.

citing papers explorer

Showing 5 of 5 citing papers.

Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning
cs.CL · 2026-05-13 · unverdicted · none · ref 45

Is Biomedical Specialization Still Worth It? Insights from Domain-Adaptive Language Modelling with a New French Health Corpus
cs.CL · 2026-04-08 · unverdicted · none · ref 13

COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
cs.LG · 2026-04-22 · unverdicted · none · ref 79

LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance
cs.CL · 2026-05-21 · unverdicted · none · ref 53

Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?
cs.CL · 2025-10-31 · unverdicted · none · ref 4
