arXiv preprint arXiv:2503.10497
5 Pith papers cite this work.
citing papers explorer
- Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning
  Audited olympiad corpus and Physics-R1 recipe improve 8B VLM by up to 18 points on held-out physics problems while exposing contamination in prior evals.
- Is Biomedical Specialization Still Worth It? Insights from Domain-Adaptive Language Modelling with a New French Health Corpus
  Domain-adaptive pre-training on a new French health corpus yields limited gains and risks general capability loss unless followed by model merging, which can even boost specialized performance.
- COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
  COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
- LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance
  LANG combines language-adaptive hint guidance, progressive decay, and difficulty-tailored learning horizons in RL to boost non-English reasoning performance while preserving language consistency.
- Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?
  Multilingual reasoning gaps in RLMs arise primarily from language understanding failures that can be detected and mitigated by selectively translating inputs to English.