BAR trains independent domain experts via separate mid-training, SFT, and RL pipelines then composes them with a MoE router to match monolithic retraining performance at lower cost and without catastrophic forgetting.
Flexolmo: Open language models for flexible data use.arXiv preprint arXiv:2507.07024
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
support 1representative citing papers
Introduces a parameter-driven framework for data attribution in LLMs that enables negotiation among creators, users, and intermediaries to meet stakeholder goals within the data economy.
PubSwap uses a small public dataset for selective off-policy response swapping in federated RLVR to improve coordination and performance over standard baselines on math and medical reasoning tasks.
citing papers explorer
-
Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts
BAR trains independent domain experts via separate mid-training, SFT, and RL pipelines then composes them with a MoE router to match monolithic retraining performance at lower cost and without catastrophic forgetting.
-
A Human-Centric Framework for Data Attribution in Large Language Models
Introduces a parameter-driven framework for data attribution in LLMs that enables negotiation among creators, users, and intermediaries to meet stakeholder goals within the data economy.
-
PubSwap: Public-Data Off-Policy Coordination for Federated RLVR
PubSwap uses a small public dataset for selective off-policy response swapping in federated RLVR to improve coordination and performance over standard baselines on math and medical reasoning tasks.