DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
arXiv preprint arXiv:2406.19976
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
TANDEM solves bi-level data mixture optimization for LLMs via twin proxy and reference networks that measure domain efficacy by model difference and up-weight beneficial domains, with claimed theoretical guarantees and gains in restricted-data and SFT settings.
citing papers explorer
- Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
  DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
- TANDEM: Bi-Level Data Mixture Optimization with Twin Networks
  TANDEM solves bi-level data mixture optimization for LLMs via twin proxy and reference networks that measure domain efficacy by model difference and up-weight beneficial domains, with claimed theoretical guarantees and gains in restricted-data and SFT settings.