arXiv preprint arXiv:2406.19976

Rui Pan, Jipeng Zhang, Xingyuan Pan, Renjie Pi, Xiaoyu Wang, Tong Zhang · 2024 · arXiv 2406.19976

2 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DRATS derives a minimax objective from a feasibility formulation of multi-task reinforcement learning (MTRL) and uses it to adaptively sample the tasks with the largest return gaps, which improves worst-task performance on MetaWorld benchmarks.

TANDEM: Bi-Level Data Mixture Optimization with Twin Networks

cs.LG · 2026-06-03 · unverdicted · novelty 5.0

TANDEM solves bi-level data mixture optimization for LLMs with twin proxy and reference networks: it measures each domain's efficacy by the difference between the two models and up-weights beneficial domains. The authors claim theoretical guarantees and gains in restricted-data and supervised fine-tuning (SFT) settings.

citing papers explorer

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling · cs.LG · 2026-05-14 · unverdicted · none · ref 242
TANDEM: Bi-Level Data Mixture Optimization with Twin Networks · cs.LG · 2026-06-03 · unverdicted · none · ref 27