DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
arXiv preprint arXiv:2403.09032 , year=
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
method 1
citation-polarity summary
roles
method 1polarities
use method 1representative citing papers
Early Stopping Rollout limits on-policy distillation to initial tokens to counter teacher decay, outperforming full rollouts in accuracy, efficiency, and stability while revealing cascading alignment effects.
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
citing papers explorer
-
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.