DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 4years
2026 4verdicts
UNVERDICTED 4representative citing papers
QED bounds cross-run KL divergence in Boltzmann policies by setting temperature proportional to Q-disagreement and reduces return variance by two orders of magnitude on 18 continuous-control tasks without performance loss.
Generative sequence models for physical tasks exhibit physical misgeneralization where local prediction errors propagate through physical measurements to distort aggregate distributions over quantities like distance or energy; a data deviation kernel explains and predicts the shifts and supports a内核
Unsupervised behavioral mode discovery combined with mutual information rewards enables RL fine-tuning of multimodal generative policies that achieves higher success rates without losing action diversity.
citing papers explorer
-
Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
-
Behavior-Consistent Deep Reinforcement Learning
QED bounds cross-run KL divergence in Boltzmann policies by setting temperature proportional to Q-disagreement and reduces return variance by two orders of magnitude on 18 continuous-control tasks without performance loss.
-
Mechanisms of Misgeneralization in Physical Sequence Modeling
Generative sequence models for physical tasks exhibit physical misgeneralization where local prediction errors propagate through physical measurements to distort aggregate distributions over quantities like distance or energy; a data deviation kernel explains and predicts the shifts and supports a内核
-
Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies
Unsupervised behavioral mode discovery combined with mutual information rewards enables RL fine-tuning of multimodal generative policies that achieves higher success rates without losing action diversity.