DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
A walk in the park: Learning to walk in 20 minutes with model-free reinforcement learning
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 2
citation-polarity summary
years: 2026 6
roles: background 2
polarities: background 2
representative citing papers
SimDist pretrains world models in simulation and adapts them to real-world robots by updating only the latent dynamics model, enabling rapid improvement on contact-rich tasks where prior methods fail.
Non-uniform replay helps most when replay volume is low; high-entropy sampling remains important, and a truncated geometric distribution delivers better sample efficiency with negligible overhead.
UniCon standardizes states and control logic into modular execution graphs for efficient transfer of learning controllers across heterogeneous robots, with lower latency than ROS.