Meta reinforcement learning as task inference. arXiv preprint arXiv:1905.06424
4 Pith papers cite this work. Polarity classification is still indexing.
roles
background: 2
representative citing papers
MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01s achieving 97.8% of DP optimal utility and handles larger problems where DP fails.
Neural operators approximate the solution operator for multi-task optimal control, generalizing to new tasks and enabling efficient adaptation via branch-trunk structure and meta-training.
Agent Q integrates MCTS-guided search, self-critique, and off-policy DPO to train LLM agents that outperform behavior cloning and reinforced fine-tuning baselines in WebShop and achieve up to 95.4% success in real-world booking scenarios.
citing papers explorer
- Solving Rubik's Cube with a Robot Hand
  Reinforcement learning models trained only in simulation using automatic domain randomization solve Rubik's cube with a real robot hand.
- A Meta Reinforcement Learning Approach to Goals-Based Wealth Management
  MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01s achieving 97.8% of DP optimal utility and handles larger problems where DP fails.
- Neural Operators for Multi-Task Control and Adaptation
  Neural operators approximate the solution operator for multi-task optimal control, generalizing to new tasks and enabling efficient adaptation via branch-trunk structure and meta-training.
- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
  Agent Q integrates MCTS-guided search, self-critique, and off-policy DPO to train LLM agents that outperform behavior cloning and reinforced fine-tuning baselines in WebShop and achieve up to 95.4% success in real-world booking scenarios.