pith. sign in

Meta reinforcement learning as task inference.arXiv preprint arXiv:1905.06424

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

fields

cs.LG 3 cs.AI 1

roles

background 2

polarities

background 1 unclear 1

representative citing papers

Solving Rubik's Cube with a Robot Hand

cs.LG · 2019-10-16 · accept · novelty 7.0

Reinforcement learning models trained only in simulation using automatic domain randomization solve Rubik's cube with a real robot hand.

Neural Operators for Multi-Task Control and Adaptation

cs.LG · 2026-04-03 · unverdicted · novelty 6.0

Neural operators approximate the solution operator for multi-task optimal control, generalizing to new tasks and enabling efficient adaptation via branch-trunk structure and meta-training.

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

cs.AI · 2024-08-13 · unverdicted · novelty 6.0

Agent Q integrates MCTS-guided search, self-critique, and off-policy DPO to train LLM agents that outperform behavior cloning and reinforced fine-tuning baselines in WebShop and achieve up to 95.4% success in real-world booking scenarios.

citing papers explorer

Showing 4 of 4 citing papers.

  • Solving Rubik's Cube with a Robot Hand cs.LG · 2019-10-16 · accept · none · ref 47

    Reinforcement learning models trained only in simulation using automatic domain randomization solve Rubik's cube with a real robot hand.

  • A Meta Reinforcement Learning Approach to Goals-Based Wealth Management cs.LG · 2026-05-04 · unverdicted · none · ref 87

    MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01s achieving 97.8% of DP optimal utility and handles larger problems where DP fails.

  • Neural Operators for Multi-Task Control and Adaptation cs.LG · 2026-04-03 · unverdicted · none · ref 6

    Neural operators approximate the solution operator for multi-task optimal control, generalizing to new tasks and enabling efficient adaptation via branch-trunk structure and meta-training.

  • Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents cs.AI · 2024-08-13 · unverdicted · none · ref 25

    Agent Q integrates MCTS-guided search, self-critique, and off-policy DPO to train LLM agents that outperform behavior cloning and reinforced fine-tuning baselines in WebShop and achieve up to 95.4% success in real-world booking scenarios.