Reinforcement learning and control as probabilistic inference: Tutorial and review, 2018
3 Pith papers cite this work. Polarity classification is still indexing.
citation summary
fields: cs.AI (3)
verdicts: UNVERDICTED (3)
roles: method (1)
polarities: use method (1)
citing papers
- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
  Agent Q integrates MCTS-guided search, self-critique, and off-policy DPO to train LLM agents that outperform behavior cloning and reinforced fine-tuning baselines in WebShop and achieve up to 95.4% success in real-world booking scenarios.
- Principles of frugal inference and control
  Introduces a resource-constrained POMDP framework and derives three principles of frugal inference and control that generalize to nonlinear tasks like pole balancing.
- Training an Interactive Helper
  Meta-learning produces a helper agent that infers and executes tasks for a prime agent using emergent physical communication in cooperative foraging environments.