Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36:53728–53741, 2023.
3 Pith papers cite this work.
Citing papers by year: 2025 (3)
Citing papers

- MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
  MiroThinker shows that scaling agent-environment interactions via reinforcement learning lets a 72B open-source model reach up to 81.9% on GAIA and approach commercial performance on research benchmarks.
- Model Predictive Control via Probabilistic Inference: A Tutorial and Survey
  PI-MPC turns finite-horizon optimal control into inference over a Boltzmann-weighted control distribution and generates actions via variational inference, with MPPI as a key sampling-based example.
- Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
  This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.