{"paper":{"title":"Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A multi-agent system with an orchestrator achieves competitive performance on complex AI agent benchmarks without modifications.","cross_cats":["cs.MA"],"primary_cat":"cs.AI","authors_text":"Adam Fourney, Ahmed Awadallah, Cheng Tan, Ece Kamar, Eduardo Salinas, Erkang (Eric) Zhu, Friederike Niedtner, Gagan Bansal, Grace Proebsting, Griffin Bassman, Hussein Mozannar, Jack Gerrits, Jacob Alber, Peter Chang, Rafah Hosn, Ricky Loynd, Robert West, Saleema Amershi, Victor Dibia","submitted_at":"2024-11-07T06:36:19Z","abstract_excerpt":"Modern AI agents, driven by advances in large foundation models, promise to enhance our productivity and transform our lives by augmenting our knowledge and capabilities. To achieve this vision, AI agents must effectively plan, perform multi-step reasoning and actions, respond to novel observations, and recover from errors, to successfully complete complex tasks across a wide range of scenarios. In this work, we introduce Magentic-One, a high-performing open-source agentic system for solving such tasks. Magentic-One uses a multi-agent architecture where a lead agent, the Orchestrator, plans, t"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Magentic-One achieves statistically competitive performance to the state-of-the-art on three diverse and challenging agentic benchmarks: GAIA, AssistantBench, and WebArena.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the modular multi-agent design with an orchestrator allows agents to be added or removed without additional prompt tuning or training while maintaining performance across tasks.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Magentic-One is a modular multi-agent system that matches state-of-the-art performance on GAIA, AssistantBench, and WebArena using an orchestrator-led team of specialized agents.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A multi-agent system with an orchestrator achieves competitive performance on complex AI agent benchmarks without modifications.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"79f55f82c6dda58e24d960eb95b21bc80faef2e64b16e8b72498395d07a76fc7"},"source":{"id":"2411.04468","kind":"arxiv","version":1},"verdict":{"id":"071688f3-1a90-4d2d-9880-967dc119d984","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T18:45:15.680308Z","strongest_claim":"Magentic-One achieves statistically competitive performance to the state-of-the-art on three diverse and challenging agentic benchmarks: GAIA, AssistantBench, and WebArena.","one_line_summary":"Magentic-One is a modular multi-agent system that matches state-of-the-art performance on GAIA, AssistantBench, and WebArena using an orchestrator-led team of specialized agents.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the modular multi-agent design with an orchestrator allows agents to be added or removed without additional prompt tuning or training while maintaining performance across tasks.","pith_extraction_headline":"A multi-agent system with an orchestrator achieves competitive performance on complex AI agent benchmarks without modifications."},"references":{"count":79,"sample":[{"doi":"","year":2024,"title":"T. Abuelsaad, D. Akkil, P. Dey, A. Jagmohan, A. Vempaty, and R. Kokku. Agent-e: From autonomous web navigation to foundational design principles in agentic systems, 2024","work_id":"b0d846eb-61d5-49ea-832d-2b0c1166278f","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Github — babyagi","work_id":"7f8ed173-86e2-4252-b7bc-92da2d6f1774","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"R. Bonatti, D. Zhao, F. Bonacci, D. Dupont, S. Abdali, Y. Li, Y. Lu, J. Wagle, K. Koishida, A. Bucker, L. Jang, and Z. Hui. Windows agent arena: Evaluating multi-modal os agents at scale, 2024","work_id":"a00a5fa2-e8e4-4061-9332-3a47d14c56ff","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"R. Cao, F. Lei, H. Wu, J. Chen, Y. Fu, H. Gao, X. Xiong, H. Zhang, Y. Mao, W. Hu, T. Xie, H. Xu, D. Zhang, S. Wang, R. Sun, P. Yin, C. Xiong, A. Ni, Q. Liu, V. Zhong, L. Chen, K. Yu, and T. Yu. Spider","work_id":"d8e7d0b2-9458-43a6-b3c0-9ebdc4dd369a","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Z. Chen, M. White, R. Mooney, A. Payani, Y. Su, and H. Sun. When is tree search useful for llm planning? it depends on the discriminator, 2024","work_id":"4fbf4a33-0604-412e-92d8-716fe57abbae","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":79,"snapshot_sha256":"7d85405319d63e4fb87de11120d9f6aaed1f5807243d2f6a06d460743ed7f37b","internal_anchors":14},"formal_canon":{"evidence_count":2,"snapshot_sha256":"64053d29edb14d26670127e5b62c6470b496c014275eff7cd51c2840a51c549d"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}