{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2024:ML6YPSJGAYDNYY5KNIPPP5W3KI","short_pith_number":"pith:ML6YPSJG","schema_version":"1.0","canonical_sha256":"62fd87c9260606dc63aa6a1ef7f6db5201497c109f82a367458e2350d6bdec7a","source":{"kind":"arxiv","id":"2411.04468","version":1},"attestation_state":"computed","paper":{"title":"Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A multi-agent system with an orchestrator achieves competitive performance on complex AI agent benchmarks without modifications.","cross_cats":["cs.MA"],"primary_cat":"cs.AI","authors_text":"Adam Fourney, Ahmed Awadallah, Cheng Tan, Ece Kamar, Eduardo Salinas, Erkang (Eric) Zhu, Friederike Niedtner, Gagan Bansal, Grace Proebsting, Griffin Bassman, Hussein Mozannar, Jack Gerrits, Jacob Alber, Peter Chang, Rafah Hosn, Ricky Loynd, Robert West, Saleema Amershi, Victor Dibia","submitted_at":"2024-11-07T06:36:19Z","abstract_excerpt":"Modern AI agents, driven by advances in large foundation models, promise to enhance our productivity and transform our lives by augmenting our knowledge and capabilities. To achieve this vision, AI agents must effectively plan, perform multi-step reasoning and actions, respond to novel observations, and recover from errors, to successfully complete complex tasks across a wide range of scenarios. In this work, we introduce Magentic-One, a high-performing open-source agentic system for solving such tasks. Magentic-One uses a multi-agent architecture where a lead agent, the Orchestrator, plans, t"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2411.04468","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.AI","submitted_at":"2024-11-07T06:36:19Z","cross_cats_sorted":["cs.MA"],"title_canon_sha256":"f5a99c24279c399d55bffad740b03c7fb05ece58c5b986910b9fbefd7caca04c","abstract_canon_sha256":"ab1fada245f0cc946033623a52fd268210295af3e6ba1da60f7cae05f491d24b"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:46.930990Z","signature_b64":"VNF9uHUCmaCwD9oJ9d1ubASis6Di0jn3uJyBbv6I3ypoiEdylW+QVnuSAUp8e6jvU1eb7lvTQ0ovIwryxl9PBA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"62fd87c9260606dc63aa6a1ef7f6db5201497c109f82a367458e2350d6bdec7a","last_reissued_at":"2026-05-17T23:38:46.930548Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:46.930548Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A multi-agent system with an orchestrator achieves competitive performance on complex AI agent benchmarks without modifications.","cross_cats":["cs.MA"],"primary_cat":"cs.AI","authors_text":"Adam Fourney, Ahmed Awadallah, Cheng Tan, Ece Kamar, Eduardo Salinas, Erkang (Eric) Zhu, Friederike Niedtner, Gagan Bansal, Grace Proebsting, Griffin Bassman, Hussein Mozannar, Jack Gerrits, Jacob Alber, Peter Chang, Rafah Hosn, Ricky Loynd, Robert West, Saleema Amershi, Victor Dibia","submitted_at":"2024-11-07T06:36:19Z","abstract_excerpt":"Modern AI agents, driven by advances in large foundation models, promise to enhance our productivity and transform our lives by augmenting our knowledge and capabilities. To achieve this vision, AI agents must effectively plan, perform multi-step reasoning and actions, respond to novel observations, and recover from errors, to successfully complete complex tasks across a wide range of scenarios. In this work, we introduce Magentic-One, a high-performing open-source agentic system for solving such tasks. Magentic-One uses a multi-agent architecture where a lead agent, the Orchestrator, plans, t"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Magentic-One achieves statistically competitive performance to the state-of-the-art on three diverse and challenging agentic benchmarks: GAIA, AssistantBench, and WebArena.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the modular multi-agent design with an orchestrator allows agents to be added or removed without additional prompt tuning or training while maintaining performance across tasks.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Magentic-One is a modular multi-agent system that matches state-of-the-art performance on GAIA, AssistantBench, and WebArena using an orchestrator-led team of specialized agents.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A multi-agent system with an orchestrator achieves competitive performance on complex AI agent benchmarks without modifications.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"79f55f82c6dda58e24d960eb95b21bc80faef2e64b16e8b72498395d07a76fc7"},"source":{"id":"2411.04468","kind":"arxiv","version":1},"verdict":{"id":"071688f3-1a90-4d2d-9880-967dc119d984","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T18:45:15.680308Z","strongest_claim":"Magentic-One achieves statistically competitive performance to the state-of-the-art on three diverse and challenging agentic benchmarks: GAIA, AssistantBench, and WebArena.","one_line_summary":"Magentic-One is a modular multi-agent system that matches state-of-the-art performance on GAIA, AssistantBench, and WebArena using an orchestrator-led team of specialized agents.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the modular multi-agent design with an orchestrator allows agents to be added or removed without additional prompt tuning or training while maintaining performance across tasks.","pith_extraction_headline":"A multi-agent system with an orchestrator achieves competitive performance on complex AI agent benchmarks without modifications."},"references":{"count":79,"sample":[{"doi":"","year":2024,"title":"T. Abuelsaad, D. Akkil, P. Dey, A. Jagmohan, A. Vempaty, and R. Kokku. Agent-e: From autonomous web navigation to foundational design principles in agentic systems, 2024","work_id":"b0d846eb-61d5-49ea-832d-2b0c1166278f","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Github — babyagi","work_id":"7f8ed173-86e2-4252-b7bc-92da2d6f1774","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"R. Bonatti, D. Zhao, F. Bonacci, D. Dupont, S. Abdali, Y. Li, Y. Lu, J. Wagle, K. Koishida, A. Bucker, L. Jang, and Z. Hui. Windows agent arena: Evaluating multi-modal os agents at scale, 2024","work_id":"a00a5fa2-e8e4-4061-9332-3a47d14c56ff","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"R. Cao, F. Lei, H. Wu, J. Chen, Y. Fu, H. Gao, X. Xiong, H. Zhang, Y. Mao, W. Hu, T. Xie, H. Xu, D. Zhang, S. Wang, R. Sun, P. Yin, C. Xiong, A. Ni, Q. Liu, V. Zhong, L. Chen, K. Yu, and T. Yu. Spider","work_id":"d8e7d0b2-9458-43a6-b3c0-9ebdc4dd369a","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Z. Chen, M. White, R. Mooney, A. Payani, Y. Su, and H. Sun. When is tree search useful for llm planning? it depends on the discriminator, 2024","work_id":"4fbf4a33-0604-412e-92d8-716fe57abbae","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":79,"snapshot_sha256":"7d85405319d63e4fb87de11120d9f6aaed1f5807243d2f6a06d460743ed7f37b","internal_anchors":14},"formal_canon":{"evidence_count":2,"snapshot_sha256":"64053d29edb14d26670127e5b62c6470b496c014275eff7cd51c2840a51c549d"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2411.04468","created_at":"2026-05-17T23:38:46.930626+00:00"},{"alias_kind":"arxiv_version","alias_value":"2411.04468v1","created_at":"2026-05-17T23:38:46.930626+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2411.04468","created_at":"2026-05-17T23:38:46.930626+00:00"},{"alias_kind":"pith_short_12","alias_value":"ML6YPSJGAYDN","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"ML6YPSJGAYDNYY5K","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"ML6YPSJG","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":30,"internal_anchor_count":30,"sample":[{"citing_arxiv_id":"2605.23414","citing_title":"When Planning Fails Despite Correct Execution: On Epistemic Calibration for LLM-Based Multi-Agent Systems","ref_index":138,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22566","citing_title":"GraphFlow: A Graph-Based Workflow Management for Efficient LLM-Agent Serving","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22154","citing_title":"IdleSpec: Exploiting Idle Time via Speculative Planning for LLM Agents","ref_index":20,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19010","citing_title":"AgentNLQ: A General-Purpose Agent for Natural Language to SQL","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19149","citing_title":"Agent Meltdowns: The Road to Hell Is Paved with Helpful Agents","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15215","citing_title":"SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime Interfaces","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2510.21236","citing_title":"AgentBound: Securing Execution Boundaries of AI Agents","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2511.01594","citing_title":"MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2511.21686","citing_title":"Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2512.04129","citing_title":"Don't Trust Your Upstream: Exploiting LLM Multi-Agent System via Topology-Guided Adversarial Propagation","ref_index":49,"is_internal_anchor":true},{"citing_arxiv_id":"2512.19396","citing_title":"EchoTrail-GUI: Building Actionable Memory for GUI Agents via Critic-Guided Self-Exploration","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2601.11848","citing_title":"Compass vs Railway Tracks: Unpacking User Mental Models for Communicating Long-Horizon Work to Humans vs. AI","ref_index":20,"is_internal_anchor":true},{"citing_arxiv_id":"2508.07407","citing_title":"A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07509","citing_title":"MASPrism: Lightweight Failure Attribution for Multi-Agent Systems Using Prefill-Stage Signals","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2511.20857","citing_title":"Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory","ref_index":77,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12755","citing_title":"State-Centric Decision Process","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07637","citing_title":"Learning to Communicate Locally for Large-Scale Multi-Agent Pathfinding","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2604.27586","citing_title":"Trace-Level Analysis of Information Contamination in Multi-Agent Systems","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08831","citing_title":"AssemPlanner: A Multi-Agent Based Task Planning Framework for Flexible Assembly System","ref_index":43,"is_internal_anchor":true},{"citing_arxiv_id":"2503.13657","citing_title":"Why Do Multi-Agent LLM Systems Fail?","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08647","citing_title":"AgentCollabBench: Diagnosing When Good Agents Make Bad Collaborators","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05657","citing_title":"Retrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generation","ref_index":61,"is_internal_anchor":true},{"citing_arxiv_id":"2604.22446","citing_title":"From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2604.19247","citing_title":"BONSAI: A Mixed-Initiative Workspace for Human-AI Co-Development of Visual Analytics Applications","ref_index":20,"is_internal_anchor":true},{"citing_arxiv_id":"2604.18847","citing_title":"Human-Guided Harm Recovery for Computer Use Agents","ref_index":21,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/ML6YPSJGAYDNYY5KNIPPP5W3KI","json":"https://pith.science/pith/ML6YPSJGAYDNYY5KNIPPP5W3KI.json","graph_json":"https://pith.science/api/pith-number/ML6YPSJGAYDNYY5KNIPPP5W3KI/graph.json","events_json":"https://pith.science/api/pith-number/ML6YPSJGAYDNYY5KNIPPP5W3KI/events.json","paper":"https://pith.science/paper/ML6YPSJG"},"agent_actions":{"view_html":"https://pith.science/pith/ML6YPSJGAYDNYY5KNIPPP5W3KI","download_json":"https://pith.science/pith/ML6YPSJGAYDNYY5KNIPPP5W3KI.json","view_paper":"https://pith.science/paper/ML6YPSJG","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2411.04468&json=true","fetch_graph":"https://pith.science/api/pith-number/ML6YPSJGAYDNYY5KNIPPP5W3KI/graph.json","fetch_events":"https://pith.science/api/pith-number/ML6YPSJGAYDNYY5KNIPPP5W3KI/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/ML6YPSJGAYDNYY5KNIPPP5W3KI/action/timestamp_anchor","attest_storage":"https://pith.science/pith/ML6YPSJGAYDNYY5KNIPPP5W3KI/action/storage_attestation","attest_author":"https://pith.science/pith/ML6YPSJGAYDNYY5KNIPPP5W3KI/action/author_attestation","sign_citation":"https://pith.science/pith/ML6YPSJGAYDNYY5KNIPPP5W3KI/action/citation_signature","submit_replication":"https://pith.science/pith/ML6YPSJGAYDNYY5KNIPPP5W3KI/action/replication_record"}},"created_at":"2026-05-17T23:38:46.930626+00:00","updated_at":"2026-05-17T23:38:46.930626+00:00"}