{"paper":{"title":"MetaAgent-X : Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"MetaAgent-X jointly trains the designer and executors of automatic multi-agent systems using end-to-end reinforcement learning.","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Huazheng Wang, Jiayu Chang, Jishen Zhao, Nan Wang, Qingyun Wu, Yaolun Zhang, Yiran Wu, Yizhao Chen, Yujie Zhao","submitted_at":"2026-05-14T00:11:27Z","abstract_excerpt":"Automatic multi-agent systems aim to instantiate agent workflows without relying on manually designed or fixed orchestration. However, existing automatic MAS approaches remain only partially adaptive: they either perform training-free test-time search or optimize the meta-level designer while keeping downstream execution agents frozen, which creates a frozen-executor ceiling and leaves the end-to-end training of self-designing and self-executing agentic models unexplored. To address this, we introduce MetaAgent-X, an end-to-end reinforcement learning framework that jointly optimizes automati"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"MetaAgent-X consistently outperforms existing automatic MAS baselines, achieving up to 21.7% gains. ...\nThese results establish end-to-end trainable automatic MAS as a practical paradigm for building self-designing and self-executing agentic models.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That Executor Designer Hierarchical Rollout and Stagewise Co-evolution provide stable joint optimization and accurate credit assignment across designer and executor trajectories without introducing new instabilities or biases that would prevent both components from improving.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"MetaAgent-X uses end-to-end RL to jointly optimize automatic multi-agent system design and execution, outperforming baselines by up to 21.7% through hierarchical rollouts and stagewise co-evolution.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"MetaAgent-X jointly trains the designer and executors of automatic multi-agent systems using end-to-end reinforcement learning.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"76b7e384b93f58cdc485ece5cef168a0be915c86185b3c43061d562b113e25f2"},"source":{"id":"2605.14212","kind":"arxiv","version":1},"verdict":{"id":"3bde2c6f-5fdf-4a2f-b951-47ddd30f87c8","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T02:45:37.586097Z","strongest_claim":"MetaAgent-X consistently outperforms existing automatic MAS baselines, achieving up to 21.7% gains. ...\nThese results establish end-to-end trainable automatic MAS as a practical paradigm for building self-designing and self-executing agentic models.","one_line_summary":"MetaAgent-X uses end-to-end RL to jointly optimize automatic multi-agent system design and execution, outperforming baselines by up to 21.7% through hierarchical rollouts and stagewise co-evolution.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That Executor Designer Hierarchical Rollout and Stagewise Co-evolution provide stable joint optimization and accurate credit assignment across designer and executor trajectories without introducing new instabilities or biases that would prevent both components from improving.","pith_extraction_headline":"MetaAgent-X jointly trains the designer and executors of automatic multi-agent systems using end-to-end reinforcement learning."},"references":{"count":52,"sample":[{"doi":"","year":2024,"title":"Yujie Zhao, Hejia Zhang, Hanxian Huang, Zhongming Yu, and Jishen Zhao","work_id":"e6d87867-de15-4827-b740-f4aa603f708a","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Figure 4: Sensitivity analysis on the stage length for designer–executor alternation","work_id":"3590c28c-4ac0-41c9-b839-a869f25bb253","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Delivery formatting. Inter-agent messages must be strictly enclosed within <delivery>...</delivery> tags.\nThis constraint serves a dual purpose: it establishes a structured, easily parsable communicati","work_id":"cadd2d18-128e-43b1-918c-7001c4cce933","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Compute total number of ways to choose 4 numbers from 10: C(10,4)","work_id":"2aa90731-24bf-47c6-9ffd-29d736a4d47c","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Compute P(grand prize): number of ways to match all 4 numbers","work_id":"df92995a-93fa-4a13-8312-47293fb49251","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":52,"snapshot_sha256":"d34860785c889846c0b0a127339043d0ffd501cd70baa0e7b2bc9fd0e614884a","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"8db17f618145b6f3b5bbc4d3213a0fe744253cf312ed4f399402b985c038bd42"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}