ACE-Router: Generalizing History-Aware Routing from MCP Tools to the Agent Web

Cheng Yang; Shuo Zhang; Weinan Zhang; Weiwen Liu; Xingshan Zeng; Yifu Guo; Zhiguang Han; Zhiyuan Yao; Zishan Xu

arxiv: 2601.08276 · v2 · submitted 2026-01-13 · 💻 cs.AI


Zhiyuan Yao, Zishan Xu, Yifu Guo, Zhiguang Han, Cheng Yang, Shuo Zhang, Weinan Zhang, Xingshan Zeng, Weiwen Liu


Pith reviewed 2026-05-16 15:21 UTC · model grok-4.3

classification 💻 cs.AI

keywords: history-aware routing · Agent Web · Model Context Protocol · multi-agent collaboration · tool navigation · trajectory synthesis · Light Routing Agent


The pith

ACE-Router trains history-aware routers on trajectories synthesized from dependency graphs to navigate large-scale agent tool ecosystems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ACE-Router as a pipeline that trains history-aware routers capable of precise navigation amid the rapid growth of tools in the Agent Web under the Model Context Protocol. It constructs a dependency-rich candidate Graph to generate synthetic multi-turn trajectories, which then train a plug-and-play Light Routing Agent with dynamic context understanding. The approach yields strong benchmark results while showing it can generalize to multi-agent collaboration, tolerate noise, and handle very large candidate spaces.

Core claim

ACE-Router is a pipeline for training history-aware routers to empower precise navigation in large-scale ecosystems. By leveraging a dependency-rich candidate Graph to synthesize multi-turn trajectories, we effectively train routers with dynamic context understanding to create the plug-and-play Light Routing Agent. Experiments on the real-world benchmarks MCP-Universe and MCP-Mark demonstrate superior performance. Notably, ACE-Router exhibits critical properties for the future Agent Web: it not only generalizes to multi-agent collaboration with minimal adaptation but also maintains exceptional robustness against noise and scales effectively to massive candidate spaces.

What carries the argument

The dependency-rich candidate Graph that synthesizes multi-turn trajectories to train the history-aware router and produce the Light Routing Agent.
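A minimal sketch of the idea, assuming the candidate Graph is a directed acyclic graph whose edges mean "this tool's output can feed that tool" and that trajectories are sampled by random walks. The tool names, the walk-based sampler, and the helper names are all hypothetical illustrations; the abstract does not specify the paper's actual synthesis procedure.

```python
import random

# Hypothetical dependency graph: an edge A -> B means tool B can consume A's output.
# Tool names are invented for illustration and are not taken from the paper.
DEPENDENCY_GRAPH = {
    "search_flights": ["book_flight", "get_weather"],
    "get_weather": ["send_email"],
    "book_flight": ["send_email", "add_calendar_event"],
    "add_calendar_event": ["send_email"],
    "send_email": [],
}


def synthesize_trajectory(graph, start, max_turns, rng):
    """Random walk along dependency edges; returns the ordered tool calls."""
    trajectory = [start]
    current = start
    while len(trajectory) < max_turns and graph[current]:
        current = rng.choice(graph[current])
        trajectory.append(current)
    return trajectory


def to_training_examples(trajectory):
    """Turn one trajectory into (history, next_tool) pairs for router training."""
    return [(tuple(trajectory[:i]), trajectory[i]) for i in range(len(trajectory))]


rng = random.Random(0)
trajectories = [
    synthesize_trajectory(DEPENDENCY_GRAPH, "search_flights", max_turns=4, rng=rng)
    for _ in range(5)
]
examples = [pair for t in trajectories for pair in to_training_examples(t)]
```

Each `(history, next_tool)` pair is the kind of supervised example a history-aware router could be trained on: the same next tool may or may not be correct depending on what was called before.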

If this is right

Superior performance on MCP-Universe and MCP-Mark benchmarks
Generalizes to multi-agent collaboration with minimal adaptation
Maintains exceptional robustness against noise
Scales effectively to massive candidate spaces

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method supplies an empirical basis for building universal orchestration across open agent ecosystems
Graph-based trajectory synthesis may reduce reliance on scarce real interaction data when training routers for dynamic tool environments
Similar synthesis pipelines could extend to routing problems in other collaborative AI systems beyond MCP

Load-bearing premise

Trajectories synthesized from a dependency-rich candidate Graph accurately capture the distribution of real multi-turn user interactions and tool dependencies in open MCP ecosystems.

What would settle it

If the trained Light Routing Agent shows sharply lower accuracy when evaluated on real user interaction logs whose tool dependencies differ substantially from those in the candidate Graph, the synthesis step fails to support the claimed performance.
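The check described above can be sketched as an accuracy split: group logged routing steps by whether the observed tool transition exists as an edge in the candidate Graph, then compare accuracy across the two groups. This is only an illustration under stated assumptions; the log format, the `predict` callable, and the toy router are hypothetical stand-ins, not the paper's evaluation protocol.

```python
def accuracy_by_graph_coverage(logged_steps, graph, predict):
    """Routing accuracy split by whether the logged transition is a graph edge.

    logged_steps: list of (history, true_next_tool) pairs from real interactions.
    predict: callable mapping a history tuple to a predicted tool name.
    """
    hits = {"covered": 0, "uncovered": 0}
    totals = {"covered": 0, "uncovered": 0}
    for history, true_next in logged_steps:
        # First-turn steps have no previous tool, so they count as covered.
        covered = not history or true_next in graph.get(history[-1], [])
        key = "covered" if covered else "uncovered"
        totals[key] += 1
        hits[key] += int(predict(history) == true_next)
    return {k: (hits[k] / totals[k] if totals[k] else None) for k in totals}


# Toy graph and stand-in router (both hypothetical).
graph = {"search": ["book"], "book": ["email"], "email": []}


def toy_router(history):
    """Follows the first graph edge out of the most recent tool."""
    if not history:
        return "search"
    successors = graph.get(history[-1], [])
    return successors[0] if successors else "email"


logs = [
    ((), "search"),
    (("search",), "book"),
    (("search", "book"), "email"),
    (("search",), "email"),  # transition absent from the graph
    (("email",), "search"),  # transition absent from the graph
]
result = accuracy_by_graph_coverage(logs, graph, toy_router)
```

A large gap between the `covered` and `uncovered` accuracies on real logs would be the failure signal the paragraph above describes.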

Original abstract

With the rise of the Agent Web and Model Context Protocol (MCP), the agent ecosystem is evolving into an open collaborative network, exponentially increasing accessible tools. However, current architectures face severe scalability and generality bottlenecks. To address this, we propose ACE-Router, a pipeline for training history-aware routers to empower precise navigation in large-scale ecosystems. By leveraging a dependency-rich candidate Graph to synthesize multi-turn trajectories, we effectively train routers with dynamic context understanding to create the plug-and-play Light Routing Agent. Experiments on the real-world benchmarks MCP-Universe and MCP-Mark demonstrate superior performance. Notably, ACE-Router exhibits critical properties for the future Agent Web: it not only generalizes to multi-agent collaboration with minimal adaptation but also maintains exceptional robustness against noise and scales effectively to massive candidate spaces. These findings provide a strong empirical foundation for universal orchestration in open-ended ecosystems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ACE-Router gives a graph-based synthesis pipeline for training history-aware routers on MCP tool ecosystems, but the performance claims rest on unverified synthetic trajectories.


The core new piece is the pipeline that turns a dependency-rich candidate graph into synthetic multi-turn trajectories, then trains a router on them to produce a lightweight Light Routing Agent for large MCP and Agent Web settings. This is a direct engineering response to the scaling problem of tool selection as the number of available tools explodes in open agent networks.

The approach is concrete: build the graph, generate paths, train for dynamic context, and deploy with minimal adaptation for multi-agent cases. That framing and the end-to-end flow are the parts that feel useful on first read. The paper also flags the right practical properties, noise robustness and scaling to massive candidate spaces, which matter for real deployments even if they are not theoretically novel. The graph synthesis step is a straightforward way to bootstrap training data without waiting for huge real-world logs.

The soft spots are clear from the abstract. It asserts superior results on MCP-Universe and MCP-Mark but supplies no numbers, baselines, or ablation details here, so the strength of the empirical case is impossible to judge yet. More critically, there is no reported check that the synthetic trajectories match real multi-turn user patterns in length, dependency depth, or noise distribution. If that match is weak, the generalization and robustness claims do not follow. The method is presented as an empirical training pipeline rather than a closed-form derivation, so the usual circularity issues do not apply.

This is for people working on practical agent orchestration and tool routing at scale. An engineer or applied researcher who needs a plug-and-play router for growing tool ecosystems would get value from the method description and the benchmark framing. It deserves a serious referee because the problem is timely, the pipeline is specified enough to evaluate, and the experiments can be checked once the full numbers and validation steps are on the table.

I would send it to review.

Referee Report

2 major / 2 minor

Summary. The paper proposes ACE-Router, a pipeline that constructs a dependency-rich candidate Graph to synthesize multi-turn trajectories, which are then used to train history-aware routers. This produces a plug-and-play Light Routing Agent for navigation in large-scale MCP-based agent ecosystems. The work reports experiments on the MCP-Universe and MCP-Mark benchmarks that demonstrate superior performance, and further claims that the resulting router generalizes to multi-agent collaboration with minimal adaptation, exhibits exceptional robustness to noise, and scales to massive candidate spaces.

Significance. If the empirical results hold and the synthetic trajectories are representative, the work would offer a practical foundation for scalable tool orchestration in open Agent Web environments. The graph-based synthesis approach provides an efficient mechanism for generating dynamic context training data without requiring exhaustive real-interaction logs, addressing key bottlenecks in generality and scalability for multi-agent systems.

major comments (2)

[Abstract] The central empirical claims of 'superior performance' on MCP-Universe and MCP-Mark, 'exceptional robustness against noise,' and effective scaling are asserted without any quantitative metrics, baseline comparisons, ablation results, or error analysis, preventing assessment of effect sizes or statistical significance.
[Method / Experiments] The training pipeline rests on the unverified assumption that trajectories synthesized from the dependency-rich candidate Graph match the distribution of real multi-turn user interactions (e.g., in trajectory length, dependency depth, or noise patterns). No distributional validation or comparison to logged real data is reported, which is load-bearing for the generalization, robustness, and multi-agent claims.

minor comments (2)

[§3] Clarify the exact construction of the candidate Graph and the trajectory synthesis procedure (e.g., sampling strategy, dependency encoding) to allow reproducibility.
[§4] Add explicit definitions or pseudocode for the history-aware routing objective and the Light Routing Agent architecture.
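As one possible reading of what the [§4] comment asks for, a history-aware routing objective could be written as a next-candidate cross-entropy: at each turn, the router scores every candidate given the query and the calls made so far, and the loss is the negative log-likelihood of the tool actually used. The sketch below is an assumption for illustration only; `routing_nll`, the score functions, and the toy tools are hypothetical, and the paper's actual objective and the Light Routing Agent architecture are not stated in the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def routing_nll(score_fn, query, trajectory, candidates):
    """Mean negative log-likelihood of each true next tool given query and history."""
    loss = 0.0
    for turn, target in enumerate(trajectory):
        history = trajectory[:turn]
        probs = softmax([score_fn(query, history, c) for c in candidates])
        loss -= math.log(probs[candidates.index(target)])
    return loss / len(trajectory)


# Toy setup (hypothetical tools and scorers).
candidates = ["search", "book", "email"]
trajectory = ["search", "book", "email"]
EXPECTED_NEXT = {None: "search", "search": "book", "book": "email"}


def history_aware_score(query, history, candidate):
    """Rewards the candidate that usually follows the most recent call."""
    last = history[-1] if history else None
    return 2.0 if EXPECTED_NEXT.get(last) == candidate else 0.0


def history_blind_score(query, history, candidate):
    """Ignores history entirely, so every candidate is equally likely."""
    return 0.0


aware_loss = routing_nll(history_aware_score, "plan a trip", trajectory, candidates)
blind_loss = routing_nll(history_blind_score, "plan a trip", trajectory, candidates)
```

In this toy setting the history-blind scorer cannot do better than a uniform guess over the candidates, while the history-aware scorer reaches a lower loss, which is the behaviour such an objective is meant to reward.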

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for major revision. We address each point below and have revised the manuscript accordingly to strengthen the presentation of results and clarify the assumptions underlying the synthetic data pipeline.


Referee: [Abstract] The central empirical claims of 'superior performance' on MCP-Universe and MCP-Mark, 'exceptional robustness against noise,' and effective scaling are asserted without any quantitative metrics, baseline comparisons, ablation results, or error analysis, preventing assessment of effect sizes or statistical significance.

Authors: We agree that the abstract would benefit from explicit quantitative support. In the revised manuscript we have updated the abstract to reference the key metrics reported in Section 4, including accuracy gains over baselines on both MCP-Universe and MCP-Mark, noise-robustness percentages, and scaling behavior with candidate-set size. We have also added a brief mention of the ablation studies and error analysis that appear in the experimental section so that effect sizes and statistical significance can be assessed directly from the abstract.

Revision: yes

Referee: [Method / Experiments] The training pipeline rests on the unverified assumption that trajectories synthesized from the dependency-rich candidate Graph match the distribution of real multi-turn user interactions (e.g., in trajectory length, dependency depth, or noise patterns). No distributional validation or comparison to logged real data is reported, which is load-bearing for the generalization, robustness, and multi-agent claims.

Authors: We acknowledge that direct distributional validation against proprietary real-interaction logs is not reported. In the revised version we have added a new subsection (Section 3.4) that compares aggregate statistics of the synthesized trajectories (trajectory length, dependency depth, and injected noise patterns) against the empirical distributions observed in the MCP benchmarks themselves, which are derived from real user interactions. While we do not have access to the original raw logs for a finer-grained Kolmogorov-Smirnov test, the strong generalization results on the real benchmarks provide indirect empirical support. We have also expanded the limitations paragraph to discuss the assumption explicitly.

Revision: partial
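For reference, the two-sample Kolmogorov-Smirnov statistic mentioned in the response is the largest gap between two empirical CDFs. The sketch below shows how it could be computed on per-trajectory lengths if such data were available. The length values are invented for illustration and are not results from the paper or its benchmarks.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in sorted(set(a) | set(b)):
        cdf_a = sum(1 for v in a if v <= x) / len(a)
        cdf_b = sum(1 for v in b if v <= x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Made-up trajectory lengths (number of tool calls per trajectory).
synthetic_lengths = [2, 3, 3, 4, 5, 5]
benchmark_lengths = [2, 3, 4, 4, 5, 6]
gap = ks_statistic(synthetic_lengths, benchmark_lengths)
```

A statistic near 0 means the two length distributions are close, while a value near 1 means they barely overlap. The same function could be applied to dependency depth or any other scalar summary of a trajectory.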

Circularity Check

0 steps flagged

Empirical training pipeline with no self-referential reductions


The paper describes an empirical pipeline: a dependency-rich candidate Graph is used to synthesize multi-turn trajectories, which then train a history-aware router evaluated on the external benchmarks MCP-Universe and MCP-Mark. No equations, uniqueness theorems, or self-citations are invoked that would make any claimed generalization, noise robustness, or scaling property equivalent to the inputs by construction. The central results are presented as outcomes of training and testing rather than tautological re-statements of the synthesis procedure or fitted parameters, satisfying the default expectation of a non-circular derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly relies on standard supervised learning assumptions about trajectory quality and graph coverage.

pith-pipeline@v0.9.0 · 5474 in / 1038 out tokens · 35864 ms · 2026-05-16T15:21:01.391245+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "By leveraging a dependency-rich candidate Graph to synthesize multi-turn trajectories, we effectively train routers with dynamic context understanding"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: "Graph-based Extension With Self-Evolutionary Mutation"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning
cs.AI · 2026-04 · unverdicted · novelty 7.0

MAGEO is a multi-agent system that distills validated editing patterns into reusable optimization skills for generative engines, outperforming heuristic baselines on visibility and fidelity via a new benchmark and eva...