RouteProfile: Graph-Based Profiling for Cold-Start LLM Routing

Ge Liu; Haozhen Zhang; Hongji Pu; Jiaxuan You; Jingjun Xu; Tao Feng

arxiv: 2605.00180 · v2 · pith:RGCV7AB6 · submitted 2026-04-30 · 💻 cs.NI · cs.CL


Jingjun Xu, Hongji Pu, Tao Feng, Haozhen Zhang, Jiaxuan You, Ge Liu

Pith reviewed 2026-05-09 20:09 UTC · model grok-4.3

classification 💻 cs.NI cs.CL

keywords: LLM routing · profile design · structured profiles · model capabilities · routing performance · generalization · LLM ecosystem · information integration


The pith

Structured profiles for LLM capabilities outperform flat ones in routing tasks and improve generalization to new models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks how the design of LLM profiles, which record model strengths across queries and domains, shapes routing decisions that assign each input to the most suitable model. It treats profile construction as the task of integrating heterogeneous interaction histories into usable representations. It defines a design space called RouteProfile along four axes: organizational form (structured or flat), representation type (text or embedding), aggregation depth (how many hops of aggregation are applied over the raw signals), and learning configuration (trainable or training-free). Systematic tests on three routers, in both standard and new-LLM settings, show that structured forms beat flat ones, query-level signals beat coarse domain signals, and trainable structured profiles help most when routing to unseen models.
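The four axes can be pictured as a configuration grid that an experiment sweeps over. The sketch below is a hypothetical encoding, not the paper's code: the axis values come from the Figure 2 caption (flat/structured, text/embedding, hop ∈ {0, 1, 2, ...}, training-free/trainable), the hop range is capped at 2 for illustration, and the rule that flat profiles only take hop 0 is an assumption made here.

```python
from dataclasses import dataclass
from itertools import product

# Axis values follow the Figure 2 caption; the hop range is open-ended there
# and is capped at 2 here purely for illustration.
ORGANIZATIONAL_FORMS = ("flat", "structured")
REPRESENTATION_TYPES = ("text", "embedding")
AGGREGATION_HOPS = (0, 1, 2)
LEARNING_CONFIGS = ("training-free", "trainable")


@dataclass(frozen=True)
class ProfileConfig:
    """One hypothetical cell of the RouteProfile design space."""
    form: str
    representation: str
    hops: int
    learning: str


def design_space():
    """Enumerate all profile configurations.

    Assumption (not from the paper): a flat profile has no graph to
    aggregate over, so it is only paired with hops == 0.
    """
    configs = []
    for form, rep, hops, learning in product(
        ORGANIZATIONAL_FORMS, REPRESENTATION_TYPES, AGGREGATION_HOPS, LEARNING_CONFIGS
    ):
        if form == "flat" and hops > 0:
            continue
        configs.append(ProfileConfig(form, rep, hops, learning))
    return configs


for cfg in design_space()[:3]:
    print(cfg)
print(len(design_space()))  # -> 16
```

Under these choices the grid has 16 cells (4 flat, 12 structured), and each cell would then be evaluated under every router, which is where the number and diversity of routers starts to matter.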

Core claim

LLM profiling is a structured information integration problem over heterogeneous interaction histories. A general design space along organizational form, representation type, aggregation depth, and learning configuration reveals that structured profiles consistently outperform flat ones, query-level signals are more reliable than domain-level signals, and generalization to newly introduced models benefits most from structured profiles under trainable configurations.

What carries the argument

RouteProfile, the four-dimensional design space for LLM profiles that organizes capability information from interaction histories into structured or flat forms with chosen representation, depth, and learning configuration.

If this is right

Structured profiles should replace flat ones in router implementations to raise overall accuracy.
Query-level signals should be collected and used in preference to domain-level summaries for more reliable routing.
Trainable structured profiles should be adopted when the router must handle newly introduced models.
Router mechanisms can be compared more fairly by holding profile design fixed across experiments.
Profile engineering becomes a separable and optimizable component of routing system development.
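To make the query-level versus domain-level distinction in the second item concrete, here is a toy sketch. It is not the paper's router: the embeddings, rewards, model names, and the 1-nearest-neighbour rule are all invented for illustration. A domain-level profile averages each model's reward over a whole domain, whereas a query-level profile looks only at past queries close to the incoming one, so the two can disagree when a domain mixes different kinds of queries.

```python
import math
from collections import defaultdict

# Toy interaction history: (query embedding, domain, model, reward).
# Every value is invented; the two embedding dimensions stand for two
# hypothetical kinds of math question.
HISTORY = [
    ((1.0, 0.0), "math", "model_a", 1.0),
    ((0.9, 0.1), "math", "model_a", 1.0),
    ((1.0, 0.1), "math", "model_a", 1.0),
    ((0.0, 1.0), "math", "model_a", 0.0),
    ((1.0, 0.0), "math", "model_b", 0.0),
    ((0.9, 0.1), "math", "model_b", 0.0),
    ((1.0, 0.1), "math", "model_b", 0.0),
    ((0.0, 1.0), "math", "model_b", 1.0),
]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def route_by_domain(history, domain):
    """Domain-level profile: each model's mean reward over the whole domain."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _, dom, model, reward in history:
        if dom == domain:
            sums[model] += reward
            counts[model] += 1
    return max(sums, key=lambda m: sums[m] / counts[m])


def route_by_query(history, query, k=1):
    """Query-level profile: each model's mean reward on its k past queries
    that are most cosine-similar to the incoming query."""
    per_model = defaultdict(list)
    for emb, _, model, reward in history:
        per_model[model].append((cosine(query, emb), reward))

    def score(model):
        nearest = sorted(per_model[model], key=lambda t: t[0], reverse=True)[:k]
        return sum(r for _, r in nearest) / len(nearest)

    return max(per_model, key=score)


query = (0.1, 0.9)  # resembles the one past query that model_b got right
print(route_by_domain(HISTORY, "math"))  # -> model_a (mean 0.75 vs 0.25)
print(route_by_query(HISTORY, query))    # -> model_b
```

The domain average favours model_a because most logged math queries are of the first kind, while the query-level lookup sends a second-kind query to model_b, which is the failure mode that coarse domain summaries can hide.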

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Treating profiles as an independent design variable may allow routing systems to improve without changes to the router algorithm itself.
Standardized profile formats could support shared benchmarks that isolate the contribution of each router.
The emphasis on query-level detail suggests routing may scale better when profiles track fine-grained interaction outcomes rather than broad categories.

Load-bearing premise

That evaluations on three routers, under the chosen standard and generalization settings, are representative enough to map out the full design space and to carry over to other routing systems and LLMs.

What would settle it

A fourth router, or a new set of LLMs, on which flat profiles achieve higher routing accuracy than structured profiles under the same generalization test conditions would falsify the main performance claims.

Figures

Figures reproduced from arXiv: 2605.00180 by Ge Liu, Haozhen Zhang, Hongji Pu, Jiaxuan You, Jingjun Xu, Tao Feng.

**Figure 1.** Model strengths vary substantially across query, task, and domain levels. Radar plots compare the performance of candidate LLMs under three views: query difficulty, benchmark task, and domain category. No single model dominates all dimensions; instead, different models exhibit complementary strengths and weaknesses, motivating the need for structured model profiling in routing.

**Figure 2.** Overview of the RouteProfile. LLM profiles are constructed from interaction histories comprising model family, task evaluation, domain coverage, and query-level signals. The design space is characterized along four dimensions: organizational form (flat/structured), representation type (text/embedding), aggregation depth (hop ∈ {0, 1, 2, ...}), and learning configuration (training-free/trainable). Three r…

**Figure 3.** Effect of aggregation hop differs across profile designs and routers (RQ1). Depth helps overall, but its value is dependent on the profile design (i.e., representation type and learning configuration) and router.

5.5 Routing Tasks and Metrics. We consider two settings to assess the utility and generalizability of LLM profiles in routing. Standard Routing. In the standard setting, all candidate LLMs are incl…

**Figure 4.** Routing performance across different profile designs under the new-LLM routing setting (RQ3). The three panels compare how different profile designs behave under each router. Trainable GNNs achieve the strongest cold-start performance (Eq. 8). Generalization to new LLMs requires structured and trainable profiles.

Original abstract

LLM routing is increasingly important for selecting suitable models under diverse user needs and deployment constraints, but its practical effectiveness depends on continual adaptation to emerging queries and newly released models. New-LLM integration is particularly challenging, as newly released models lack the query-response-reward interactions required for router training and cannot be profiled as directly as new queries via semantic embeddings. Existing profiles are limited: LLM-generated descriptions are often coarse, while interaction-based embeddings are costly to construct. To address this problem, we propose RouteProfile, a graph-based profiling framework that constructs LLM profiles from public signals in technical reports or model cards, including model family, model description, reported benchmark scores, and benchmark domains. RouteProfile organizes these heterogeneous signals into a graph and studies profile construction along four dimensions: organizational form, representation type, aggregation depth, and learning configuration. We evaluate RouteProfile in training-free cold-start routing and new-LLM integration settings. Experiments show that: (1) structured profiles outperform flat baselines in training-free cold-start routing; (2) model family metadata is more reliable than benchmark domain information; and (3) effective new-LLM integration requires profile-router co-design. Overall, our findings highlight the importance of profile design for enabling routing systems to adapt to the evolving model ecosystem.
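The pipeline the abstract describes (public signals, organized into a graph, aggregated into a profile, then used for training-free routing) can be sketched as follows. This is a minimal illustration under stated assumptions, not RouteProfile's implementation: node features are toy 2-D vectors standing in for text embeddings (dimension 0 roughly "coding", dimension 1 roughly "math"), reported benchmark scores become edge weights, aggregation is a parameter-free weighted mean repeated `hops` times, and routing picks the candidate whose profile has the highest cosine similarity to the query. All node names are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class ProfileGraph:
    """Hypothetical heterogeneous profile graph. Nodes are models, model
    families, benchmarks, and benchmark domains; each carries a feature
    vector (a stand-in for a text embedding of its public description)."""

    def __init__(self):
        self.features = {}
        self.edges = defaultdict(dict)  # node -> {neighbour: edge weight}

    def add_node(self, name, feature):
        self.features[name] = list(feature)

    def add_edge(self, u, v, weight=1.0):
        # Undirected edge; benchmark scores are used as weights (assumption).
        self.edges[u][v] = weight
        self.edges[v][u] = weight

    def aggregate(self, hops):
        """Training-free aggregation: at every hop each node's vector becomes
        the weighted mean of its own vector (weight 1) and its neighbours'."""
        h = {n: list(f) for n, f in self.features.items()}
        for _ in range(hops):
            nxt = {}
            for node, vec in h.items():
                total, weight_sum = list(vec), 1.0
                for nbr, w in self.edges[node].items():
                    total = [t + w * x for t, x in zip(total, h[nbr])]
                    weight_sum += w
                nxt[node] = [t / weight_sum for t in total]
            h = nxt
        return h


def route(graph, query_vec, candidates, hops=1):
    """Cold-start routing: pick the candidate whose aggregated profile is
    most cosine-similar to the query embedding."""
    profiles = graph.aggregate(hops)
    return max(candidates, key=lambda m: cosine(query_vec, profiles[m]))


g = ProfileGraph()
for name, feat in [
    ("new_model", [0.5, 0.5]),  # newly released model with a vague model card
    ("old_coder", [0.8, 0.2]),  # existing model with a coding-focused card
    ("fam_math", [0.0, 1.0]), ("fam_code", [1.0, 0.0]),
    ("bench_math", [0.0, 1.0]), ("bench_code", [1.0, 0.0]),
    ("dom_math", [0.0, 1.0]), ("dom_code", [1.0, 0.0]),
]:
    g.add_node(name, feat)

g.add_edge("new_model", "fam_math")
g.add_edge("new_model", "bench_math", weight=0.9)  # reported score 0.9
g.add_edge("new_model", "bench_code", weight=0.2)  # reported score 0.2
g.add_edge("old_coder", "fam_code")
g.add_edge("old_coder", "bench_code", weight=0.8)
g.add_edge("bench_math", "dom_math")
g.add_edge("bench_code", "dom_code")

print(g.aggregate(0)["new_model"])  # -> [0.5, 0.5]
print(g.aggregate(1)["new_model"])  # now pulled toward the math dimension
print(route(g, [0.1, 0.9], ["new_model", "old_coder"]))  # -> new_model
print(route(g, [0.9, 0.1], ["new_model", "old_coder"]))  # -> old_coder
```

With `hops=0` the new model's profile is just its uninformative description vector; after one hop it absorbs its family and benchmark neighbours and leans toward math. This is the kind of signal a model without any query-response-reward history can only obtain from public metadata.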

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RouteProfile lays out a four-dimensional design space for LLM profiles and, in tests on three routers, finds that structured profiles beat flat ones, along with a few secondary patterns; but three routers are too few to support broad claims about the design space.


RouteProfile carves out a design space for building profiles of LLM capabilities for use in routing systems, and the authors' experiments suggest that structured profiles give better results than flat ones across the routers they tried. They break the design into four dimensions: whether the profile is structured or flat, what kind of representation it uses, how deep the aggregation over past interactions goes, and whether the profile is trainable or training-free. They then plug these profiles into three different routers and test them both in the standard setting and in cases where new LLMs are added later. The results point to structured profiles winning, query-level signals being more useful than domain-level ones, and the generalization benefit coming mostly from trainable structured profiles.

This is a solid move, because routing work has mostly tinkered with the router itself while treating the profile as an afterthought. By making the profile choices explicit and testing them separately, the paper helps disentangle the two and gives people a way to compare systems more fairly. The generalization experiments are a nice touch, since real deployments will keep adding new models.

The main limitation is that everything rests on just three routers. The obvious stress test applies here: if those three happen to process profile information in similar ways, the advantage of structured profiles could be specific to that style of router rather than a general property of profile design. The paper would need to show more diversity in router architectures, or test additional ones, to make the design-space claims stick for the broader field. Without the actual numbers from the tables, it is also hard to tell how large the gains are or whether they hold up under different metrics.

This kind of work is aimed at researchers and engineers building routing layers for LLM services, especially those dealing with heterogeneous model pools. It gives them concrete dimensions to experiment with and some evidence to start from. I would send this to peer review. The idea is timely and the approach is straightforward, even if the current experiments leave room for questions about how widely the findings apply.

Referee Report

1 major / 2 minor

Summary. The paper introduces RouteProfile, a design space for LLM profiles in routing viewed as structured information integration over interaction histories, with four dimensions (organizational form, representation type, aggregation depth, learning configuration). Systematic evaluation across three representative routers in standard and new-LLM generalization settings shows structured profiles outperform flat ones, query-level signals are more reliable than domain-level, and generalization to new models benefits most from structured trainable profiles.

Significance. If the empirical results hold under broader validation, the work is significant for disentangling profile design from router mechanisms, enabling fairer comparisons across routing systems, and providing actionable guidelines for profile construction that could improve performance and generalization in LLM routing.

major comments (1)

[Experimental Setup and Results (Sections 4-5)] The central claims rest on evaluation across only three routers presented as representative, but without explicit analysis of how these routers differ mechanistically in ingesting and utilizing profile signals (e.g., structured vs. flat inputs or query-level vs. domain signals), the observed consistencies may reflect router-specific behaviors rather than general properties of the RouteProfile design space. This limits the ability to fully elucidate the design space and generalize beyond the chosen routers.

minor comments (2)

[Abstract] The abstract gives only high-level findings, without any quantitative metrics, specific datasets, router names, or result tables; the full paper should include a concise summary of key numbers (e.g., performance deltas) so that readers can assess the claims immediately.
[RouteProfile Design Space (Section 3)] Notation for the four dimensions and their instantiations could be clarified with a summary table early in the design space section to improve readability when comparing configurations.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment below and will incorporate revisions to strengthen the paper.


Referee: [Experimental Setup and Results (Sections 4-5)] The central claims rest on evaluation across only three routers presented as representative, but without explicit analysis of how these routers differ mechanistically in ingesting and utilizing profile signals (e.g., structured vs. flat inputs or query-level vs. domain signals), the observed consistencies may reflect router-specific behaviors rather than general properties of the RouteProfile design space. This limits the ability to fully elucidate the design space and generalize beyond the chosen routers.

Authors: We appreciate this observation and agree that an explicit mechanistic comparison would strengthen the claims. In the revised manuscript, we will add a dedicated subsection (4.2) in the Experimental Setup that analyzes the three routers' distinct input processing mechanisms. This will describe: (i) how each router encodes profile inputs (e.g., vector concatenation for embedding-based routers versus attention over hierarchical structures for others), (ii) differential handling of query-level versus domain-level signals, and (iii) how structured versus flat profiles are parsed. By mapping these differences to the consistent performance patterns we observe, the addition will better support that the advantages of structured profiles reflect general properties of the RouteProfile design space. We will also clarify the selection criteria for the three routers as covering major paradigms in the literature (embedding similarity, learned classifiers, and LLM-based routing). revision: yes

Circularity Check

0 steps flagged

No circularity: empirical claims rest on direct evaluations

full rationale

The paper develops a conceptual design space (RouteProfile) along four dimensions and supports its claims exclusively through systematic empirical evaluations on three routers under standard and new-LLM settings. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the abstract or described methodology. The performance comparisons (structured vs. flat profiles, query-level vs. domain-level signals) are observational outcomes, not reductions to the inputs by construction. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The work rests on the domain assumption that LLM capabilities can be usefully captured via structured integration of interaction histories and that the four dimensions cover the main design choices for profiles.

axioms (1)

domain assumption: LLM capabilities can be effectively captured through structured integration of heterogeneous interaction histories.
Basis for treating profiling as an information integration problem and for the RouteProfile dimensions.

invented entities (1)

RouteProfile design space (no independent evidence)
purpose: To systematize and explore LLM profile design along four dimensions
Newly introduced framework to organize profile choices and enable fair comparisons.

pith-pipeline@v0.9.0 · 5526 in / 1165 out tokens · 42121 ms · 2026-05-09T20:09:42.748911+00:00 · methodology
