Neural Router: Semantic Content Matching for Agentic AI

Abhishek Kumar; Alaa Saleh; Alexander Engelhardt; Lauri Lovén; Naser Hossein Motlagh; Roberto Morabito; Sasu Tarkoma; XiaoLi Liu

arXiv: 2605.25701 · v1 · pith:KMW7HFCCnew · submitted 2026-05-25 · cs.DC · cs.CL · cs.IR · cs.NI

Lauri Lovén, Abhishek Kumar, Alexander Engelhardt, Alaa Saleh, Roberto Morabito, Xiaoli Liu, Naser Hossein Motlagh, Sasu Tarkoma

Pith reviewed 2026-06-29 20:21 UTC · model grok-4.3

classification cs.DC · cs.CL · cs.IR · cs.NI

keywords semantic matching · publish/subscribe · agentic AI · large language models · edge-cloud computing · content-based routing · multi-label retrieval · cost-accuracy tradeoffs

The pith

Large language models can serve as the semantic-matching engine of a content-based publish/subscribe broker for agentic AI across the edge-cloud computing continuum.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that LLMs can bridge vocabulary and modality gaps in content matching for agentic AI where keyword and embedding filters fail. It frames the task as offline multi-label retrieval and evaluates six LLMs against seven baselines on three public datasets spanning social-media, legal, and smart-home sensor domains. A central contribution is the two-crossover cost-accuracy characterisation: an analytical context-window crossover below which CoverAndMerge compression reduces LLM invocations, and an empirical discrimination-capacity crossover above which accuracy collapses independently of context budget. If this holds, agentic AI systems could rely on LLMs for reliable semantic routing in distributed edge-cloud setups, with model selection emerging as the dominant decision factor over pipeline tuning.
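To make the offline multi-label retrieval framing concrete, here is a minimal sketch of how a matcher's output could be scored. It assumes each message comes with a ground-truth set of subscription IDs and that the matcher returns a predicted set. The function name `micro_f1` and the toy data are illustrative assumptions, not taken from the paper, which may use a different averaging scheme.

```python
from typing import Iterable, Set, Tuple


def micro_f1(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Micro-averaged F1 over (true_set, predicted_set) pairs."""
    tp = fp = fn = 0
    for truth, predicted in pairs:
        tp += len(truth & predicted)   # subscriptions correctly matched
        fp += len(predicted - truth)   # matched but should not have been
        fn += len(truth - predicted)   # should have matched but were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): two messages with their true and predicted sets.
toy = [
    ({"s1", "s2"}, {"s1"}),
    ({"s3"}, {"s3", "s4"}),
]
score = micro_f1(toy)
```

Micro-averaging pools true and false positives across all messages, so messages with many matching subscriptions weigh more than messages with few.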

Core claim

LLMs serve as the semantic-matching engine of a content-based publish/subscribe broker for agentic AI across the edge-cloud computing continuum, bridging the vocabulary and modality gaps that defeat keyword and embedding filters. The work characterises performance via two crossovers: an analytical context-window crossover below which a CoverAndMerge compression pipeline reduces LLM invocations, and an empirical discrimination-capacity crossover above which matching accuracy collapses independently of context budget, at a model-dependent point governed by parameter count and training generation.
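The broker role described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's architecture: `Broker`, `Matcher`, and `keyword_matcher` are hypothetical names, and the matcher is a plain callable so the sketch runs offline. In an LLM-backed broker the callable would wrap a model invocation instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A matcher decides whether a message satisfies a subscription's
# natural-language description. Hypothetical interface: an LLM-backed
# matcher would have the same signature.
Matcher = Callable[[str, str], bool]


def keyword_matcher(message: str, description: str) -> bool:
    """Baseline stand-in: match when the two texts share any word."""
    return bool(set(message.lower().split()) & set(description.lower().split()))


@dataclass
class Broker:
    matcher: Matcher
    subscriptions: Dict[str, str] = field(default_factory=dict)

    def subscribe(self, sub_id: str, description: str) -> None:
        self.subscriptions[sub_id] = description

    def route(self, message: str) -> List[str]:
        """Return the IDs of every subscription the message matches."""
        return [
            sub_id
            for sub_id, description in self.subscriptions.items()
            if self.matcher(message, description)
        ]


broker = Broker(matcher=keyword_matcher)
broker.subscribe("heat", "overheating alerts")
broker.subscribe("temp", "temperature sensor readings")
routed = broker.route("living room sensor reports 31 degrees")
```

With the keyword stand-in, the message reaches `temp` only because both texts contain the word "sensor", and it never reaches `heat`, even though a reading of 31 degrees could plausibly concern an overheating subscriber. That is the kind of vocabulary gap a semantic matcher is meant to close.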

What carries the argument

The two-crossover cost-accuracy characterisation, which analytically locates the context window where compression lowers invocations and empirically locates the discrimination capacity limit set by model scale and training generation.
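The intuition behind the context-window crossover can be illustrated with a toy cost model. This is a deliberately simplified assumption, not the paper's cost equation: it counts how many calls are needed to show all subscription text to the model once, treats compression as a fixed size ratio, and ignores the cost of compressing. All names and numbers are hypothetical.

```python
import math


def invocations(total_tokens: int, window: int) -> int:
    """LLM calls needed to show all subscription text once (toy model)."""
    return max(1, math.ceil(total_tokens / window))


def compression_saves_calls(n_subs: int, tokens_per_sub: int,
                            ratio: float, window: int) -> bool:
    """True when the compressed subscription set needs fewer calls."""
    total = n_subs * tokens_per_sub
    plain = invocations(total, window)
    compressed = invocations(math.ceil(total * ratio), window)
    return compressed < plain


# 200 subscriptions of 50 tokens each, compressed to half their size.
small_window = compression_saves_calls(200, 50, 0.5, window=4_000)
large_window = compression_saves_calls(200, 50, 0.5, window=16_000)
```

In this toy setting the small window needs three calls uncompressed and two compressed, so compression helps. The large window already fits everything in one call, so compression cannot reduce the count. The crossover sits where the uncompressed subscription set first fits in a single window.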

If this is right

Above the discrimination crossover, compression cannot recover accuracy and only frontier-scale models can clear large subscription sets.
Backend choice dominates configuration choice, so model selection is the primary operator lever.
Three composable algorithms and a per-cluster Quality-of-Experience framework support autonomic LLM-tier selection.
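One way to picture per-cluster tier selection is the sketch below. It assumes each tier has a known cost and an F1 estimate obtained on a calibration sample for the cluster, and picks the cheapest tier that meets a quality target. The tier names, costs, scores, and the selection rule itself are hypothetical illustrations, not the paper's Quality-of-Experience framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    name: str             # hypothetical tier label
    cost_per_call: float  # arbitrary cost units
    est_f1: float         # F1 estimated on a calibration sample (assumed)


def select_tier(tiers: List[Tier], target_f1: float) -> Tier:
    """Cheapest tier meeting the target; else the most accurate one."""
    eligible = [t for t in tiers if t.est_f1 >= target_f1]
    if eligible:
        return min(eligible, key=lambda t: t.cost_per_call)
    return max(tiers, key=lambda t: t.est_f1)


tiers = [
    Tier("edge-small", 1.0, 0.62),
    Tier("cloud-mid", 4.0, 0.81),
    Tier("cloud-frontier", 20.0, 0.93),
]
easy_cluster = select_tier(tiers, target_f1=0.60)
hard_cluster = select_tier(tiers, target_f1=0.90)
unreachable = select_tier(tiers, target_f1=0.99)
```

The fallback to the most accurate tier reflects the finding that, above the discrimination crossover, only a stronger backend helps. A production selector would also need to weigh latency and budget limits, which this sketch omits.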

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The routing mechanism could extend semantic matching to live, multi-modal agent interactions that cross edge and cloud boundaries.
Improvements in smaller models may shift the discrimination crossover and widen the set of usable backends.
The offline characterisation supplies a baseline for designing online adaptation layers in production agentic systems.

Load-bearing premise

The three public datasets spanning social-media, legal, and smart-home sensor domains sufficiently represent the content-matching workloads that will arise in deployed agentic AI systems across the edge-cloud continuum.

What would settle it

Running the multi-label retrieval evaluations on content drawn from actual deployed agentic AI applications operating across edge and cloud environments to check whether the identified crossovers and model-dependent accuracy patterns persist.

Figures

Figures reproduced from arXiv: 2605.25701 by Abhishek Kumar, Alaa Saleh, Alexander Engelhardt, Lauri Lovén, Naser Hossein Motlagh, Roberto Morabito, Sasu Tarkoma, XiaoLi Liu.

**Figure 2.** Neural Router architecture (single-broker view).

**Figure 3.** Cost-model and scaling validation. (a) Predicted vs. measured per-cluster LLM invocations across configs {A0, A1, A3} and 𝑘 ∈ {1, 2, 5, 10, 19}, 𝑛=81 markers (Qwen-2.5-7B on Mahti GPU and dry-run client). Predicted 𝐼pred from Eq. (5) with production constants. The model is a conservative ceiling (median ratio 1.00, zero under-predictions): the grey wedge marks the one-sided within-2× region, trivial cells …

**Figure 4.** Empirical crossover validation on D1 with Qwen-2.5-7B at …

**Figure 5.** Perturbation results on D1, Qwen-2.5 7B/32B tier gradient, calibration fraction …

**Figure 6.** Parameter sensitivity on D1 (Haiku, A4): F1 vs. each parameter holding the others at the production …

**Figure 7.** Calibration-fraction sweep on D1, Qwen-2.5 7B / 32B tier gradient, matched-pair LLM cache, …
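The Figure 3 caption describes the cost model as a conservative ceiling, with a median ratio of 1.00 and zero under-predictions. The sketch below shows how such summary statistics could be computed from pairs of predicted and measured invocation counts. It assumes the ratio is predicted divided by measured and that "within 2×" means a ratio between 1 and 2. Both definitions are assumptions, and the data are toy values rather than the paper's measurements.

```python
from statistics import median
from typing import Dict, List, Tuple


def ceiling_check(pairs: List[Tuple[int, int]]) -> Dict[str, float]:
    """Summarise (predicted, measured) invocation counts."""
    ratios = [pred / meas for pred, meas in pairs]
    return {
        # A median of 1.0 means the typical prediction is exact.
        "median_ratio": median(ratios),
        # A ceiling model should never predict fewer calls than observed.
        "under_predictions": sum(1 for pred, meas in pairs if pred < meas),
        # Share of cells inside the one-sided within-2x region.
        "within_2x": sum(1 for r in ratios if 1.0 <= r <= 2.0) / len(ratios),
    }


# Toy data, not the paper's measurements.
toy_pairs = [(4, 4), (6, 6), (10, 8), (3, 3), (12, 5)]
summary = ceiling_check(toy_pairs)
```

On the toy pairs, every prediction is at or above the measurement, and one cell overshoots by more than a factor of two.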

Original abstract

Large language models (LLMs) can serve as the semantic-matching engine of a content-based publish/subscribe broker for agentic AI across the edge-cloud computing continuum, bridging the vocabulary and modality gaps that defeat keyword and embedding filters. Framed as offline multi-label retrieval over three public datasets spanning social-media, legal, and smart-home sensor domains (six LLMs, seven baselines), our central contribution is a two-crossover cost-accuracy characterisation: an analytical context-window crossover below which a CoverAndMerge compression pipeline reduces LLM invocations, and an empirical discrimination-capacity crossover above which matching accuracy collapses independently of context budget, by a model-dependent factor of parameter count and training generation. Two findings carry practical weight: above the discrimination crossover, compression cannot recover accuracy and only frontier-scale models clear large subscription sets; and there backend choice dominates configuration choice, so model selection, not pipeline tuning, is the primary operator lever. We accompany this with three composable algorithms and a per-cluster Quality-of-Experience framework for autonomic LLM-tier selection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a usable two-crossover rule for picking LLMs in agentic pub/sub and shows backend choice matters more than tuning, but the static datasets leave transfer to real edge-cloud workloads uncertain.

The paper frames LLMs as semantic matchers inside content-based pub/sub for agentic AI across edge and cloud. It reports an analytical context-window crossover where a CoverAndMerge pipeline reduces calls, plus an empirical discrimination-capacity crossover where accuracy drops regardless of budget, scaled by model size and generation.

They test six LLMs against seven baselines on three public datasets covering social media, legal text, and smart-home sensors. The practical claims are that compression fails past the discrimination point, only frontier models handle large subscription sets, and model selection dominates configuration tweaks. They add three algorithms and a QoE framework for autonomic tier selection.

This is new in the specific crossover framing applied to agentic pub/sub and in the dominance finding. The empirical setup on public data supplies a concrete decision rule that readers in distributed agent systems can test directly.

The experiments follow standard multi-label retrieval methods, so the numbers on these corpora look internally consistent. No circularity appears since the work stays empirical rather than deriving fitted parameters from its own claims.

The soft spot is the datasets. Offline static corpora do not include dynamic subscription changes, agent-specific vocabulary drift, real-time multi-modal streams, or the latency and resource constraints of actual edge-cloud deployments. If the crossovers are tied to these particular domains, the advice that backend choice dominates may not carry over.

This paper is for researchers working on semantic routing in distributed AI who want empirical guidance on model selection. Readers who need a testable characterization of retrieval cost-accuracy tradeoffs will find it useful. It has enough structure and results to go to a serious referee, though reviewers will need to check how far the findings extend beyond the chosen corpora.

Send it for peer review.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that large language models can function as the semantic-matching component in a content-based publish/subscribe broker tailored for agentic AI systems operating across the edge-cloud computing continuum. Through offline multi-label retrieval experiments on three public datasets (social media, legal, and smart-home sensor domains) involving six LLMs and seven baselines, the authors identify two key crossovers: an analytical context-window crossover enabling a CoverAndMerge compression pipeline to reduce LLM invocations, and an empirical discrimination-capacity crossover beyond which matching accuracy declines independently of context budget, depending on model parameters and training generation. Key practical findings include that above the discrimination crossover, compression fails to restore accuracy and only frontier-scale models can handle large subscription sets, with backend choice outweighing configuration choices; the work also introduces three composable algorithms and a Quality-of-Experience framework for autonomic LLM-tier selection.

Significance. If the empirical crossovers and dominance findings generalize, this work offers valuable guidance for deploying LLM-based semantic matching in distributed agentic AI systems, highlighting the primacy of model selection over pipeline tuning. The provision of composable algorithms and the QoE framework for tier selection adds practical utility. The multi-model, multi-baseline evaluation strengthens the empirical basis.

major comments (2)

[§4 Evaluation] The central claims regarding the transferability of the context-window and discrimination-capacity crossovers to agentic AI pub/sub workloads rest on the three public datasets. However, these datasets consist of static offline multi-label retrieval tasks and do not capture dynamic subscription sets, evolving agent-specific vocabularies, real-time multi-modal streams, or latency constraints characteristic of the edge-cloud continuum, raising questions about whether the reported findings are artifacts of the chosen corpora.
[§5 Results] The assertion that 'backend choice dominates configuration choice' and the recommendation for frontier-scale models on large subscription sets are based solely on the performance observed in the social-media, legal, and smart-home domains. The manuscript would benefit from additional analysis or experiments demonstrating robustness to workloads more representative of deployed agentic systems.

minor comments (1)

[Abstract] The abstract mentions 'six LLMs, seven baselines' but does not name them; including the specific models and baselines would improve clarity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, clarifying the deliberate scope of our offline evaluation while acknowledging its limitations.

Referee: [§4 Evaluation] The central claims regarding the transferability of the context-window and discrimination-capacity crossovers to agentic AI pub/sub workloads rest on the three public datasets. However, these datasets consist of static offline multi-label retrieval tasks and do not capture dynamic subscription sets, evolving agent-specific vocabularies, real-time multi-modal streams, or latency constraints characteristic of the edge-cloud continuum, raising questions about whether the reported findings are artifacts of the chosen corpora.

Authors: Our experiments are explicitly designed as controlled offline multi-label retrieval to isolate the analytical context-window crossover (derived from token budgets and independent of workload dynamics) and the empirical discrimination-capacity crossover (a model-intrinsic property of parameter count and training generation). The manuscript frames the contribution as a characterization of the semantic-matching engine rather than a full end-to-end dynamic pub/sub simulation; the three domains were chosen to span representative content types. We agree that dynamic subscription evolution, multi-modal streams, and latency are not modeled and that online validation would strengthen transferability claims, but these lie outside the current scope. No revision is required as the offline framing is stated throughout.

revision: no

Referee: [§5 Results] The assertion that 'backend choice dominates configuration choice' and the recommendation for frontier-scale models on large subscription sets are based solely on the performance observed in the social-media, legal, and smart-home domains. The manuscript would benefit from additional analysis or experiments demonstrating robustness to workloads more representative of deployed agentic systems.

Authors: The dominance finding holds consistently across the three domains, which differ in vocabulary density, subscription cardinality, and content formality. This cross-domain consistency supports the conclusion that model selection is the primary lever. While we recognize that additional workloads (e.g., conversational or multi-modal agent traces) could test broader robustness, the present evidence across diverse static corpora is sufficient to ground the practical recommendation. We do not intend to add new experiments.

revision: no

Circularity Check

0 steps flagged

No circularity; experimental results on public datasets

The paper's central claims rest on offline multi-label retrieval experiments across three public datasets using six LLMs and seven baselines. The two crossovers are presented as one analytical (context-window) and one empirical (discrimination-capacity), derived from standard retrieval metrics and model comparisons rather than any self-defined parameters, fitted inputs renamed as predictions, or load-bearing self-citations. No equations or steps reduce the findings to inputs by construction; the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated in the provided text.

pith-pipeline@v0.9.1-grok · 5741 in / 1059 out tokens · 27555 ms · 2026-06-29T20:21:04.729261+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Autonomic Federated-Market Orchestration for the Edge-Cloud Continuum
cs.DC · 2026-05 · unverdicted · novelty 6.0

Neural Pub/Sub uses a MAPE-K loop with Walrasian price signals on service DAGs to achieve autonomic federated orchestration that matches centralized welfare under gross-substitutes assumptions and outperforms baseline...

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages · cited by 1 Pith paper

[1]

Ilias Chalkidis, Manos Fergadiotis, and Ion Androutsopoulos

Karlsruhe, Germany, 163–174. Ilias Chalkidis, Manos Fergadiotis, and Ion Androutsopoulos. 2021. MultiEURLEX – A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer. In Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP). 6974–6996. Diane J. Cook, Aaron S. Crandall, Brian L. Thoma...

work page doi:10.1109/mc.2012.328 2021
[2]

doi:10.1007/11587552_13 Mingdong Li, Qifeng Luo, Lu Wang, Ruisheng Shi, and Jinqiao Shi

Springer, 249–269. doi:10.1007/11587552_13 Mingdong Li, Qifeng Luo, Lu Wang, Ruisheng Shi, and Jinqiao Shi. 2020. Privacy-preserving content-based publish/subscribe service based on order preserving encryption. In Internet of Vehicles. Technologies and Services Toward Smart Cities: 6th International Conference, IOV 2019, Kaohsiung, Taiwan, November 18–21, ...

work page doi:10.1007/11587552_13 2020
[3]

Real-Time AI Service Economy: A Framework for Agentic Computing Across the Continuum. arXiv preprint arXiv:2603.05614 (2026). Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, and Ion Stoica. 2018. Ray: A Distributed Framework for Emerging AI Applications...

work page doi:10.1109/acsos-c58168.2023.00048 2026
