arXiv: 2605.01710 · v1 · submitted 2026-05-03 · cs.AI · cs.CY

Recognition: unknown

Model Routing as a Trust Problem: Route Receipts for Adaptive AI Systems

Vincent Schmalbach

Pith reviewed 2026-05-10 15:44 UTC · model grok-4.3

keywords: AI routing, route receipts, transparency, adaptive AI systems, model cards, trust, runtime documentation, redaction

The pith

Adaptive AI systems should attach a route receipt to each response to document the runtime path taken without exposing proprietary logic.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that routing decisions in AI services affect cost, quality, and accountability yet remain invisible to users, eroding trust. It proposes that every response include a compact route receipt capturing enough material facts for users to reconstruct key decisions such as model version, tier, or safety handling. This receipt would function as a runtime counterpart to static model cards, which describe only the trained artifact. The author surveys current platforms and notes that fragments of routing information already exist but lack a standardized, per-answer portable format. If adopted, route receipts would let relying parties verify the conditions under which an answer was produced.

Core claim

The central claim is that model routing constitutes a trust problem best addressed by producing a route receipt for each request: a minimal, redacted record of the serving path that supplies enough facts for external reconstruction of routing choices while protecting internal proprietary details.

What carries the argument

The argument rests on the route receipt: a compact runtime record of the path that served a request, with a minimal schema and redaction rules designed to enable reconstruction without full disclosure.
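To make the idea of a compact per-response record concrete, here is a minimal Python sketch of what such an object might look like. It is not the paper's schema: the class name `RouteReceipt` and every field name are assumptions chosen for illustration, based only on the facts this page says a receipt should cover (model version, tier, fallback, safety handling).

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class RouteReceipt:
    """Hypothetical per-response receipt; all field names are illustrative."""
    request_id: str
    model_version: str                     # concrete version that answered
    service_tier: str                      # e.g. "standard" or "priority"
    fallback_used: bool                    # True if the primary route was not taken
    safety_handling: Optional[str] = None  # coarse label only, no policy details

    def to_json(self) -> str:
        # Sorted keys keep the serialized form stable, which helps diffing.
        return json.dumps(asdict(self), sort_keys=True)


receipt = RouteReceipt(
    request_id="req-001",
    model_version="example-model-2026-04-01",
    service_tier="standard",
    fallback_used=True,
)
print(receipt.to_json())
```

A frozen dataclass is used here so that a receipt cannot be altered after it is attached to a response; that is a design choice of this sketch, not a requirement stated in the paper.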

If this is right

Route transparency becomes a required element of model documentation alongside existing model cards.
Users gain the ability to verify which version, tier, or fallback produced a given answer.
Platforms can share receipt fragments already generated internally in a unified, portable format.
Accountability improves because changes in cost or quality can be traced to specific routing steps.
Safety and compliance reviews can reference the exact runtime conditions of a response.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Standardized receipt formats could integrate with existing logging and audit systems to reduce duplication.
High-stakes domains such as medical or financial AI might adopt receipts first to meet regulatory expectations.
Over time, receipts could evolve to include optional fields for user-requested transparency levels.
The approach focuses on path documentation rather than model internals, complementing rather than replacing explainability techniques.

Load-bearing premise

A compact redacted receipt can be produced and shared at acceptable cost without either leaking proprietary routing logic or creating excessive overhead.

What would settle it

The proposal would be falsified by a demonstration that any usable receipt either reveals enough routing detail to compromise competitive advantage or adds latency and storage costs that production systems would reject.
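The overhead half of that test can at least be measured on a toy scale. The sketch below, in Python, serializes a hypothetical receipt and reports its size, its size relative to a response body, and the mean encoding time. The receipt fields and the function name `receipt_overhead` are assumptions for illustration; the numbers it produces say nothing about any production system.

```python
import json
import time

# Hypothetical receipt; the field names are assumptions for illustration.
receipt = {
    "request_id": "req-001",
    "model_version": "example-model-2026-04-01",
    "service_tier": "standard",
    "fallback_used": False,
    "safety_handling": "none",
}


def receipt_overhead(receipt: dict, response_text: str, repeats: int = 10_000) -> dict:
    """Return receipt size, size relative to the response, and mean encode time."""
    encoded = json.dumps(receipt, separators=(",", ":")).encode("utf-8")
    response_bytes = len(response_text.encode("utf-8"))

    # Time repeated serialization to estimate the per-request encoding cost.
    start = time.perf_counter()
    for _ in range(repeats):
        json.dumps(receipt, separators=(",", ":"))
    elapsed = time.perf_counter() - start

    return {
        "receipt_bytes": len(encoded),
        "relative_size": len(encoded) / max(response_bytes, 1),
        "mean_encode_seconds": elapsed / repeats,
    }


stats = receipt_overhead(receipt, response_text="x" * 2000)
print(stats)
```

A measurement like this covers only serialization cost. Storage, transport, and any signing or retention requirements would need separate evaluation.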

Original abstract

AI products often route requests through version aliases, service tiers, tool choices, regional endpoints, fallback rules, or safety handling before responding. These routing steps are documented product surfaces in several widely used AI platforms and serving stacks. Routing helps AI services stay affordable, fast, and available at scale, and it shapes trust. Trust can break when routing changes the cost, quality, or accountability of a response without the user being able to tell what happened. "Which model answered?" is only part of the audit question. The runtime path matters. Adaptive AI systems should produce a runtime transparency artifact called the route receipt. A route receipt is a compact record of the route that served a request. It should capture enough material facts for people relying on the output to reconstruct important routing decisions without exposing proprietary internals or hidden reasoning. Route transparency should be part of model documentation. Model cards describe trained model artifacts, while route receipts describe the runtime conditions under which a particular answer was produced. The paper introduces the route-receipt concept, a minimal schema and redaction model, and a documentation-based survey of selected platforms showing that receipt fragments already exist without a portable per-answer record.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper frames routing decisions in AI systems as a trust issue and proposes a portable route receipt with a minimal schema, but supplies no example or test of whether redaction can actually work.

The main takeaway is that this paper treats runtime routing in adaptive AI services as something that needs its own transparency layer, separate from model cards. It introduces the route receipt idea as a compact record of which model version, tier, fallback, or safety path handled a request, with a redaction model to avoid leaking proprietary logic. That specific portable per-answer artifact and the explicit trust framing do not show up in the cited prior work, so the concept is new on that narrow point.

The survey of existing platforms is also a plus because it shows that fragments of routing metadata already get logged in various serving stacks, which makes the proposal feel grounded rather than invented from scratch. The argument stays internally consistent throughout.

The soft spot is exactly the one the stress-test note flags. The paper defines a schema and redaction rules but never shows a worked example on a real router, never checks whether the redacted receipt still lets a downstream party reconstruct the material facts, and never addresses overhead or leakage risk in practice. Without that step the central claim remains untested, so the soundness stays at the conceptual level.

This is the kind of paper that would interest people working on AI governance, deployment standards, or regulatory transparency requirements. A reader already thinking about accountability in production systems could use the schema as a discussion starter. It deserves a serious referee because the idea is coherent, points to a genuine gap, and builds on observable platform behavior, even if the next version would need concrete feasibility work to carry weight. I would send it to review.

Referee Report

2 major / 1 minor

Summary. The paper claims that routing decisions in adaptive AI systems (version aliases, tiers, tool choices, fallbacks, safety handling) shape trust and accountability, and proposes 'route receipts' as compact runtime transparency artifacts. These receipts should capture enough material facts via a minimal schema and redaction model to let downstream parties reconstruct key routing decisions without exposing proprietary internals. The work positions receipts as complementary to model cards, introduces the schema and redaction approach, and surveys documentation from selected platforms to show that receipt-like fragments already exist in practice.

Significance. If the redaction model can be validated to balance reconstructibility with proprietary protection and acceptable overhead, the proposal could help standardize runtime accountability for adaptive AI services, filling a gap between static model documentation and dynamic serving behavior. The conceptual framing is internally consistent, and the survey of existing platform fragments provides a practical foundation that strengthens the case for a portable standard.

major comments (2)

[Schema and Redaction Model] The section defining the minimal schema and redaction model provides no worked example on a real router nor any argument (formal or informal) demonstrating that the redaction rules preserve sufficient information to reconstruct material routing facts (model version, tier, fallback path, safety handling) while provably avoiding leakage of proprietary decision logic. This assumption is load-bearing for the central claim that receipts can be both useful and safe.
[Survey of Platforms] The documentation-based survey of platforms shows that routing fragments appear in existing systems but does not address or evaluate whether these can be unified into a single portable per-answer receipt format without unacceptable overhead or requiring disclosure of proprietary internals.
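The unification question in the second major comment can be illustrated with a small Python sketch that maps provider-specific metadata fragments onto one common set of receipt fields. Everything here is hypothetical: the two fragments, the `FIELD_MAP` table, and the common field names are invented for illustration and do not reproduce any real platform's response format.

```python
# Both fragments are invented for illustration; they do not reproduce any
# real provider's response format.
provider_a_fragment = {"model": "alpha-2026-04-01", "tier": "priority"}
provider_b_fragment = {"served_by": "beta-v3", "fallback": True, "region": "eu-west"}

# Hypothetical mapping from provider-specific keys to common receipt fields.
FIELD_MAP = {
    "provider_a": {"model": "model_version", "tier": "service_tier"},
    "provider_b": {
        "served_by": "model_version",
        "fallback": "fallback_used",
        "region": "region",
    },
}

COMMON_FIELDS = ("model_version", "service_tier", "fallback_used", "region")


def to_common_receipt(provider: str, fragment: dict) -> dict:
    """Map a provider fragment onto common fields; missing facts stay None."""
    receipt = {field: None for field in COMMON_FIELDS}
    for source_key, target_key in FIELD_MAP[provider].items():
        if source_key in fragment:
            receipt[target_key] = fragment[source_key]
    return receipt


print(to_common_receipt("provider_a", provider_a_fragment))
print(to_common_receipt("provider_b", provider_b_fragment))
```

Leaving unreported facts as explicit `None` values makes the gaps between providers visible in the unified record, which is the kind of evidence the referee asks the survey to supply.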

minor comments (1)

The distinction between the proposed route receipt and existing platform-specific logs or metadata could be clarified with a small comparison table to improve readability.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed comments, which identify key areas where the conceptual proposal can be made more concrete. We address each major comment below and indicate the revisions we will incorporate.

Referee: [Schema and Redaction Model] The section defining the minimal schema and redaction model provides no worked example on a real router nor any argument (formal or informal) demonstrating that the redaction rules preserve sufficient information to reconstruct material routing facts (model version, tier, fallback path, safety handling) while provably avoiding leakage of proprietary decision logic. This assumption is load-bearing for the central claim that receipts can be both useful and safe.

Authors: We agree that the manuscript would be strengthened by an explicit worked example and a clearer articulation of how the redaction model balances reconstructibility and protection. The schema is defined to record only observable material facts (model alias or version, tier, fallback indicator, safety handling flag) while the redaction rules exclude internal routing logic, decision trees, or proprietary heuristics. Although the paper relies on an informal design argument rather than a formal proof, we will add a worked example in the revision using a representative multi-tier router configuration. This example will show step-by-step how a receipt enables reconstruction of the key facts listed by the referee without exposing proprietary elements. We will also expand the surrounding text to make the informal preservation argument explicit. These changes directly address the load-bearing assumption.

revision: yes

Referee: [Survey of Platforms] The documentation-based survey of platforms shows that routing fragments appear in existing systems but does not address or evaluate whether these can be unified into a single portable per-answer receipt format without unacceptable overhead or requiring disclosure of proprietary internals.

Authors: The survey is deliberately documentation-based to demonstrate that receipt-like fragments already appear in public platform documentation, thereby grounding the proposal in existing practice rather than pure invention. The unification into a portable per-answer format is the central proposal, with the redaction model intended to ensure no proprietary internals need be disclosed. The manuscript does not contain a quantitative overhead evaluation because it is a conceptual contribution focused on the schema and its rationale. In the revision we will add a short discussion of expected overhead, observing that the schema is intentionally minimal and emitted per request, which aligns with the low-cost logging already performed by serving systems. We maintain that the redaction approach precludes disclosure of proprietary logic by construction.

revision: partial
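The fields the authors list in their first response (model alias or version, tier, fallback indicator, safety handling flag) suggest an allowlist style of redaction. The Python sketch below shows one way such a rule and the two checks the referee asks for might look on a toy example. The internal trace, its keys, and the function names are all hypothetical; this is not the worked example the authors promise, only an illustration of the shape it could take.

```python
# Facts the referee asks a receipt to preserve; key names are assumptions.
MATERIAL_FIELDS = {"model_version", "service_tier", "fallback_used", "safety_handling"}

# Hypothetical internal trace of a multi-tier router; every key is illustrative.
internal_trace = {
    "model_version": "example-model-2026-04-01",
    "service_tier": "priority",
    "fallback_used": True,
    "safety_handling": "filtered",
    "router_score": 0.83,                   # proprietary: learned routing score
    "decision_rule": "tier_b_if_load>0.7",  # proprietary: routing heuristic
    "candidate_pool": ["m-small", "m-large"],
}


def redact(trace: dict) -> dict:
    """Allowlist redaction: copy material fields, drop everything else."""
    return {key: trace[key] for key in MATERIAL_FIELDS if key in trace}


def reconstructs_material_facts(trace: dict, receipt: dict) -> bool:
    """True if every material fact in the trace is recoverable from the receipt."""
    return all(receipt.get(key) == trace[key] for key in MATERIAL_FIELDS if key in trace)


def leaked_fields(receipt: dict) -> set:
    """Keys present in the receipt that fall outside the allowlist."""
    return set(receipt) - MATERIAL_FIELDS


receipt = redact(internal_trace)
print(reconstructs_material_facts(internal_trace, receipt))  # True
print(leaked_fields(receipt))                                # set()
```

These checks operate at the level of field names only. They do not show that routing logic cannot be inferred from the values of permitted fields across many receipts, which is the gap the standing objection below identifies.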

standing simulated objections (not resolved)

A formal (as opposed to informal) proof that the redaction rules provably avoid leakage of proprietary decision logic would require information-theoretic or cryptographic analysis beyond the scope of this conceptual paper.

Circularity Check

0 steps flagged

Conceptual proposal with no derivations, predictions, or self-referential steps

The paper is a definitional proposal introducing the route-receipt concept, a minimal schema, a redaction model, and an observational survey of existing platform fragments. No equations, fitted parameters, predictions, or derivation chains appear in the provided text. All load-bearing content consists of new definitions and documentation-based observations rather than reductions to prior results, self-citations, or inputs by construction. The work is therefore self-contained as a conceptual contribution with no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper rests on the domain assumption that hidden routing decisions materially affect user trust and on the invention of the route-receipt artifact itself.

axioms (1)

domain assumption: Routing decisions can change cost, quality, or accountability without the user being able to tell what happened.
Stated directly in the abstract as the core trust problem.

invented entities (1)

route receipt (no independent evidence)
purpose: Compact record of the route that served a request for transparency
Newly introduced artifact with a proposed minimal schema and redaction model.

pith-pipeline@v0.9.0 · 5499 in / 1217 out tokens · 53963 ms · 2026-05-10T15:44:03.475367+00:00 · methodology

Appendix A: Minimal route receipt JSON Schema

This schema is intentionally small. It defines a portable receipt object that can be embedded in API responses, logs, audit exports, or benchmark records. Providers can add extension fields under provider_extensions, but ...