
arXiv: 2505.21627 · v4 · pith:ODLP5BJO · submitted 2025-05-27 · 💻 cs.GT · cs.AI · cs.CY · cs.LG

Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives

classification 💻 cs.GT · cs.AI · cs.CY · cs.LG
keywords: mechanism, pricing, users, model, provider, tokens, language, large

State-of-the-art large language models require specialized hardware and substantial energy to operate. As a consequence, cloud-based services that provide access to large language models have become very popular. In these services, the price users pay for an output depends on the number of tokens the model uses to generate it: users pay a fixed price per token. In this work, we show that this pricing mechanism creates a financial incentive for providers to strategize and misreport the (number of) tokens a model used to generate an output, and users cannot prove, or even know, whether a provider is overcharging them. However, we also show that, if an unfaithful provider is obliged to be transparent about the generative process used by the model, misreporting optimally without raising suspicion is hard. Nevertheless, as a proof of concept, we develop an efficient heuristic algorithm that allows providers to significantly overcharge users without raising suspicion. Crucially, we demonstrate that the cost of running the algorithm is lower than the additional revenue from overcharging users, highlighting the vulnerability of users under the current pay-per-token pricing mechanism. Further, we show that, to eliminate the financial incentive to strategize, a pricing mechanism must price each token linearly in its character count. While this makes a provider's profit margin vary across tokens, we introduce a simple prescription under which a provider who adopts such an incentive-compatible pricing mechanism can maintain the average profit margin they had under the pay-per-token pricing mechanism. Along the way, to illustrate and complement our theoretical results, we conduct experiments with several large language models from the Llama, Gemma and Ministral families, and input prompts from the LMSYS Chatbot Arena platform.
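The incentive described above can be illustrated with a toy sketch. It assumes hypothetical prices in integer units (`PRICE_PER_TOKEN`, `PRICE_PER_CHAR`) and two hand-picked tokenizations of the same output string. It uses no real tokenizer and is not the paper's heuristic algorithm. Because both token sequences decode to the same text, the user sees an identical output, yet the pay-per-token bill differs. A bill that is linear in character count does not change.

```python
# Toy illustration of pay-per-token vs. pay-per-character billing.
# Prices are hypothetical integer units, chosen only for illustration.
PRICE_PER_TOKEN = 4
PRICE_PER_CHAR = 1


def pay_per_token(tokens: list[str]) -> int:
    """Bill = fixed price times the number of reported tokens."""
    return PRICE_PER_TOKEN * len(tokens)


def pay_per_character(tokens: list[str]) -> int:
    """Bill is linear in each token's character count, summed over tokens."""
    return PRICE_PER_CHAR * sum(len(t) for t in tokens)


output = "overcharging"
honest = ["over", "charging"]                # assumed "true" tokenization
inflated = ["o", "ver", "char", "g", "ing"]  # assumed misreported tokenization

# Both tokenizations decode to the same visible output.
assert "".join(honest) == "".join(inflated) == output

print(pay_per_token(honest), pay_per_token(inflated))          # 8 20
print(pay_per_character(honest), pay_per_character(inflated))  # 12 12
```

The sketch only shows that character-linear pricing removes the gain from splitting a string into more tokens in this one example. The abstract's stronger claim, that such pricing is *necessary* for incentive compatibility, is a result of the paper and is not demonstrated here.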



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Regret Minimization in Single-Dimensional Contract-Design with Binary Actions

    cs.GT · 2026-06 · unverdicted · novelty 7.0

    Derives tight Θ(T^{2/3}) regret independent of outcome count m for adversarial agent types and Õ(√T) regret via explore-then-commit for fixed hidden type in single-dimensional binary-action contract design.