The Price Reversal Phenomenon: When Cheaper Reasoning Models Cost More

Chi Zhang; Ion Stoica; James Zou; Lingjiao Chen; Matei Zaharia; Yeye He

arXiv: 2603.23971 · v2 · pith:K35Z4Y7V · submitted 2026-03-25 · cs.CL · cs.AI · cs.GT · cs.LG · cs.MA




keywords: cost, listed, actual, model, price, reversal, thinking, across


Abstract

Developers and consumers increasingly choose reasoning models (RMs) based on their listed API prices. But how accurately do these prices reflect actual inference costs? We conduct the first systematic study of this question, evaluating 8 frontier RMs across 12 diverse tasks covering competition math, science QA, code generation, and multi-domain agents. We uncover the price reversal phenomenon: in 32% of model-pair comparisons, the model with the lower listed price actually incurs a higher total cost, with reversal magnitudes reaching up to 28x. For example, Gemini 3 Flash's listed price is 80% lower than GPT-5.4's, yet its actual cost across all tasks is 38% higher. We build a formal cost attribution framework based on the Shapley value and use it to trace the dominant drivers of these cost gaps to vast heterogeneity in thinking-token consumption and in the number of interaction turns: on the same query, one model may use 900% more thinking tokens than another, or 10x more turns of environment interaction. We further show that per-query cost prediction is fundamentally difficult: repeated runs of the same query yield thinking-token variation of up to 9.7x, establishing an irreducible noise floor for any predictor. We therefore propose cost distribution prediction as an open challenge. Our findings demonstrate that listed API pricing is an unreliable proxy for actual cost, calling for cost-aware model selection and transparent per-request cost monitoring.
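The abstract's two core quantities, pairwise price reversals and a Shapley-value split of a cost gap, can be illustrated with a short sketch. This is not the authors' framework, only a minimal example under stated assumptions: the per-query cost formula (the prompt is re-sent every turn, and thinking plus answer tokens are billed at the output rate), the use of the output price as the "listed" price, the choice of three factors (price, thinking tokens, turns), and all names and numbers (`query_cost`, `reversal_pairs`, `shapley_attribution`, `model_a`, `model_b`) are hypothetical.

```python
from itertools import combinations
from math import factorial

FACTORS = ("price", "thinking", "turns")


def query_cost(p):
    """Assumed cost model (USD): every turn re-sends the prompt, then emits thinking + answer tokens.

    `price` is an (input, output) pair in USD per million tokens.
    """
    in_price, out_price = p["price"]
    per_turn = p["prompt_tokens"] * in_price + (p["thinking"] + p["answer_tokens"]) * out_price
    return p["turns"] * per_turn / 1e6


def reversal_pairs(listed, actual):
    """Return (cheaper_listed, pricier_listed, cost_ratio) for every reversed pair."""
    pairs = []
    for x, y in combinations(sorted(listed), 2):
        cheap, pricey = (x, y) if listed[x] < listed[y] else (y, x)
        # A reversal: strictly lower listed price but strictly higher actual cost.
        if listed[cheap] < listed[pricey] and actual[cheap] > actual[pricey]:
            pairs.append((cheap, pricey, actual[cheap] / actual[pricey]))
    return pairs


def mix(base, target, coalition):
    """Copy of `base` with the factors in `coalition` swapped in from `target`."""
    out = dict(base)
    for f in coalition:
        out[f] = target[f]
    return out


def shapley_attribution(base, target, factors=FACTORS):
    """Exact Shapley split of query_cost(target) - query_cost(base) across `factors`."""
    n = len(factors)
    phi = {}
    for f in factors:
        others = [g for g in factors if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                gain = query_cost(mix(base, target, s + (f,))) - query_cost(mix(base, target, s))
                total += weight * gain
        phi[f] = total
    return phi


# Hypothetical usage profiles for one query (illustrative numbers, not from the paper).
model_a = {"price": (0.5, 3.0), "thinking": 8000, "turns": 4, "prompt_tokens": 2000, "answer_tokens": 500}
model_b = {"price": (2.5, 15.0), "thinking": 1500, "turns": 1, "prompt_tokens": 2000, "answer_tokens": 500}

listed = {"model_a": model_a["price"][1], "model_b": model_b["price"][1]}  # output price as "listed"
actual = {"model_a": query_cost(model_a), "model_b": query_cost(model_b)}

print(reversal_pairs(listed, actual))
print(shapley_attribution(model_b, model_a))  # why does model_a cost more than model_b?
```

With these made-up profiles, model_a lists an output price 80% below model_b's yet costs about 3x more on this query (0.106 vs 0.035 USD). The Shapley split gives the price factor a negative share (about -0.187 USD) and thinking tokens and turns positive shares (about +0.127 and +0.131 USD), which together sum to the 0.071 USD gap. That efficiency property is what makes Shapley values a natural fit for attribution when factors interact multiplicatively, as turns and per-turn tokens do in this assumed cost model.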



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Statistically-Lossless Quantization of Large Language Models
cs.LG · 2026-05 · unverdicted · novelty 6.0

SLQ achieves task-lossless LLM quantization below 4 bits per parameter and distribution-lossless at 5-6 bits on average, with 1.7-3.6x speedups over FP16.