Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Citing papers by verdict: UNVERDICTED (2)
Citing papers explorer
- Beyond Per-Token Pricing: A Concurrency-Aware Methodology for LLM Infrastructure Cost Estimation
Effective LLM inference cost per million output tokens varies by 2.5-36x with the offered request rate because of utilization effects. The paper addresses this with a concurrency-aware measurement methodology and an open-source vLLM tool, validated across model types.
- Large Databases Need Small, Open-Weight Language Models
In the BlendSQL framework, quantized open-weight LMs running on consumer hardware match the accuracy of closed-source APIs on LM-enhanced relational operators, at 390x lower cost and 3.8x lower latency.