The Token Tax of Epistemic Accuracy: Comparing RAG and Long-Context Architectures for Document-Grounded Generative AI Applications

Arthur Carvalho; Austin Hamilton; Fadel M. Megahed; Ibrahim Yousif; Lora A. Cavuoto; Michael Wise; Mohammad Mayyas; Ryan Singh; Zhe Shan

arxiv: 2606.20898 · v1 · pith:NXKKKL4Snew · submitted 2026-06-18 · 💻 cs.IR · cs.AI · cs.CL · cs.CY


Austin Hamilton, Ryan Singh, Michael Wise, Ibrahim Yousif, Arthur Carvalho, Zhe Shan, Mohammad Mayyas, Lora A. Cavuoto, Fadel M. Megahed


classification 💻 cs.IR · cs.AI · cs.CL · cs.CY

keywords token · access · epistemic · long-context · prompting · accuracy · architectures · broader



Document-grounded assistants built on large language models are increasingly used in high-stakes, knowledge-intensive work. Their usefulness, however, may depend on how evidence is allocated before generation. We investigate this claim by comparing two grounding architectures: (a) retrieval-augmented generation (RAG), which retrieves a few relevant passages, and (b) long-context prompting, which loads the whole document collection into context. We view these as two regimes of "epistemic access" on an accuracy-cost frontier. We use "epistemic accuracy" to denote model correctness that depends on having the right evidence. We posit that broader access (via long context) can increase epistemic accuracy, but at the price of a "token tax" (i.e., a substantial increase in cost due to larger input token consumption). We probe this framing with a case study in manufacturing safety training. Using an expert-validated benchmark, we evaluate 972 answers across three machines, two small language models, and three retrieval/in-context prompting approaches. Long-context prompting achieved the highest correctness (73.1% vs. 65.4% for semantic RAG), but at 26 times the per-query token cost. We interpret this gap as the token tax of broader evidentiary access. We discuss the implications of our findings for resource-constrained organizations.
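
The trade-off reported in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not the authors' analysis code: the `Regime` class, the `token_tax` function, and the convention of normalizing semantic RAG's per-query token cost to 1.0 are assumptions introduced here. The only inputs are the three figures stated in the abstract (73.1%, 65.4%, and the 26-fold token cost).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regime:
    """One grounding architecture (hypothetical container, not from the paper)."""
    name: str
    accuracy: float         # fraction of answers judged correct
    relative_tokens: float  # per-query input tokens, with semantic RAG set to 1.0


def token_tax(baseline: Regime, candidate: Regime) -> dict:
    """Summarize what the candidate's accuracy gain costs in input tokens."""
    # Accuracy gain in percentage points.
    gain_pp = (candidate.accuracy - baseline.accuracy) * 100
    # How many times more input tokens the candidate spends per query.
    cost_multiple = candidate.relative_tokens / baseline.relative_tokens
    # Expected input tokens spent per correct answer, in baseline token units.
    tokens_per_correct_base = baseline.relative_tokens / baseline.accuracy
    tokens_per_correct_cand = candidate.relative_tokens / candidate.accuracy
    return {
        "gain_pp": gain_pp,
        "cost_multiple": cost_multiple,
        "tokens_per_correct_ratio": tokens_per_correct_cand / tokens_per_correct_base,
    }


# Figures taken from the abstract; the 1.0 baseline is a normalization assumption.
rag = Regime("semantic RAG", accuracy=0.654, relative_tokens=1.0)
long_context = Regime("long-context prompting", accuracy=0.731, relative_tokens=26.0)

result = token_tax(rag, long_context)
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

Under these assumptions, long-context prompting buys about 7.7 percentage points of correctness while spending roughly 23 times as many input tokens per correct answer, which is one way to read the "token tax" on the accuracy-cost frontier.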


