pith. machine review for the scientific record.

arxiv: 2605.09104 · v1 · submitted 2026-05-09 · 💻 cs.AI

Recognition: 2 theorem links · Lean Theorem

Token Economics for LLM Agents: A Dual-View Study from Computing and Economics

Authors on Pith · no claims yet

Pith reviewed 2026-05-12 02:30 UTC · model grok-4.3

classification 💻 cs.AI
keywords token economics · LLM agents · agentic AI · neoclassical firm theory · transaction cost economics · mechanism design · multi-agent systems · AI security

The pith

Tokens act as production factors, exchange mediums, and units of account in LLM agent systems, grounding a four-level economic taxonomy that organizes the trade-off between output quality and computational cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey unifies computer science and economics to treat tokens as the central economic primitives of agentic AI systems. It organizes existing work into four dimensions: micro-level optimization for single agents using budget-constrained factor substitution, meso-level reduction of collaboration friction in multi-agent groups, macro-level handling of congestion and pricing in agent ecosystems, and security by treating threats as internal economic constraints. A reader would care because exponential token consumption creates hard limits on agent reliability and scalability that technical fixes alone have not resolved. The paper argues that viewing tokens through neoclassical firm theory, transaction cost economics, principal-agent models, and mechanism design reveals systematic trade-offs between output quality and resource expenditure. This synthesis supplies a foundation for designing more efficient next-generation agent architectures.

Core claim

The paper claims that tokens function as production factors, exchange mediums, and units of account, and that the scattered literature on LLM agent systems can be synthesized into a four-dimensional taxonomy. Micro-level analysis applies neoclassical firm theory to let single agents substitute among token uses under fixed budgets. Meso-level analysis draws on transaction cost and principal-agent theories to lower friction in multi-agent collaboration. Macro-level analysis employs mechanism design to resolve congestion externalities and set prices across agent ecosystems. Security analysis recasts adversarial threats as endogenous economic constraints that must be internalized. The resulting view recasts scattered engineering concerns as one shared problem: trading output quality against token cost.
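To make the micro-level claim concrete, here is a minimal sketch of the budget-constrained substitution problem as Pith reads it; the factor names, the concave quality curves, and the greedy solver are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: a single agent spreads a fixed token budget across
# substitutable "factors" (reasoning, retrieval, tool calls) to maximize
# output quality. Curves and weights are illustrative assumptions.
import math

BUDGET = 10_000  # total token budget for one task

# Diminishing-returns quality contribution per factor: q(t) = w * log(1 + t)
FACTORS = {"reasoning": 3.0, "retrieval": 2.0, "tool_calls": 1.5}

def marginal_quality(weight: float, tokens: int, step: int) -> float:
    """Quality gained by spending `step` more tokens on one factor."""
    return weight * (math.log1p(tokens + step) - math.log1p(tokens))

def allocate(budget: int, step: int = 100) -> dict[str, int]:
    """Greedily spend each increment where it buys the most quality."""
    alloc = {name: 0 for name in FACTORS}
    while budget >= step:
        best = max(FACTORS, key=lambda f: marginal_quality(FACTORS[f], alloc[f], step))
        alloc[best] += step
        budget -= step
    return alloc

print(allocate(BUDGET))  # roughly 4600 / 3100 / 2300 with these weights
```

The greedy loop equalizes marginal quality per token across factors, which is the standard optimality condition for this kind of concave allocation problem.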

What carries the argument

The four-dimensional taxonomy that reframes tokens as economic primitives and maps literature to micro (single-agent), meso (multi-agent), macro (ecosystem), and security levels using firm theory, transaction costs, principal-agent models, and mechanism design.

If this is right

  • Single agents can substitute among token-consuming actions under budget limits to raise output quality per unit cost.
  • Multi-agent systems can be structured to cut transaction costs and align incentives between collaborating agents.
  • Agent ecosystems can apply pricing rules to reduce congestion externalities and allocate tokens more efficiently (a toy pricing sketch follows this list).
  • Security defenses can be designed as ways to make adversarial attacks carry direct economic costs for the attacker.
  • Future systems could incorporate differentiable token budgets and dynamic token markets that respond to changing economic conditions.
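A toy rendering of the pricing bullet above, assuming a platform that posts one token price per interval. The linear demand curves and the bisection loop are Pith's own illustration; the abstract names mechanism design as the lens without specifying a mechanism.

```python
# Toy macro-level sketch: post a token price high enough that aggregate
# demand fits capacity, making congestion an explicit cost to agents.
CAPACITY = 20_000  # tokens the platform can serve per interval

# Each agent demands tokens until marginal value falls to the posted price:
# demand_i(p) = max(0, (v_i - p) / c_i) for value v_i and curvature c_i.
AGENTS = [(8.0, 0.001), (5.0, 0.0005), (3.0, 0.0002)]

def total_demand(price: float) -> float:
    return sum(max(0.0, (v - price) / c) for v, c in AGENTS)

def clearing_price(lo: float = 0.0, hi: float = 10.0) -> float:
    """Bisect for the lowest price at which demand fits capacity."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if total_demand(mid) > CAPACITY:
            lo = mid  # still congested: raise the price
        else:
            hi = mid
    return hi

price = clearing_price()
print(f"price={price:.3f}, demand={total_demand(price):,.0f}")  # ~1.625, ~20,000
```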

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Agent designers could test whether embedding the taxonomy improves measurable performance metrics such as task completion rate per token spent in controlled benchmarks (see the metric sketch after this list).
  • The framework suggests that large-scale agent deployments might require new market mechanisms for token allocation, similar to how compute resources are traded today.
  • Security research could gain from treating attack surfaces as budget items whose costs agents must internalize in their planning loops.
  • The approach opens a path to agent regulations that price aggregate token consumption as an externality with societal impact.
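A hedged sketch of the benchmark idea from the first bullet: score agent configurations by completions per token rather than raw accuracy. `run_agent` and `benchmark_tasks` are hypothetical stand-ins for a real harness.

```python
# Primary metric: solved tasks per 1,000 tokens spent across a benchmark run.
from dataclasses import dataclass

@dataclass
class Episode:
    solved: bool
    tokens_used: int

def completions_per_kilotoken(episodes: list[Episode]) -> float:
    """Solved tasks per 1,000 tokens; 0.0 on an empty or free run."""
    total_tokens = sum(e.tokens_used for e in episodes)
    solved = sum(e.solved for e in episodes)
    return 1000.0 * solved / total_tokens if total_tokens else 0.0

# Usage, assuming a hypothetical harness run_agent(task, budget) -> Episode:
#     episodes = [run_agent(t, budget=5_000) for t in benchmark_tasks]
#     print(completions_per_kilotoken(episodes))
```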

Load-bearing premise

The fragmented research on LLM agent optimization, architecture, and security fits accurately and without major omissions into the proposed four-dimensional economic taxonomy.

What would settle it

A systematic scan of recent LLM agent papers that identifies a substantial cluster of techniques or problems that cannot be placed in any of the four taxonomy categories would falsify the claim of comprehensive synthesis.
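One concrete form such a scan could take, under Pith's own assumptions: tag each surveyed technique with zero or more taxonomy levels and measure the residual that fits none. The keyword heuristic below is a crude placeholder for whatever classifier a real audit would use.

```python
# Falsification sketch: papers that no taxonomy level captures count against
# the claim of comprehensive synthesis. Keywords are illustrative only.
TAXONOMY = {
    "micro": ["budget", "compression", "early exit", "decoding"],
    "meso": ["multi-agent", "communication", "coordination", "principal-agent"],
    "macro": ["serving", "pricing", "congestion", "market"],
    "security": ["attack", "injection", "adversarial", "trust"],
}

def classify(title: str) -> set[str]:
    """Taxonomy levels whose keywords appear in a paper title (possibly none)."""
    text = title.lower()
    return {level for level, kws in TAXONOMY.items()
            if any(kw in text for kw in kws)}

def residual_rate(titles: list[str]) -> float:
    """Fraction of surveyed papers that no taxonomy level captures."""
    unplaced = [t for t in titles if not classify(t)]
    return len(unplaced) / len(titles)

# A substantial residual over a systematic corpus would undercut the claim of
# comprehensive synthesis; a residual near zero would support it.
```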

read the original abstract

As LLM agents evolve, tokens have emerged as the core economic primitives of Agentic AI. However, their exponential consumption introduces severe computational, collaborative, and security bottlenecks. Current surveys remain fragmented across system optimization, architecture design, and trust, lacking a unified framework to evaluate the fundamental trade-off between output quality and economic cost. To bridge this gap, this survey presents the first comprehensive survey of Token Economics. By unifying computer science and economics, we conceptualize tokens as production factors, exchange mediums, and units of account. We synthesize existing literature across a four-dimensional taxonomy: (1) Micro-level (Single Agent): Optimizing budget-constrained factor substitution via neoclassical firm theory. (2) Meso-level (Multi-Agent Systems): Minimizing collaboration friction using transaction cost and principal-agent theories. (3) Macro-level (Agent Ecosystems): Addressing congestion externalities and pricing via mechanism design. (4) Security: Internalizing adversarial threats as endogenous economic constraints. Finally, we outline frontier directions, including differentiable token budgets and dynamic markets, to lay the theoretical foundation for scalable next-generation agent systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript is a survey paper that unifies computer science and economics perspectives on token economics for LLM agents. It conceptualizes tokens as production factors, exchange mediums, and units of account, then synthesizes fragmented literature on system optimization, architecture design, and trust into a four-dimensional taxonomy: (1) micro-level single-agent optimization of budget-constrained factor substitution using neoclassical firm theory; (2) meso-level multi-agent collaboration minimizing friction via transaction cost and principal-agent theories; (3) macro-level agent ecosystems addressing congestion via mechanism design; and (4) security internalizing adversarial threats as endogenous economic constraints. It concludes by outlining future directions such as differentiable token budgets and dynamic markets.

Significance. If the taxonomy provides an accurate, non-overlapping, and comprehensive synthesis of the literature without forcing concepts into categories, the work could offer a useful interdisciplinary lens for analyzing quality-cost trade-offs in scalable LLM agent systems. As a conceptual survey without new derivations, empirical results, or machine-checked proofs, its primary value would lie in organizing existing work to guide future research on efficient and secure agentic AI.

major comments (2)
  1. [Abstract / Taxonomy description] Abstract and taxonomy overview: The claim that the four dimensions are exhaustive and non-overlapping is not yet substantiated; for example, security considerations (dimension 4) appear relevant to micro-level optimization and meso-level collaboration as well, raising the risk of conceptual overlap that could undermine the taxonomy's utility as a clean organizing framework.
  2. [Abstract] The synthesis of literature: The abstract asserts a mapping of CS concerns (optimization, architecture, trust) onto specific economic theories without providing concrete citations or examples in the provided text; this makes it impossible to verify whether high-impact strands (e.g., specific works on LLM inference optimization or multi-agent coordination) fit without misalignment or omission, which is load-bearing for the central claim of a 'comprehensive' unification.
minor comments (2)
  1. [Abstract] The abstract is information-dense; consider using enumerated sub-bullets or a table to present the four taxonomy dimensions for improved readability.
  2. [Future directions] Clarify the scope of 'frontier directions' section by distinguishing speculative ideas (e.g., differentiable token budgets) from those with preliminary supporting references.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments, which help clarify the presentation of our taxonomy and the substantiation of our literature synthesis. We address each point below and propose targeted revisions to strengthen the manuscript's clarity as an interdisciplinary organizing framework.

read point-by-point responses
  1. Referee: [Abstract / Taxonomy description] Abstract and taxonomy overview: The claim that the four dimensions are exhaustive and non-overlapping is not yet substantiated; for example, security considerations (dimension 4) appear relevant to micro-level optimization and meso-level collaboration as well, raising the risk of conceptual overlap that could undermine the taxonomy's utility as a clean organizing framework.

    Authors: We acknowledge that security threats can intersect with micro-level optimization (e.g., adversarial inputs affecting single-agent token budgets) and meso-level collaboration (e.g., trust issues in multi-agent interactions). Our taxonomy, however, is organized by primary analytical lens rather than strict exclusivity: the micro dimension applies neoclassical firm theory to budget-constrained factor substitution under standard conditions; the meso dimension uses transaction cost and principal-agent theories to address collaboration frictions; and the security dimension specifically models adversarial behavior as endogenous economic constraints at the system level. This structure draws from established economic practice of separating normal-operation analysis from threat internalization. To address the concern, we will revise the taxonomy overview section to explicitly discuss inter-dimensional interactions, provide boundary examples, and include a clarifying diagram or table. We do not claim mathematical exhaustiveness but rather a comprehensive synthesis of existing literature strands. revision: partial

  2. Referee: [Abstract] The synthesis of literature: The abstract asserts a mapping of CS concerns (optimization, architecture, trust) onto specific economic theories without providing concrete citations or examples in the provided text; this makes it impossible to verify whether high-impact strands (e.g., specific works on LLM inference optimization or multi-agent coordination) fit without misalignment or omission, which is load-bearing for the central claim of a 'comprehensive' unification.

    Authors: Abstracts conventionally omit citations to maintain brevity. The full manuscript substantiates the mappings in dedicated sections with concrete citations and examples. The micro-level section maps LLM inference optimization literature (e.g., works on token-efficient serving and budget-constrained decoding) to neoclassical production theory. The meso-level section aligns multi-agent coordination papers with transaction cost economics and principal-agent models. The macro and security dimensions similarly reference mechanism design and adversarial economics literature. These mappings are load-bearing and are detailed with references throughout the body. We will not alter the abstract but will ensure the introduction and taxonomy sections cross-reference the detailed syntheses more explicitly for readers who encounter only the abstract. revision: no

Circularity Check

0 steps flagged

No circularity: conceptual survey with external theoretical grounding

full rationale

The paper is a literature survey that organizes existing work on token usage in LLM agents into a four-dimensional taxonomy by mapping CS concerns onto standard neoclassical economics, transaction-cost economics, mechanism design, and principal-agent theory. No equations, fitted parameters, predictions, or derivations appear in the abstract or described content. The taxonomy is presented as a synthesis framework rather than a result derived from the paper's own inputs or prior self-citations. No self-definitional loops, fitted-input predictions, uniqueness theorems, or ansatz smuggling are present. The central claim rests on external literature and established economic primitives, so the work is checked against external benchmarks rather than its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 4 axioms · 0 invented entities

The survey rests on standard economic theories applied to the new domain of LLM agents without introducing free parameters, new entities, or ad-hoc inventions.

axioms (4)
  • domain assumption Neoclassical firm theory applies to optimizing budget-constrained factor substitution for single LLM agents.
    Invoked for the micro-level analysis in the taxonomy.
  • domain assumption Transaction cost theory and principal-agent theory can minimize collaboration friction in multi-agent LLM systems.
    Invoked for the meso-level analysis.
  • domain assumption Mechanism design addresses congestion externalities and pricing in agent ecosystems.
    Invoked for the macro-level analysis.
  • domain assumption Adversarial threats can be treated as endogenous economic constraints for security.
    Invoked for the security dimension.

pith-pipeline@v0.9.0 · 5525 in / 1624 out tokens · 48003 ms · 2026-05-12T02:30:26.307791+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches · The paper's claim is directly supported by a theorem in the formal canon.
supports · The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends · The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses · The paper appears to rely on the theorem as machinery.
contradicts · The paper's claim conflicts with a theorem or certificate in the canon.
unclear · Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

201 extracted references · 201 canonical work pages · 14 internal anchors

  1. [1]

    Token reduction should go beyond efficiency in generative models–from vision, language to multimodality

    Zhenglun Kong, Yize Li, Fanhu Zeng, Lei Xin, Shvat Messica, Xue Lin, Pu Zhao, Manolis Kellis, Hao Tang, and Marinka Zitnik. Token reduction should go beyond efficiency in generative models–from vision, language to multimodality. arXiv preprint arXiv:2505.18227, 2025

  2. [2]

    Memorybank: Enhancing large language models with long-term memory

    Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. Memorybank: Enhancing large language models with long-term memory. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024

  3. [3]

    MasRouter: Learning to route LLMs for multi-agent systems

    Yanwei Yue, Guibin Zhang, Boyang Liu, Guancheng Wan, Kun Wang, Dawei Cheng, and Yiyan Qi. MasRouter: Learning to route LLMs for multi-agent systems. In Proceedings of the Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2025

  4. [4]

    TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing

    Zhuohang Bian, Feiyang Wu, Chengrui Zhang, Hangcheng Dong, Yun Liang, and Youwei Zhuo. TokenDance: Scaling multi-agent LLM serving via collective KV cache sharing. arXiv preprint arXiv:2604.03143, 2026

  5. [5]

    INVESTORBENCH: A benchmark for financial decision-making tasks with LLM-based agent

    Haohang Li, Yupeng Cao, Yangyang Yu, Shashidhar Reddy Javaji, Zhiyang Deng, Yueru He, Yuechen Jiang, Zining Zhu, K.P. Subbalakshmi, Jimin Huang, Lingfei Qian, Xueqing Peng, Jordan W. Suchow, and Qianqian Xie. INVESTORBENCH: A benchmark for financial decision-making tasks with LLM-based agent. In Proceedings of the Annual Meeting of the Association for Com...

  6. [6]

    GraphRAG: Leveraging graph-based efficiency to minimize hallucinations in LLM-driven RAG for finance data

    Mariam Barry, Gaetan Caillaut, Pierre Halftermeyer, Raheel Qader, Mehdi Mouayad, Fabrice Le Deit, Dimitri Cariolaro, and Joseph Gesnouin. GraphRAG: Leveraging graph-based efficiency to minimize hallucinations in LLM-driven RAG for finance data. In Proceedings of the Workshop on Generative AI and Knowledge Graphs, 2025

  7. [7]

    Time travel is cheating: Going live with deepfund for real-time fund investment benchmarking

    Changlun Li, Yao Shi, Chen Wang, Qiqi Duan, Runke Ruan, Weijie Huang, Haonan Long, Lijun Huang, Nan Tang, and Yuyu Luo. Time travel is cheating: Going live with deepfund for real-time fund investment benchmarking. In Advances in Neural Information Processing Systems Datasets and Benchmarks Track, 2025

  8. [8]

    NitiBench: Benchmarking LLM frameworks on Thai legal question answering capabilities

    Pawitsapak Akarajaradwong, Pirat Pothavorn, Chompakorn Chaksangchaichot, Panuthep Tasawong, Thitiwat Nopparatbundit, Keerakiat Pratai, and Sarana Nutanong. NitiBench: Benchmarking LLM frameworks on Thai legal question answering capabilities. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2025

  9. [9]

    LegalAgentBench: Evaluating LLM agents in legal domain

    Haitao Li, Junjie Chen, Jingli Yang, Qingyao Ai, Wei Jia, Youfeng Liu, Kai Lin, Yueyue Wu, Guozhi Yuan, Yiran Hu, Wuyue Wang, Yiqun Liu, and Minlie Huang. LegalAgentBench: Evaluating LLM agents in legal domain. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2025

  10. [10]

    LexRAG: Benchmarking retrieval-augmented generation in multi-turn legal consultation conversation

    Haitao Li, Yifan Chen, Yiran Hu, Qingyao Ai, Junjie Chen, Xiaoyu Yang, Jianhui Yang, Yueyue Wu, Zeyang Liu, and Yiqun Liu. LexRAG: Benchmarking retrieval-augmented generation in multi-turn legal consultation conversation. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

  11. [11]

    De novo design of GPCR exoframe modulators

    Shizhuo Cheng, Jia Guo, Yun-li Zhou, Xumei Luo, Gufang Zhang, Ya-zhi Zhang, Yixin Yang, Jiannan Xie, Ping Xu, Dan-dan Shen, Shaokun Zang, Huicui Yang, Xuechu Zhen, Min Zhang, and Yan Zhang. De novo design of GPCR exoframe modulators. Nature, 651:1–9, 2026

  12. [12]

    Ask patients with patience: Enabling LLMs for human-centric medical dialogue with grounded reasoning

    Jiayuan Zhu, Jiazhen Pan, Yuyuan Liu, Fenglin Liu, and Junde Wu. Ask patients with patience: Enabling LLMs for human-centric medical dialogue with grounded reasoning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2025

  13. [13]

    Tumorchain: Interleaved multimodal chain-of-though...

    Sijing Li, Zhongwei Qiu, Jiang Liu, Wenqiao Zhang, Tianwei Lin, Yihan Xie, Jianxiang An, Boxiang Yun, Chenglin Yang, Jun Xiao, Guangyu Guo, Jiawen Yao, Wei Liu, Yuan Gao, Ke Yan, Weiwei Cao, Zhilin Zheng, Tony C. W. Mok, Kai Cao, Yu Shi, Jiuyu Zhang, Jian Zhou, Beng Chin Ooi, Yingda Xia, and Ling Zhang. Tumorchain: Interleaved multimodal chain-of-though...

  14. [14]

    Energy and AI, 2025

    International Energy Agency. Energy and AI, 2025

  15. [15]

    Transcending cost-quality tradeoff in agent serving via session-awareness

    Yanyu Ren, Li Chen, Dan Li, Xizheng Wang, Zhiyuan Wu, Yukai Miao, and Yu Bai. Transcending cost-quality tradeoff in agent serving via session-awareness. In Advances in Neural Information Processing Systems, 2026

  16. [16]

    TTD-SQL: Tree-guided token decoding for efficient and schema-aware SQL generation

    Chetan Sharma, Ramasuri Narayanam, Soumyabrata Pal, Kalidas Yeturu, Shiv Kumar Saini, and Koyel Mukherjee. TTD-SQL: Tree-guided token decoding for efficient and schema-aware SQL generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing: Industry Track, 2025

  17. [17]

    Flow: Modularized agentic workflow automation

    Boye Niu, Yiliao Song, Kai Lian, Yifan Shen, Yu Yao, Kun Zhang, and Tongliang Liu. Flow: Modularized agentic workflow automation. In International Conference on Learning Representations, 2025

  18. [18]

    AMAS: Adaptively determining communication topology for LLM-based multi-agent system

    Hui Yi Leong, Yuheng Li, Yuqing Wu, Wenwen Ouyang, Wei Zhu, Jiechao Gao, and Wei Han. AMAS: Adaptively determining communication topology for LLM-based multi-agent system. In Proceedings of the Conference on Empirical Methods in Natural Language Processing: Industry Track, 2025

  19. [19]

    Cut the crap: An economical communication pipeline for llm-based multi-agent systems

    Guibin Zhang, Yanwei Yue, Zhixun Li, Sukwon Yun, Guancheng Wan, Kun Wang, Dawei Cheng, Jeffrey Yu, and Tianlong Chen. Cut the crap: An economical communication pipeline for llm-based multi-agent systems. In International Conference on Learning Representations, 2025

  20. [20]

    A survey on LLM-based multi-agent systems: workflow, infrastructure, and challenges

    Xinyi Li, Sai Wang, Siqi Zeng, Yu Wu, and Yi Yang. A survey on LLM-based multi-agent systems: workflow, infrastructure, and challenges. Vicinagearth, 1(1):9, 2024

  21. [21]

    The rise and potential of large language model based agents: A survey

    Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, Qi Zhang, and Tao Gui. ...

  22. [22]

    Towards efficient generative large language model serving: A survey from algorithms to systems

    Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, and Zhihao Jia. Towards efficient generative large language model serving: A survey from algorithms to systems. ACM Computing Surveys, 58(1):1–37, 2025

  23. [23]

    Resource-efficient algorithms and systems of foundation models: A survey

    Mengwei Xu, Dongqi Cai, Wangsong Yin, Shangguang Wang, Xin Jin, and Xuanzhe Liu. Resource-efficient algorithms and systems of foundation models: A survey. ACM Computing Surveys, 57(5):1–39, 2025

  24. [24]

    AI agents under threat: A survey of key security challenges and future pathways

    Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, and Yang Xiang. AI agents under threat: A survey of key security challenges and future pathways. ACM Computing Surveys, 57(7):1–36, 2025

  25. [25]

    The emerged security and privacy of LLM agent: A survey with case studies

    Feng He, Tianqing Zhu, Dayong Ye, Bo Liu, Wanlei Zhou, and Philip S Yu. The emerged security and privacy of LLM agent: A survey with case studies. ACM Computing Surveys, 58(6):1–36, 2025

  26. [26]

    Tutoring efficacy, household substitution, and student achievement: Experimental evidence from an after-school tutoring program in rural China

    Jere R Behrman, C Simon Fan, Naijia Guo, Xiangdong Wei, Hongliang Zhang, and Junsen Zhang. Tutoring efficacy, household substitution, and student achievement: Experimental evidence from an after-school tutoring program in rural China. International Economic Review, 65(1):149–189, 2024

  27. [27]

    Testing the production approach to markup estimation

    Devesh Raval. Testing the production approach to markup estimation. Review of Economic Studies, 90(5):2592–2611, 2023

  28. [28]

    Toolformer: Language models can teach themselves to use tools

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In Advances in Neural Information Processing Systems, 2023

  29. [29]

    The neoclassical theory of firm investment and taxes: A reassessment

    Gabriel Chodorow-Reich. The neoclassical theory of firm investment and taxes: A reassessment. Technical report, National Bureau of Economic Research, 2025

  30. [30]

    Not a typical firm: Capital–labor substitution and firms’ labor shares

    Joachim Hubmer and Pascual Restrepo. Not a typical firm: Capital–labor substitution and firms’ labor shares. American Economic Journal: Macroeconomics, 18(2):34–71, 2026

  31. [31]

    Firm performance in digitally integrated supply chains: a combined perspective of transaction cost economics and relational exchange theory

    Kiran Patil, Vipul Garg, Janeth Gabaldon, Himali Patil, Suman Niranjan, and Timothy Hawkins. Firm performance in digitally integrated supply chains: a combined perspective of transaction cost economics and relational exchange theory. Journal of Enterprise Information Management, 37(2):381–413, 2024

  32. [32]

    Principal-agent VCG contracts

    Ron Lavi and Elisheva S Shamash. Principal-agent VCG contracts. Journal of Economic Theory, 201:105443, 2022

  33. [33]

    StableFees: A predictable fee market for cryptocurrencies

    Soumya Basu, David Easley, Maureen O'Hara, and Emin Gün Sirer. StableFees: A predictable fee market for cryptocurrencies. Management Science, 69(11):6508–6524, 2023

  34. [34]

    A theory of simplicity in games and mechanism design

    Marek Pycia and Peter Troyan. A theory of simplicity in games and mechanism design. Econometrica, 91(4):1495–1526, 2023

  35. [35]

    Variety-based congestion in online markets: Evidence from mobile apps

    Daniel Ershov. Variety-based congestion in online markets: Evidence from mobile apps. American Economic Journal: Microeconomics, 16(2):180–203, 2024

  36. [36]

    A neural probabilistic language model

    Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155, 2003

  37. [37]

    Efficient estimation of word representations in vector space

    Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In International Conference on Learning Representations Workshop, 2013

  38. [38]

    GloVe: Global vectors for word representation

    Jeffrey Pennington, Richard Socher, and Christopher D Manning. GloVe: Global vectors for word representation. In Proceedings of the conference on empirical methods in natural language processing, 2014

  39. [39]

    Neural machine translation of rare words with subword units

    Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. In Proceedings of the annual meeting of the association for computational linguistics, 2016

  40. [40]

    SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing

    Taku Kudo and John Richardson. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of the conference on empirical methods in natural language processing: System demonstrations, 2018

  41. [41]

    Neural discrete representation learning

    Aäron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. In Advances in Neural Information Processing Systems, pages 6306–6315, 2017

  42. [42]

    Generating diverse high-fidelity images with vq-vae-2

    Ali Razavi, Aaron van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images with vq-vae-2. In Advances in Neural Information Processing Systems, 2019

  43. [43]

    Compressed chain of thought: Efficient reasoning through dense representations

    Jeffrey Cheng and Benjamin Van Durme. Compressed chain of thought: Efficient reasoning through dense representations. arXiv preprint arXiv:2412.13171, 2024

  44. [44]

    CODI: compressing chain-of-thought into continuous space via self-distillation

    Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, and Yulan He. CODI: compressing chain-of-thought into continuous space via self-distillation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2025

  45. [45]

    R1-compress: Long chain-of-thought compression via chunk compression and search

    Yibo Wang, Haotian Luo, Huanjin Yao, Tiansheng Huang, Haiying He, Rui Liu, Naiqiang Tan, Jiaxing Huang, Xiaochun Cao, Dacheng Tao, and Li Shen. R1-compress: Long chain-of-thought compression via chunk compression and search. arXiv preprint arXiv:2505.16838, 2025

  46. [46]

    TokenSkip: Controllable chain-of-thought compression in LLMs

    Heming Xia, Chak Tou Leong, Wenjie Wang, Yongqi Li, and Wenjie Li. TokenSkip: Controllable chain-of-thought compression in LLMs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2025

  47. [47]

    Training large language models to reason in a continuous latent space

    Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason E Weston, and Yuandong Tian. Training large language models to reason in a continuous latent space. In Second Conference on Language Modeling, 2025

  48. [48]

    Dynamic early exit in reasoning models

    Chenxu Yang, Qingyi Si, Yongjie Duan, Zheliang Zhu, Chenyu Zhu, Qiaowei Li, Minghui Chen, Zheng Lin, and Weiping Wang. Dynamic early exit in reasoning models. In International Conference on Learning Representations, 2026

  49. [49]

    Flashthink: An early exit method for efficient reasoning

    Guochao Jiang, Guofeng Quan, Zepeng Ding, Ziqin Luo, Dixuan Wang, and Zheng Hu. Flashthink: An early exit method for efficient reasoning. arXiv preprint arXiv:2505.13949, 2025

  50. [50]

    Flashattention: Fast and memory-efficient exact attention with io-awareness

    Tri Dao, Daniel Y Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. Flashattention: Fast and memory-efficient exact attention with io-awareness. In Advances in Neural Information Processing Systems, 2022

  51. [51]

    Big Bird: Transformers for longer sequences

    Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed. Big Bird: Transformers for longer sequences. In Advances in Neural Information Processing Systems, 2020

  52. [52]

    Transformers are RNNs: Fast autoregressive transformers with linear attention

    Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are RNNs: Fast autoregressive transformers with linear attention. In Proceedings of the International Conference on Machine Learning, 2020

  53. [53]

    Snapkv: Llm knows what you are looking for before generation

    Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, and Deming Chen. Snapkv: Llm knows what you are looking for before generation. In Advances in Neural Information Processing Systems, 2024

  54. [54]

    H2o: Heavy-hitter oracle for efficient generative inference of large language models

    Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang "Atlas" Wang, and Beidi Chen. H2o: Heavy-hitter oracle for efficient generative inference of large language models. In Advances in Neural Information Processing Systems, 2023

  55. [55]

    OPTQ: Accurate quantization for generative pre-trained transformers

    Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. OPTQ: Accurate quantization for generative pre-trained transformers. In International Conference on Learning Representations, 2023

  56. [56]

    Movement pruning: Adaptive sparsity by fine-tuning

    Victor Sanh, Thomas Wolf, and Alexander M. Rush. Movement pruning: Adaptive sparsity by fine-tuning. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems, 2020

  57. [57]

    Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity

    William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120):1–39, 2022

  58. [58]

    ST-MoE: Designing Stable and Transferable Sparse Expert Models

    Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, and William Fedus. ST-MoE: Designing stable and transferable sparse expert models. arXiv preprint arXiv:2202.08906, 2022

  59. [59]

    Fast inference from transformers via speculative decoding

    Yaniv Leviathan, Matan Kalman, and Yossi Matias. Fast inference from transformers via speculative decoding. In Proceedings of the International Conference on Machine Learning, 2023

  60. [60]

    Draft & verify: Lossless large language model acceleration via self-speculative decoding

    Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Gang Chen, and Sharad Mehrotra. Draft & verify: Lossless large language model acceleration via self-speculative decoding. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024

  61. [61]

    LLMLingua: Compressing prompts for accelerated inference of large language models

    Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. LLMLingua: Compressing prompts for accelerated inference of large language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2023

  62. [62]

    LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression

    Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024

  63. [63]

    Learning to compress prompts with gist tokens

    Jesse Mu, Xiang Li, and Noah Goodman. Learning to compress prompts with gist tokens. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 19327–19352. Curran Associates, Inc., 2023

  64. [64]

    Adapting language models to compress contexts

    Alexis Chevalier, Alexander Wettig, Anirudh Ajith, and Danqi Chen. Adapting language models to compress contexts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2023

  65. [65]

    Compressing context to enhance inference efficiency of large language models

    Yucheng Li, Bo Dong, Frank Guerin, and Chenghua Lin. Compressing context to enhance inference efficiency of large language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2023

  66. [66]

    MemGPT: Towards LLMs as Operating Systems

    Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. MemGPT: Towards LLMs as operating systems. arXiv preprint arXiv:2310.08560, 2024

  67. [67]

    Generative agents: Interactive simulacra of human behavior

    Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the Annual ACM Symposium on User Interface Software and Technology, 2023

  68. [68]

    Reflexion: language agents with verbal reinforcement learning

    Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: language agents with verbal reinforcement learning. In Advances in Neural Information Processing Systems, 2023

  69. [69]

    A-Mem: Agentic memory for LLM agents

    Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-Mem: Agentic memory for LLM agents. Advances in Neural Information Processing Systems, 2026

  70. [70]

    Mem0: Building production-ready AI agents with scalable long-term memory

    Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready AI agents with scalable long-term memory. In Proceedings of the European Conference on Artificial Intelligence, 2025

  71. [71]

    Gorilla: Large language model connected with massive APIs

    Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. Gorilla: Large language model connected with massive APIs. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2024

  72. [72]

    ToolLLM: Facilitating large language models to master 16000+ real-world APIs

    Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. In International Conference on Learning Represent...

  73. [73]

    AnyTool: Self-reflective, hierarchical agents for large-scale API calls

    Yu Du, Fangyun Wei, and Hongyang Zhang. AnyTool: Self-reflective, hierarchical agents for large-scale API calls. In Proceedings of the International Conference on Machine Learning, 2024

  74. [74]

    ToolRL: Reward is all tool learning needs

    Cheng Qian, Emre Can Acikgoz, Qi He, Hongru Wang, Xiusi Chen, Dilek Hakkani-Tur, Gokhan Tur, and Heng Ji. ToolRL: Reward is all tool learning needs. In Advances in Neural Information Processing Systems, 2025

  75. [75]

    Model context protocol (MCP): Landscape, security threats, and future research directions

    Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. Model context protocol (MCP): Landscape, security threats, and future research directions. ACM Transactions on Software Engineering and Methodology, 2026

  76. [76]

    Self-rag: Learning to retrieve, generate, and critique through self-reflection

    Akari Asai, Zeqiu Wu, Yizhong Wang, Avi Sil, and Hannaneh Hajishirzi. Self-rag: Learning to retrieve, generate, and critique through self-reflection. In International Conference on Learning Representations, 2024

  77. [77]

    Corrective Retrieval Augmented Generation

    Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, and Zhen-Hua Ling. Corrective retrieval augmented generation. arXiv preprint arXiv:2401.15884, 2024

  78. [78]

    Adaptive-rag: Learning to adapt retrieval-augmented large language models through question complexity

    Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, and Jong Park. Adaptive-rag: Learning to adapt retrieval-augmented large language models through question complexity. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2024

  79. [79]

    From Local to Global: A Graph RAG Approach to Query-Focused Summarization

    Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, and Jonathan Larson. From local to global: A graph RAG approach to query-focused summarization. arXiv preprint arXiv:2404.16130, 2024

  80. [80]

    Raptor: Recursive abstractive processing for tree-organized retrieval

    Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher Manning. Raptor: Recursive abstractive processing for tree-organized retrieval. In International Conference on Learning Representations, 2024

Showing first 80 references.