Recognition: 2 Lean theorem links
Token Economics for LLM Agents: A Dual-View Study from Computing and Economics
Pith reviewed 2026-05-12 02:30 UTC · model grok-4.3
The pith
Tokens act as production factors, exchange mediums, and units of account in LLM agents, supporting a four-dimensional economic taxonomy that organizes the trade-off between output quality and computational cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that tokens function as production factors, exchange mediums, and units of account, and that the scattered literature on LLM agent systems can be synthesized into a four-dimensional taxonomy. Micro-level analysis applies neoclassical firm theory so that single agents can substitute among token uses under fixed budgets. Meso-level analysis draws on transaction cost and principal-agent theories to lower friction in multi-agent collaboration. Macro-level analysis employs mechanism design to resolve congestion externalities and set prices across agent ecosystems. Security analysis recasts adversarial threats as endogenous economic constraints that must be internalized.
What carries the argument
The four-dimensional taxonomy that reframes tokens as economic primitives and maps literature to micro (single-agent), meso (multi-agent), macro (ecosystem), and security levels using firm theory, transaction costs, principal-agent models, and mechanism design.
If this is right
- Single agents can substitute among token-consuming actions under budget limits to raise output quality per unit cost.
- Multi-agent systems can be structured to cut transaction costs and align incentives between collaborating agents.
- Agent ecosystems can apply pricing rules to reduce congestion externalities and allocate tokens more efficiently.
- Security defenses can be designed as ways to make adversarial attacks carry direct economic costs for the attacker.
- Future systems could incorporate differentiable token budgets that respond to changing economic conditions, alongside dynamic token markets.
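The first bullet (budget-constrained factor substitution) can be made concrete with a toy single-agent allocation problem. This is an illustrative sketch, not the paper's model: the Cobb-Douglas quality function, the exponent alpha, and the 1000-token budget are invented for the example.

```python
# Toy micro-level substitution: a single agent splits a fixed token budget
# between two token-consuming "factors" (reasoning vs. retrieval) to
# maximize a Cobb-Douglas quality index with diminishing returns.
# The functional form and all numbers are hypothetical.

def quality(reasoning, retrieval, alpha=0.6):
    """Cobb-Douglas quality index: diminishing returns in each factor."""
    return (reasoning ** alpha) * (retrieval ** (1 - alpha))

def best_split(budget, alpha=0.6):
    """Exhaustively search integer splits of the budget for peak quality."""
    best = (0, 0, float("-inf"))
    for reasoning in range(1, budget):
        retrieval = budget - reasoning
        q = quality(reasoning, retrieval, alpha)
        if q > best[2]:
            best = (reasoning, retrieval, q)
    return best

reasoning, retrieval, q = best_split(1000)
# The Cobb-Douglas optimum spends the share alpha of the budget on the
# first factor: reasoning = 600, retrieval = 400.
```

With equal per-token prices, the interior optimum of a Cobb-Douglas objective always allocates the budget in proportion to the exponents, which is why the split lands exactly at alpha times the budget.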
Where Pith is reading between the lines
- Agent designers could test whether embedding the taxonomy improves measurable performance metrics such as task completion rate per token spent in controlled benchmarks.
- The framework suggests that large-scale agent deployments might require new market mechanisms for token allocation, similar to how compute resources are traded today.
- Security research could gain from treating attack surfaces as budget items whose costs agents must internalize in their planning loops.
- The approach opens a path to agent regulations that price aggregate token consumption as an externality with societal impact.
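The last point can be sketched as a minimal congestion-pricing loop: raise a uniform per-token fee until aggregate demand fits capacity. Everything here (agent valuations, demands, capacity, fee step) is hypothetical and only illustrates the mechanism-design framing, not a scheme from the paper.

```python
# Hypothetical macro-level congestion pricing: increase a per-token fee
# until total demand from price-sensitive agents fits system capacity.
# Demand curves and capacity are invented for illustration.

def demand(fee, agents):
    """Total tokens requested: each agent has a per-token value v and a
    base need n, and requests tokens only while the fee is below v."""
    return sum(n for v, n in agents if v > fee)

def clearing_fee(agents, capacity, step=0.01):
    """Raise the fee until demand no longer exceeds capacity."""
    fee = 0.0
    while demand(fee, agents) > capacity:
        fee += step
    return round(fee, 2)

agents = [(1.0, 500), (0.6, 800), (0.3, 700)]  # (value per token, tokens wanted)
fee = clearing_fee(agents, capacity=1400)
# The fee rises until the lowest-value agent drops out, so demand falls
# from 2000 to 1300 tokens at a fee of 0.3.
```

The loop is a crude tatonnement; a real deployment would need demand elasticities and an auction or posted-price mechanism, but the sketch shows how a fee internalizes the congestion externality.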
Load-bearing premise
The fragmented research on LLM agent optimization, architecture, and security fits accurately and without major omissions into the proposed four-dimensional economic taxonomy.
What would settle it
A systematic scan of recent LLM agent papers that identifies a substantial cluster of techniques or problems that cannot be placed in any of the four taxonomy categories would falsify the claim of comprehensive synthesis.
read the original abstract
As LLM agents evolve, tokens have emerged as the core economic primitives of Agentic AI. However, their exponential consumption introduces severe computational, collaborative, and security bottlenecks. Current surveys remain fragmented across system optimization, architecture design, and trust, lacking a unified framework to evaluate the fundamental trade-off between output quality and economic cost. To bridge this gap, this survey presents the first comprehensive survey of Token Economics. By unifying computer science and economics, we conceptualize tokens as production factors, exchange mediums, and units of account. We synthesize existing literature across a four-dimensional taxonomy: (1) Micro-level (Single Agent): Optimizing budget-constrained factor substitution via neoclassical firm theory. (2) Meso-level (Multi-Agent Systems): Minimizing collaboration friction using transaction cost and principal-agent theories. (3) Macro-level (Agent Ecosystems): Addressing congestion externalities and pricing via mechanism design. (4) Security: Internalizing adversarial threats as endogenous economic constraints. Finally, we outline frontier directions, including differentiable token budgets and dynamic markets, to lay the theoretical foundation for scalable next-generation agent systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a survey paper that unifies computer science and economics perspectives on token economics for LLM agents. It conceptualizes tokens as production factors, exchange mediums, and units of account, then synthesizes fragmented literature on system optimization, architecture design, and trust into a four-dimensional taxonomy: (1) micro-level single-agent optimization of budget-constrained factor substitution using neoclassical firm theory; (2) meso-level multi-agent collaboration minimizing friction via transaction cost and principal-agent theories; (3) macro-level agent ecosystems addressing congestion via mechanism design; and (4) security internalizing adversarial threats as endogenous economic constraints. It concludes by outlining future directions such as differentiable token budgets and dynamic markets.
Significance. If the taxonomy provides an accurate, non-overlapping, and comprehensive synthesis of the literature without forcing concepts into categories, the work could offer a useful interdisciplinary lens for analyzing quality-cost trade-offs in scalable LLM agent systems. As a conceptual survey without new derivations, empirical results, or machine-checked proofs, its primary value would lie in organizing existing work to guide future research on efficient and secure agentic AI.
major comments (2)
- [Abstract / Taxonomy description] The claim that the four dimensions are exhaustive and non-overlapping is not yet substantiated; for example, security considerations (dimension 4) appear relevant to micro-level optimization and meso-level collaboration as well, raising the risk of conceptual overlap that could undermine the taxonomy's utility as a clean organizing framework.
- [Abstract] The abstract asserts a mapping of CS concerns (optimization, architecture, trust) onto specific economic theories without providing concrete citations or examples in the provided text; this makes it impossible to verify whether high-impact strands (e.g., specific work on LLM inference optimization or multi-agent coordination) fit without misalignment or omission, which is load-bearing for the central claim of a 'comprehensive' unification.
minor comments (2)
- [Abstract] The abstract is information-dense; consider using enumerated sub-bullets or a table to present the four taxonomy dimensions for improved readability.
- [Future directions] Clarify the scope of 'frontier directions' section by distinguishing speculative ideas (e.g., differentiable token budgets) from those with preliminary supporting references.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments, which help clarify the presentation of our taxonomy and the substantiation of our literature synthesis. We address each point below and propose targeted revisions to strengthen the manuscript's clarity as an interdisciplinary organizing framework.
read point-by-point responses
- Referee: [Abstract / Taxonomy description] The claim that the four dimensions are exhaustive and non-overlapping is not yet substantiated; for example, security considerations (dimension 4) appear relevant to micro-level optimization and meso-level collaboration as well, raising the risk of conceptual overlap that could undermine the taxonomy's utility as a clean organizing framework.
Authors: We acknowledge that security threats can intersect with micro-level optimization (e.g., adversarial inputs affecting single-agent token budgets) and meso-level collaboration (e.g., trust issues in multi-agent interactions). Our taxonomy, however, is organized by primary analytical lens rather than strict exclusivity: the micro dimension applies neoclassical firm theory to budget-constrained factor substitution under standard conditions; the meso dimension uses transaction cost and principal-agent theories to address collaboration frictions; and the security dimension specifically models adversarial behavior as endogenous economic constraints at the system level. This structure draws on the established economic practice of separating normal-operation analysis from threat internalization. To address the concern, we will revise the taxonomy overview to discuss inter-dimensional interactions explicitly, provide boundary examples, and include a clarifying diagram or table. We do not claim mathematical exhaustiveness, only a comprehensive synthesis of existing literature strands.
revision: partial
- Referee: [Abstract] The abstract asserts a mapping of CS concerns (optimization, architecture, trust) onto specific economic theories without providing concrete citations or examples in the provided text; this makes it impossible to verify whether high-impact strands (e.g., specific work on LLM inference optimization or multi-agent coordination) fit without misalignment or omission, which is load-bearing for the central claim of a 'comprehensive' unification.
Authors: Abstracts conventionally omit citations for brevity; the full manuscript substantiates the mappings in dedicated sections with concrete citations and examples. The micro-level section maps the LLM inference-optimization literature (e.g., work on token-efficient serving and budget-constrained decoding) to neoclassical production theory. The meso-level section aligns multi-agent coordination papers with transaction cost economics and principal-agent models. The macro and security dimensions similarly reference the mechanism design and adversarial economics literature. These mappings are load-bearing and are detailed with references throughout the body. We will not alter the abstract, but we will ensure that the introduction and taxonomy sections cross-reference the detailed syntheses more explicitly for readers who encounter only the abstract.
revision: no
Circularity Check
No circularity: conceptual survey with external theoretical grounding
full rationale
The paper is a literature survey that organizes existing work on token usage in LLM agents into a four-dimensional taxonomy by mapping CS concerns onto standard neoclassical economics, transaction-cost economics, mechanism design, and principal-agent theory. No equations, fitted parameters, predictions, or derivations appear in the abstract or described content. The taxonomy is presented as a synthesis framework rather than a result derived from the paper's own inputs or prior self-citations. No self-definitional loops, fitted-input predictions, uniqueness theorems, or ansatz smuggling are present. The central claim rests on external literature and established economic primitives, making the work self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (4)
- domain assumption Neoclassical firm theory applies to optimizing budget-constrained factor substitution for single LLM agents.
- domain assumption Transaction cost theory and principal-agent theory can minimize collaboration friction in multi-agent LLM systems.
- domain assumption Mechanism design addresses congestion externalities and pricing in agent ecosystems.
- domain assumption Adversarial threats can be treated as endogenous economic constraints for security.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Paper passage: "We synthesize existing literature across a four-dimensional taxonomy: (1) Micro-level (Single Agent): Optimizing budget-constrained factor substitution via neoclassical firm theory. (2) Meso-level (Multi-Agent Systems): Minimizing collaboration friction using transaction cost and principal-agent theories. (3) Macro-level (Agent Ecosystems): Addressing congestion externalities and pricing via mechanism design."
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Paper passage: Y = A · [δK^ρ + (1 − δ)M^ρ]^(θ/ρ) · L^β · e^ε and min TC s.t. Y ≥ Z
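The quoted objective (a CES production core with a labor-like factor, and cost minimization subject to an output floor) can be checked numerically. A minimal sketch, assuming hypothetical values throughout: the parameters (A, δ, ρ, θ, β, ε), the factor prices, and the search grid are invented for illustration, not taken from the paper.

```python
# Numeric sketch of the quoted objective:
#   Y = A * (delta*K**rho + (1 - delta)*M**rho) ** (theta/rho) * L**beta * exp(eps)
#   min TC = pK*K + pM*M + pL*L  subject to  Y >= Z
# All parameter values, prices, and the grid are hypothetical.
import math
from itertools import product

def output(K, M, L, A=1.0, delta=0.5, rho=0.5, theta=1.0, beta=0.3, eps=0.0):
    """CES aggregate of K and M, scaled by L**beta and a shock term."""
    ces = (delta * K**rho + (1 - delta) * M**rho) ** (theta / rho)
    return A * ces * L**beta * math.exp(eps)

def min_cost(Z, prices=(1.0, 2.0, 5.0), grid=range(1, 51)):
    """Brute-force the cheapest factor bundle (K, M, L) that meets Y >= Z."""
    pK, pM, pL = prices
    best = None
    for K, M, L in product(grid, grid, grid):
        if output(K, M, L) >= Z:
            tc = pK * K + pM * M + pL * L
            if best is None or tc < best[0]:
                best = (tc, (K, M, L))
    return best

tc, bundle = min_cost(10.0)  # cheapest grid bundle whose output reaches Z = 10
```

A production-grade version would use a constrained optimizer rather than a grid, but the brute-force form makes the structure of the problem (output floor, linear factor costs) explicit.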
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Zhenglun Kong, Yize Li, Fanhu Zeng, Lei Xin, Shvat Messica, Xue Lin, Pu Zhao, Manolis Kellis, Hao Tang, and Marinka Zitnik. T oken reduction should go beyond efficiency in generative models–from vision, language to multimodality. arXiv preprint arXiv:2505.18227, 2025
-
[2]
Memorybank: Enhancing large language models with long-term memory
Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. Memorybank: Enhancing large language models with long-term memory. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024
work page 2024
-
[3]
MasRouter: Learning to route LLMs for multi-agent systems
Yanwei Yue, Guibin Zhang, Boyang Liu, Guancheng Wan, Kun Wang, Dawei Cheng, and Yiyan Qi. MasRouter: Learning to route LLMs for multi-agent systems. In Proceedings of the Annual Meeting of the Association for Computational Linguistics . Association for Computational Linguistics, 2025
work page 2025
-
[4]
TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing
Zhuohang Bian, Feiyang Wu, Chengrui Zhang, Hangcheng Dong, Yun Liang, and Youwei Zhuo. T okendance: Scaling multi-agent llm serving via collective kv cache sharing. arXiv preprint arXiv:2604.03143, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[5]
Subbalakshmi, Jimin Huang, Lingfei Qian, Xueqing Peng, Jordan W
Haohang Li, Yupeng Cao, Yangyang Yu, Shashidhar Reddy Javaji, Zhiyang Deng, Yueru He, Yuechen Jiang, Zining Zhu, K.p. Subbalakshmi, Jimin Huang, Lingfei Qian, Xueqing Peng, Jordan W. Suchow , and Qianqian Xie. INVESTORBENCH: A benchmark for financial decision-making tasks with LLM-based agent. In Proceedings of the Annual Meeting of the Association for Com...
work page 2025
-
[6]
Mariam Barry , Gaetan Caillaut, Pierre Halftermeyer, Raheel Qader, Mehdi Mouayad, Fabrice Le Deit, Dimitri Cariolaro, and Joseph Gesnouin. GraphRAG: Leveraging graph-based efficiency to minimize hallucinations in LLM-driven RAG for finance data. In Proceedings of the Workshop on Generative AI and Knowledge Graphs, 2025
work page 2025
-
[7]
Time travel is cheating: Going live with deepfund for real-time fund investment benchmarking
Changlun Li, Yao Shi, Chen Wang, Qiqi Duan, Runke Ruan, Weijie Huang, Haonan Long, Lijun Huang, Nan Tang, and Yuyu Luo. Time travel is cheating: Going live with deepfund for real-time fund investment benchmarking. In Advances in Neural Information Processing Systems Datasets and Benchmarks T rack, 2025
work page 2025
-
[8]
NitiBench: Benchmark- ing LLM frameworks on Thai legal question answering capabilities
Pawitsapak Akarajaradwong, Pirat Pothavorn, Chompakorn Chaksangchaichot, Panuthep Ta- sawong, Thitiwat Nopparatbundit, Keerakiat Pratai, and Sarana Nutanong. NitiBench: Benchmark- ing LLM frameworks on Thai legal question answering capabilities. In Proceedings of the Conference on Empirical Methods in Natural Language Processing , 2025
work page 2025
-
[9]
Legalagentbench: Evaluating llm agents in legal domain
Haitao Li, Junjie Chen, Jingli Yang, Qingyao Ai, Wei Jia, Youfeng Liu, Kai Lin, Yueyue Wu, Guozhi Yuan, Yiran Hu, Wuyue Wang, Yiqun Liu, and Minlie Huang. Legalagentbench: Evaluating llm agents in legal domain. In Proceedings of the Annual Meeting of the Association for Computational Lin- guistics, 2025
work page 2025
-
[10]
Lexrag: Benchmarking retrieval-augmented generation in multi-turn legal consultation conversation
Haitao Li, Yifan Chen, Yiran Hu, Qingyao Ai, Junjie Chen, Xiaoyu Yang, Jianhui Yang, Yueyue Wu, Zeyang Liu, and Yiqun Liu. Lexrag: Benchmarking retrieval-augmented generation in multi-turn legal consultation conversation. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval , 2025. 47 T oken Economi...
work page 2025
-
[11]
De novo design of gpcr exoframe modulators
Shizhuo Cheng, Jia Guo, Yun-li Zhou, Xumei Luo, Gufang Zhang, Ya-zhi Zhang, Yixin Yang, Jiannan Xie, Ping Xu, Dan-dan Shen, Shaokun Zang, Huicui Yang, Xuechu Zhen, Min Zhang, and Yan Zhang. De novo design of gpcr exoframe modulators. Nature, 651:1–9, 2026
work page 2026
-
[12]
Ask patients with patience: Enabling LLMs for human-centric medical dialogue with grounded reasoning
Jiayuan Zhu, Jiazhen Pan, Yuyuan Liu, Fenglin Liu, and Junde Wu. Ask patients with patience: Enabling LLMs for human-centric medical dialogue with grounded reasoning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing , 2025
work page 2025
-
[13]
Sijing Li, Zhongwei Qiu, Jiang Liu, Wenqiao Zhang, Tianwei Lin, Yihan Xie, Jianxiang An, Boxiang Yun, Chenglin Yang, Jun Xiao, Guangyu Guo, Jiawen Yao, Wei Liu, Yuan Gao, Ke Yan, Weiwei Cao, Zhilin Zheng, T ony C. W. MOK, Kai Cao, Yu Shi, Jiuyu Zhang, Jian Zhou, Beng Chin Ooi, Yingda Xia, and Ling Zhang. T umorchain: Interleaved multimodal chain-of-though...
work page 2026
- [14]
-
[15]
Transcend- ing cost-quality tradeoff in agent serving via session-awareness
Yanyu Ren, Li Chen, Dan Li, Xizheng Wang, Zhiyuan Wu, Yukai Miao, and Yu Bai. Transcend- ing cost-quality tradeoff in agent serving via session-awareness. In Advances in Neural Information Processing Systems, 2026
work page 2026
-
[16]
TTD-SQL: Tree-guided token decoding for efficient and schema-aware SQL gen- eration
Chetan Sharma, Ramasuri Narayanam, Soumyabrata Pal, Kalidas Yeturu, Shiv Kumar Saini, and Koyel Mukherjee. TTD-SQL: Tree-guided token decoding for efficient and schema-aware SQL gen- eration. In Proceedings of the Conference on Empirical Methods in Natural Language Processing: Industry T rack, 2025
work page 2025
-
[17]
Flow: Modu- larized agentic workflow automation
Boye Niu, Yiliao Song, Kai Lian, Yifan Shen, Yu Yao, Kun Zhang, and T ongliang Liu. Flow: Modu- larized agentic workflow automation. In International Conference on Learning Representations , 2025
work page 2025
-
[18]
AMAS: Adaptively determining communication topology for LLM-based multi-agent system
Hui Yi Leong, Yuheng Li, Yuqing Wu, Wenwen Ouyang, Wei Zhu, Jiechao Gao, and Wei Han. AMAS: Adaptively determining communication topology for LLM-based multi-agent system. InProceedings of the Conference on Empirical Methods in Natural Language Processing: Industry T rack, 2025
work page 2025
-
[19]
Cut the crap: An economical communication pipeline for llm-based multi-agent systems
Guibin Zhang, Yanwei Yue, Zhixun Li, Sukwon Yun, Guancheng Wan, Kun Wang, Dawei Cheng, Jeffrey Yu, and Tianlong Chen. Cut the crap: An economical communication pipeline for llm-based multi-agent systems. In International Conference on Learning Representations, 2025
work page 2025
-
[20]
A survey on llm-based multi-agent systems: workflow , infrastructure, and challenges
Xinyi Li, Sai Wang, Siqi Zeng, Yu Wu, and Yi Yang. A survey on llm-based multi-agent systems: workflow , infrastructure, and challenges. Vicinagearth, 1(1):9, 2024
work page 2024
-
[21]
The rise and potential of large language model based agents: A survey
Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongx- iang Weng, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, Qi Zhang, and Tao Gui. ...
work page 2025
-
[22]
T owards efficient generative large language model serving: A survey from algorithms to systems
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, and Zhihao Jia. T owards efficient generative large language model serving: A survey from algorithms to systems. ACM Computing Surveys, 58(1):1–37, 2025. 48 T oken Economics for LLM Agents: A Dual-View Study from Computing and Economics
work page 2025
-
[23]
Resource- efficient algorithms and systems of foundation models: A survey
Mengwei Xu, Dongqi Cai, Wangsong Yin, Shangguang Wang, Xin Jin, and Xuanzhe Liu. Resource- efficient algorithms and systems of foundation models: A survey. ACM Computing Surveys, 57(5):1– 39, 2025
work page 2025
-
[24]
AI agents under threat: A survey of key security challenges and future pathways
Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, and Yang Xiang. AI agents under threat: A survey of key security challenges and future pathways. ACM Computing Surveys, 57(7):1–36, 2025
work page 2025
-
[25]
The emerged security and privacy of llm agent: A survey with case studies
Feng He, Tianqing Zhu, Dayong Ye, Bo Liu, Wanlei Zhou, and Philip S Yu. The emerged security and privacy of llm agent: A survey with case studies. ACM Computing Surveys, 58(6):1–36, 2025
work page 2025
-
[26]
Jere R Behrman, C Simon Fan, Naijia Guo, Xiangdong Wei, Hongliang Zhang, and Junsen Zhang. T utoring efficacy , household substitution, and student achievement: Experimental evidence from an after-school tutoring program in rural china. International Economic Review, 65(1):149–189, 2024
work page 2024
-
[27]
Testing the production approach to markup estimation
Devesh Raval. Testing the production approach to markup estimation. Review of Economic Studies , 90(5):2592–2611, 2023
work page 2023
-
[28]
T oolformer: Language models can teach them- selves to use tools
Timo Schick, Jane Dwivedi- Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. T oolformer: Language models can teach them- selves to use tools. In Advances in Neural Information Processing Systems, 2023
work page 2023
-
[29]
The neoclassical theory of firm investment and taxes: A reassessment
Gabriel Chodorow-Reich. The neoclassical theory of firm investment and taxes: A reassessment. Technical report, National Bureau of Economic Research, 2025
work page 2025
-
[30]
Not a typical firm: Capital–labor substitution and firms’ labor shares
Joachim Hubmer and Pascual Restrepo. Not a typical firm: Capital–labor substitution and firms’ labor shares. American Economic Journal: Macroeconomics , 18(2):34–71, 2026
work page 2026
-
[31]
Kiran Patil, Vipul Garg, Janeth Gabaldon, Himali Patil, Suman Niranjan, and Timothy Hawkins. Firm performance in digitally integrated supply chains: a combined perspective of transaction cost economics and relational exchange theory. Journal of Enterprise Information Management , 37(2):381– 413, 2024
work page 2024
-
[32]
Ron Lavi and Elisheva S Shamash. Principal-agent vcg contracts. Journal of Economic Theory , 201:105443, 2022
work page 2022
-
[33]
Stablefees: A predictable fee market for cryptocurrencies
Soumya Basu, David Easley , Maureen OHara, and Emin Gün Sirer. Stablefees: A predictable fee market for cryptocurrencies. Management Science, 69(11):6508–6524, 2023
work page 2023
-
[34]
A theory of simplicity in games and mechanism design.Econometrica, 91(4):1495–1526, 2023
Marek Pycia and Peter Troyan. A theory of simplicity in games and mechanism design.Econometrica, 91(4):1495–1526, 2023
work page 2023
-
[35]
Variety-based congestion in online markets: Evidence from mobile apps
Daniel Ershov . Variety-based congestion in online markets: Evidence from mobile apps. American Economic Journal: Microeconomics , 16(2):180–203, 2024
work page 2024
-
[36]
A neural probabilistic language model
Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. A neural probabilistic language model. Journal of machine learning research , 3:1137–1155, 2003
work page 2003
-
[37]
Efficient estimation of word represen- tations in vector space
T omas Mikolov , Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word represen- tations in vector space. In International Conference on Learning Representations Workshop, 2013
work page 2013
-
[38]
Glove: Global vectors for word representation
Jeffrey Pennington, Richard Socher, and Christopher D Manning. Glove: Global vectors for word representation. In Proceedings of the conference on empirical methods in natural language processing, 2014. 49 T oken Economics for LLM Agents: A Dual-View Study from Computing and Economics
work page 2014
-
[39]
Neural machine translation of rare words with subword units
Rico Sennrich, Barry Haddow , and Alexandra Birch. Neural machine translation of rare words with subword units. In Proceedings of the annual meeting of the association for computational linguistics , 2016
work page 2016
-
[40]
Taku Kudo and John Richardson. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of the conference on empirical methods in natural language processing: System demonstrations , 2018
work page 2018
-
[41]
Neural discrete representation learn- ing
Aäron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learn- ing. In Advances in Neural Information Processing Systems, pages 6306–6315, 2017
work page 2017
-
[42]
Generating diverse high-fidelity images with vq-vae-2
Ali Razavi, Aaron van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images with vq-vae-2. In Advances in Neural Information Processing Systems, 2019
work page 2019
-
[43]
Jeffrey Cheng and Benjamin Van Durme. Compressed chain of thought: Efficient reasoning through dense representations. arXiv preprint arXiv:2412.13171, 2024
-
[44]
CODI: compress- ing chain-of-thought into continuous space via self-distillation
Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, and Yulan He. CODI: compress- ing chain-of-thought into continuous space via self-distillation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing , 2025
work page 2025
-
[45]
Yibo Wang, Haotian Luo, Huanjin Yao, Tiansheng Huang, Haiying He, Rui Liu, Naiqiang Tan, Jiax- ing Huang, Xiaochun Cao, Dacheng Tao, and Li Shen. R1-compress: Long chain-of-thought com- pression via chunk compression and search. arXiv preprint arXiv:2505.16838, 2025
-
[46]
T okenskip: Controllable chain-of-thought compression in llms
Heming Xia, Chak T ou Leong, Wenjie Wang, Yongqi Li, and Wenjie Li. T okenskip: Controllable chain-of-thought compression in llms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2025
work page 2025
-
[47]
Training large language models to reason in a continuous latent space
Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason E Weston, and Yuandong Tian. Training large language models to reason in a continuous latent space. In Second Conference on Language Modeling, 2025
work page 2025
-
[48]
Dynamic early exit in reasoning models
Chenxu Yang, Qingyi Si, Yongjie Duan, Zheliang Zhu, Chenyu Zhu, Qiaowei Li, Minghui Chen, Zheng Lin, and Weiping Wang. Dynamic early exit in reasoning models. In International Conference on Learning Representations, 2026
work page 2026
-
[49]
arXiv preprint arXiv:2505.13949 , year=
Guochao Jiang, Guofeng Quan, Zepeng Ding, Ziqin Luo, Dixuan Wang, and Zheng Hu. Flashthink: An early exit method for efficient reasoning. arXiv preprint arXiv:2505.13949, 2025
-
[50]
Flashattention: Fast and memory-efficient exact attention with io-awareness
Tri Dao, Daniel Y Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. Flashattention: Fast and memory-efficient exact attention with io-awareness. In Advances in Neural Information Processing Systems, 2022
work page 2022
-
[51]
Big bird: Trans- formers for longer sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey , Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed. Big bird: Trans- formers for longer sequences. In Advances in Neural Information Processing Systems, 2020
work page 2020
-
[52]
Transformers are RNNs: Fast autoregressive transformers with linear attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are RNNs: Fast autoregressive transformers with linear attention. In Proceedings of the International Conference on Machine Learning, 2020. 50 T oken Economics for LLM Agents: A Dual-View Study from Computing and Economics
work page 2020
-
[53]
Snapkv: Llm knows what you are looking for before generation
Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, and Deming Chen. Snapkv: Llm knows what you are looking for before generation. In Advances in Neural Information Processing Systems, 2024
work page 2024
-
[54]
H2o: Heavy-hitter oracle for efficient generative inference of large language models
Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang "Atlas" Wang, and Beidi Chen. H2o: Heavy-hitter oracle for efficient generative inference of large language models. In Advances in Neural Information Processing Systems, 2023
work page 2023
-
[55]
OPTQ: Accurate quantization for generative pre-trained transformers
Elias Frantar, Saleh Ashkboos, T orsten Hoefler, and Dan Alistarh. OPTQ: Accurate quantization for generative pre-trained transformers. In International Conference on Learning Representations, 2023
work page 2023
-
[56]
Victor Sanh, Thomas Wolf, and Alexander M. Rush. Movement pruning: Adaptive sparsity by fine-tuning. In Hugo Larochelle, Marc’ Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems, 2020
work page 2020
-
[57]
Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity
William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research , 23(120):1–39, 2022
work page 2022
-
[58]
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, and William Fedus. St-moe: Designing stable and transferable sparse expert models. arXiv preprint arXiv:2202.08906, 2022
work page internal anchor Pith review arXiv 2022
-
[59]
Fast inference from transformers via speculative decoding
Yaniv Leviathan, Matan Kalman, and Yossi Matias. Fast inference from transformers via speculative decoding. In Proceedings of the International Conference on Machine Learning , 2023
work page 2023
-
[60] Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Gang Chen, and Sharad Mehrotra. Draft & verify: Lossless large language model acceleration via self-speculative decoding. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024.
[61] Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. LLMLingua: Compressing prompts for accelerated inference of large language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2023.
[62] Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024.
[63] Jesse Mu, Xiang Li, and Noah Goodman. Learning to compress prompts with gist tokens. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 19327–19352. Curran Associates, Inc., 2023.
[64] Alexis Chevalier, Alexander Wettig, Anirudh Ajith, and Danqi Chen. Adapting language models to compress contexts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2023.
[65] Yucheng Li, Bo Dong, Frank Guerin, and Chenghua Lin. Compressing context to enhance inference efficiency of large language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2023.
[66] Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. MemGPT: Towards LLMs as operating systems. arXiv preprint arXiv:2310.08560, 2024.
[67]
[68] Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. In Advances in Neural Information Processing Systems, 2023.
[69] Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-Mem: Agentic memory for LLM agents. Advances in Neural Information Processing Systems, 2026.
[70] Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready AI agents with scalable long-term memory. In Proceedings of the European Conference on Artificial Intelligence, 2025.
[71] Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. Gorilla: Large language model connected with massive APIs. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2024.
[72] Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. In International Conference on Learning Representations, 2024.
[73] Yu Du, Fangyun Wei, and Hongyang Zhang. AnyTool: Self-reflective, hierarchical agents for large-scale API calls. In Proceedings of the International Conference on Machine Learning, 2024.
[74] Cheng Qian, Emre Can Acikgoz, Qi He, Hongru Wang, Xiusi Chen, Dilek Hakkani-Tur, Gokhan Tur, and Heng Ji. ToolRL: Reward is all tool learning needs. In Advances in Neural Information Processing Systems, 2025.
[75] Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. Model context protocol (MCP): Landscape, security threats, and future research directions. ACM Transactions on Software Engineering and Methodology, 2026.
[76] Akari Asai, Zeqiu Wu, Yizhong Wang, Avi Sil, and Hannaneh Hajishirzi. Self-RAG: Learning to retrieve, generate, and critique through self-reflection. In International Conference on Learning Representations, 2024.
[77] Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, and Zhen-Hua Ling. Corrective retrieval augmented generation. arXiv preprint arXiv:2401.15884, 2024.
[78] Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, and Jong Park. Adaptive-RAG: Learning to adapt retrieval-augmented large language models through question complexity. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2024.
[79] Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, and Jonathan Larson. From local to global: A graph RAG approach to query-focused summarization. arXiv preprint arXiv:2404.16130, 2024.
[80] Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher Manning. RAPTOR: Recursive abstractive processing for tree-organized retrieval. In International Conference on Learning Representations, 2024.