Recognition: 2 Lean theorem links
Token Economics for LLM Agents: A Dual-View Study from Computing and Economics
Pith reviewed 2026-05-12 02:30 UTC · model grok-4.3
The pith
Tokens act as production factors, exchange mediums, and units of account in LLM agents, supporting a four-dimensional economic taxonomy that organizes the trade-off between output quality and computational cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that tokens function as production factors, exchange mediums, and units of account, and that the scattered literature on LLM agent systems can be synthesized into a four-dimensional taxonomy. Micro-level analysis applies neoclassical firm theory so that single agents can substitute among token uses under fixed budgets. Meso-level analysis draws on transaction cost and principal-agent theories to lower friction in multi-agent collaboration. Macro-level analysis employs mechanism design to resolve congestion externalities and set prices across agent ecosystems. Security analysis recasts adversarial threats as endogenous economic constraints that must be internalized.
What carries the argument
The four-dimensional taxonomy that reframes tokens as economic primitives and maps literature to micro (single-agent), meso (multi-agent), macro (ecosystem), and security levels using firm theory, transaction costs, principal-agent models, and mechanism design.
If this is right
- Single agents can substitute among token-consuming actions under budget limits to raise output quality per unit cost.
- Multi-agent systems can be structured to cut transaction costs and align incentives between collaborating agents.
- Agent ecosystems can apply pricing rules to reduce congestion externalities and allocate tokens more efficiently.
- Security defenses can be designed as ways to make adversarial attacks carry direct economic costs for the attacker.
- Future systems could incorporate differentiable token budgets that respond to changing economic conditions, alongside dynamic token markets.
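The first bullet (budget-constrained factor substitution) can be made concrete with a toy single-agent allocation problem. This is an illustrative sketch, not the paper's model: the Cobb-Douglas quality function, the exponent alpha, and the 1000-token budget are invented for the example.

```python
# Toy micro-level substitution: a single agent splits a fixed token budget
# between two token-consuming "factors" (reasoning vs. retrieval) to
# maximize a Cobb-Douglas quality index with diminishing returns.
# The functional form and all numbers are hypothetical.

def quality(reasoning, retrieval, alpha=0.6):
    """Cobb-Douglas quality index: diminishing returns in each factor."""
    return (reasoning ** alpha) * (retrieval ** (1 - alpha))

def best_split(budget, alpha=0.6):
    """Exhaustively search integer splits of the budget for peak quality."""
    best = (0, 0, float("-inf"))
    for reasoning in range(1, budget):
        retrieval = budget - reasoning
        q = quality(reasoning, retrieval, alpha)
        if q > best[2]:
            best = (reasoning, retrieval, q)
    return best

reasoning, retrieval, q = best_split(1000)
# The Cobb-Douglas optimum spends the share alpha of the budget on the
# first factor: reasoning = 600, retrieval = 400.
```

With equal per-token prices, the interior optimum of a Cobb-Douglas objective always allocates the budget in proportion to the exponents, which is why the split lands exactly at alpha times the budget.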
Where Pith is reading between the lines
- Agent designers could test whether embedding the taxonomy improves measurable performance metrics such as task completion rate per token spent in controlled benchmarks.
- The framework suggests that large-scale agent deployments might require new market mechanisms for token allocation, similar to how compute resources are traded today.
- Security research could gain from treating attack surfaces as budget items whose costs agents must internalize in their planning loops.
- The approach opens a path to agent regulations that price aggregate token consumption as an externality with societal impact.
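The last point can be sketched as a minimal congestion-pricing loop: raise a uniform per-token fee until aggregate demand fits capacity. Everything here (agent valuations, demands, capacity, fee step) is hypothetical and only illustrates the mechanism-design framing, not a scheme from the paper.

```python
# Hypothetical macro-level congestion pricing: increase a per-token fee
# until total demand from price-sensitive agents fits system capacity.
# Demand curves and capacity are invented for illustration.

def demand(fee, agents):
    """Total tokens requested: each agent has a per-token value v and a
    base need n, and requests tokens only while the fee is below v."""
    return sum(n for v, n in agents if v > fee)

def clearing_fee(agents, capacity, step=0.01):
    """Raise the fee until demand no longer exceeds capacity."""
    fee = 0.0
    while demand(fee, agents) > capacity:
        fee += step
    return round(fee, 2)

agents = [(1.0, 500), (0.6, 800), (0.3, 700)]  # (value per token, tokens wanted)
fee = clearing_fee(agents, capacity=1400)
# The fee rises until the lowest-value agent drops out, so demand falls
# from 2000 to 1300 tokens at a fee of 0.3.
```

The loop is a crude tatonnement; a real deployment would need demand elasticities and an auction or posted-price mechanism, but the sketch shows how a fee internalizes the congestion externality.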
Load-bearing premise
The fragmented research on LLM agent optimization, architecture, and security fits accurately and without major omissions into the proposed four-dimensional economic taxonomy.
What would settle it
A systematic scan of recent LLM agent papers that identifies a substantial cluster of techniques or problems that cannot be placed in any of the four taxonomy categories would falsify the claim of comprehensive synthesis.
read the original abstract
As LLM agents evolve, tokens have emerged as the core economic primitives of Agentic AI. However, their exponential consumption introduces severe computational, collaborative, and security bottlenecks. Current surveys remain fragmented across system optimization, architecture design, and trust, lacking a unified framework to evaluate the fundamental trade-off between output quality and economic cost. To bridge this gap, this survey presents the first comprehensive survey of Token Economics. By unifying computer science and economics, we conceptualize tokens as production factors, exchange mediums, and units of account. We synthesize existing literature across a four-dimensional taxonomy: (1) Micro-level (Single Agent): Optimizing budget-constrained factor substitution via neoclassical firm theory. (2) Meso-level (Multi-Agent Systems): Minimizing collaboration friction using transaction cost and principal-agent theories. (3) Macro-level (Agent Ecosystems): Addressing congestion externalities and pricing via mechanism design. (4) Security: Internalizing adversarial threats as endogenous economic constraints. Finally, we outline frontier directions, including differentiable token budgets and dynamic markets, to lay the theoretical foundation for scalable next-generation agent systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a survey paper that unifies computer science and economics perspectives on token economics for LLM agents. It conceptualizes tokens as production factors, exchange mediums, and units of account, then synthesizes fragmented literature on system optimization, architecture design, and trust into a four-dimensional taxonomy: (1) micro-level single-agent optimization of budget-constrained factor substitution using neoclassical firm theory; (2) meso-level multi-agent collaboration minimizing friction via transaction cost and principal-agent theories; (3) macro-level agent ecosystems addressing congestion via mechanism design; and (4) security internalizing adversarial threats as endogenous economic constraints. It concludes by outlining future directions such as differentiable token budgets and dynamic markets.
Significance. If the taxonomy provides an accurate, non-overlapping, and comprehensive synthesis of the literature without forcing concepts into categories, the work could offer a useful interdisciplinary lens for analyzing quality-cost trade-offs in scalable LLM agent systems. As a conceptual survey without new derivations, empirical results, or machine-checked proofs, its primary value would lie in organizing existing work to guide future research on efficient and secure agentic AI.
major comments (2)
- [Abstract / Taxonomy description] The claim that the four dimensions are exhaustive and non-overlapping is not yet substantiated; for example, security considerations (dimension 4) appear relevant to micro-level optimization and meso-level collaboration as well, raising the risk of conceptual overlap that could undermine the taxonomy's utility as a clean organizing framework.
- [Abstract] The abstract asserts a mapping of CS concerns (optimization, architecture, trust) onto specific economic theories without providing concrete citations or examples in the provided text; this makes it impossible to verify whether high-impact strands (e.g., specific work on LLM inference optimization or multi-agent coordination) fit without misalignment or omission, which is load-bearing for the central claim of a 'comprehensive' unification.
minor comments (2)
- [Abstract] The abstract is information-dense; consider using enumerated sub-bullets or a table to present the four taxonomy dimensions for improved readability.
- [Future directions] Clarify the scope of 'frontier directions' section by distinguishing speculative ideas (e.g., differentiable token budgets) from those with preliminary supporting references.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments, which help clarify the presentation of our taxonomy and the substantiation of our literature synthesis. We address each point below and propose targeted revisions to strengthen the manuscript's clarity as an interdisciplinary organizing framework.
read point-by-point responses
- Referee: [Abstract / Taxonomy description] The claim that the four dimensions are exhaustive and non-overlapping is not yet substantiated; for example, security considerations (dimension 4) appear relevant to micro-level optimization and meso-level collaboration as well, raising the risk of conceptual overlap that could undermine the taxonomy's utility as a clean organizing framework.
Authors: We acknowledge that security threats can intersect with micro-level optimization (e.g., adversarial inputs affecting single-agent token budgets) and meso-level collaboration (e.g., trust issues in multi-agent interactions). Our taxonomy, however, is organized by primary analytical lens rather than strict exclusivity: the micro dimension applies neoclassical firm theory to budget-constrained factor substitution under standard conditions; the meso dimension uses transaction cost and principal-agent theories to address collaboration frictions; and the security dimension specifically models adversarial behavior as endogenous economic constraints at the system level. This structure draws on the established economic practice of separating normal-operation analysis from threat internalization. To address the concern, we will revise the taxonomy overview to discuss inter-dimensional interactions explicitly, provide boundary examples, and include a clarifying diagram or table. We do not claim mathematical exhaustiveness, only a comprehensive synthesis of existing literature strands.
revision: partial
- Referee: [Abstract] The abstract asserts a mapping of CS concerns (optimization, architecture, trust) onto specific economic theories without providing concrete citations or examples in the provided text; this makes it impossible to verify whether high-impact strands (e.g., specific work on LLM inference optimization or multi-agent coordination) fit without misalignment or omission, which is load-bearing for the central claim of a 'comprehensive' unification.
Authors: Abstracts conventionally omit citations for brevity; the full manuscript substantiates the mappings in dedicated sections with concrete citations and examples. The micro-level section maps the LLM inference-optimization literature (e.g., work on token-efficient serving and budget-constrained decoding) to neoclassical production theory. The meso-level section aligns multi-agent coordination papers with transaction cost economics and principal-agent models. The macro and security dimensions similarly reference the mechanism design and adversarial economics literature. These mappings are load-bearing and are detailed with references throughout the body. We will not alter the abstract, but we will ensure that the introduction and taxonomy sections cross-reference the detailed syntheses more explicitly for readers who encounter only the abstract.
revision: no
Circularity Check
No circularity: conceptual survey with external theoretical grounding
full rationale
The paper is a literature survey that organizes existing work on token usage in LLM agents into a four-dimensional taxonomy by mapping CS concerns onto standard neoclassical economics, transaction-cost economics, mechanism design, and principal-agent theory. No equations, fitted parameters, predictions, or derivations appear in the abstract or described content. The taxonomy is presented as a synthesis framework rather than a result derived from the paper's own inputs or prior self-citations. No self-definitional loops, fitted-input predictions, uniqueness theorems, or ansatz smuggling are present. The central claim rests on external literature and established economic primitives, making the work self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (4)
- domain assumption Neoclassical firm theory applies to optimizing budget-constrained factor substitution for single LLM agents.
- domain assumption Transaction cost theory and principal-agent theory can minimize collaboration friction in multi-agent LLM systems.
- domain assumption Mechanism design addresses congestion externalities and pricing in agent ecosystems.
- domain assumption Adversarial threats can be treated as endogenous economic constraints for security.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Paper passage: "We synthesize existing literature across a four-dimensional taxonomy: (1) Micro-level (Single Agent): Optimizing budget-constrained factor substitution via neoclassical firm theory. (2) Meso-level (Multi-Agent Systems): Minimizing collaboration friction using transaction cost and principal-agent theories. (3) Macro-level (Agent Ecosystems): Addressing congestion externalities and pricing via mechanism design."
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Paper passage: Y = A · [δK^ρ + (1 − δ)M^ρ]^(θ/ρ) · L^β · e^ε and min TC s.t. Y ≥ Z
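The quoted objective (a CES production core with a labor-like factor, and cost minimization subject to an output floor) can be checked numerically. A minimal sketch, assuming hypothetical values throughout: the parameters (A, δ, ρ, θ, β, ε), the factor prices, and the search grid are invented for illustration, not taken from the paper.

```python
# Numeric sketch of the quoted objective:
#   Y = A * (delta*K**rho + (1 - delta)*M**rho) ** (theta/rho) * L**beta * exp(eps)
#   min TC = pK*K + pM*M + pL*L  subject to  Y >= Z
# All parameter values, prices, and the grid are hypothetical.
import math
from itertools import product

def output(K, M, L, A=1.0, delta=0.5, rho=0.5, theta=1.0, beta=0.3, eps=0.0):
    """CES aggregate of K and M, scaled by L**beta and a shock term."""
    ces = (delta * K**rho + (1 - delta) * M**rho) ** (theta / rho)
    return A * ces * L**beta * math.exp(eps)

def min_cost(Z, prices=(1.0, 2.0, 5.0), grid=range(1, 51)):
    """Brute-force the cheapest factor bundle (K, M, L) that meets Y >= Z."""
    pK, pM, pL = prices
    best = None
    for K, M, L in product(grid, grid, grid):
        if output(K, M, L) >= Z:
            tc = pK * K + pM * M + pL * L
            if best is None or tc < best[0]:
                best = (tc, (K, M, L))
    return best

tc, bundle = min_cost(10.0)  # cheapest grid bundle whose output reaches Z = 10
```

A production-grade version would use a constrained optimizer rather than a grid, but the brute-force form makes the structure of the problem (output floor, linear factor costs) explicit.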
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Zhenglun Kong, Yize Li, Fanhu Zeng, Lei Xin, Shvat Messica, Xue Lin, Pu Zhao, Manolis Kellis, Hao Tang, and Marinka Zitnik. T oken reduction should go beyond efficiency in generative models–from vision, language to multimodality. arXiv preprint arXiv:2505.18227, 2025
-
[2]
Memorybank: Enhancing large language models with long-term memory
Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. Memorybank: Enhancing large language models with long-term memory. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024
work page 2024
-
[3]
MasRouter: Learning to route LLMs for multi-agent systems
Yanwei Yue, Guibin Zhang, Boyang Liu, Guancheng Wan, Kun Wang, Dawei Cheng, and Yiyan Qi. MasRouter: Learning to route LLMs for multi-agent systems. In Proceedings of the Annual Meeting of the Association for Computational Linguistics . Association for Computational Linguistics, 2025
work page 2025
-
[4]
TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing
Zhuohang Bian, Feiyang Wu, Chengrui Zhang, Hangcheng Dong, Yun Liang, and Youwei Zhuo. T okendance: Scaling multi-agent llm serving via collective kv cache sharing. arXiv preprint arXiv:2604.03143, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[5]
Subbalakshmi, Jimin Huang, Lingfei Qian, Xueqing Peng, Jordan W
Haohang Li, Yupeng Cao, Yangyang Yu, Shashidhar Reddy Javaji, Zhiyang Deng, Yueru He, Yuechen Jiang, Zining Zhu, K.p. Subbalakshmi, Jimin Huang, Lingfei Qian, Xueqing Peng, Jordan W. Suchow , and Qianqian Xie. INVESTORBENCH: A benchmark for financial decision-making tasks with LLM-based agent. In Proceedings of the Annual Meeting of the Association for Com...
work page 2025
-
[6]
Mariam Barry , Gaetan Caillaut, Pierre Halftermeyer, Raheel Qader, Mehdi Mouayad, Fabrice Le Deit, Dimitri Cariolaro, and Joseph Gesnouin. GraphRAG: Leveraging graph-based efficiency to minimize hallucinations in LLM-driven RAG for finance data. In Proceedings of the Workshop on Generative AI and Knowledge Graphs, 2025
work page 2025
-
[7]
Time travel is cheating: Going live with deepfund for real-time fund investment benchmarking
Changlun Li, Yao Shi, Chen Wang, Qiqi Duan, Runke Ruan, Weijie Huang, Haonan Long, Lijun Huang, Nan Tang, and Yuyu Luo. Time travel is cheating: Going live with deepfund for real-time fund investment benchmarking. In Advances in Neural Information Processing Systems Datasets and Benchmarks T rack, 2025
work page 2025
-
[8]
NitiBench: Benchmark- ing LLM frameworks on Thai legal question answering capabilities
Pawitsapak Akarajaradwong, Pirat Pothavorn, Chompakorn Chaksangchaichot, Panuthep Ta- sawong, Thitiwat Nopparatbundit, Keerakiat Pratai, and Sarana Nutanong. NitiBench: Benchmark- ing LLM frameworks on Thai legal question answering capabilities. In Proceedings of the Conference on Empirical Methods in Natural Language Processing , 2025
work page 2025
-
[9]
Legalagentbench: Evaluating llm agents in legal domain
Haitao Li, Junjie Chen, Jingli Yang, Qingyao Ai, Wei Jia, Youfeng Liu, Kai Lin, Yueyue Wu, Guozhi Yuan, Yiran Hu, Wuyue Wang, Yiqun Liu, and Minlie Huang. Legalagentbench: Evaluating llm agents in legal domain. In Proceedings of the Annual Meeting of the Association for Computational Lin- guistics, 2025
work page 2025
-
[10]
Lexrag: Benchmarking retrieval-augmented generation in multi-turn legal consultation conversation
Haitao Li, Yifan Chen, Yiran Hu, Qingyao Ai, Junjie Chen, Xiaoyu Yang, Jianhui Yang, Yueyue Wu, Zeyang Liu, and Yiqun Liu. Lexrag: Benchmarking retrieval-augmented generation in multi-turn legal consultation conversation. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval , 2025. 47 T oken Economi...
work page 2025
-
[11]
De novo design of gpcr exoframe modulators
Shizhuo Cheng, Jia Guo, Yun-li Zhou, Xumei Luo, Gufang Zhang, Ya-zhi Zhang, Yixin Yang, Jiannan Xie, Ping Xu, Dan-dan Shen, Shaokun Zang, Huicui Yang, Xuechu Zhen, Min Zhang, and Yan Zhang. De novo design of gpcr exoframe modulators. Nature, 651:1–9, 2026
work page 2026
-
[12]
Ask patients with patience: Enabling LLMs for human-centric medical dialogue with grounded reasoning
Jiayuan Zhu, Jiazhen Pan, Yuyuan Liu, Fenglin Liu, and Junde Wu. Ask patients with patience: Enabling LLMs for human-centric medical dialogue with grounded reasoning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing , 2025
work page 2025
-
[13]
Sijing Li, Zhongwei Qiu, Jiang Liu, Wenqiao Zhang, Tianwei Lin, Yihan Xie, Jianxiang An, Boxiang Yun, Chenglin Yang, Jun Xiao, Guangyu Guo, Jiawen Yao, Wei Liu, Yuan Gao, Ke Yan, Weiwei Cao, Zhilin Zheng, T ony C. W. MOK, Kai Cao, Yu Shi, Jiuyu Zhang, Jian Zhou, Beng Chin Ooi, Yingda Xia, and Ling Zhang. T umorchain: Interleaved multimodal chain-of-though...
work page 2026
- [14]
-
[15]
Transcend- ing cost-quality tradeoff in agent serving via session-awareness
Yanyu Ren, Li Chen, Dan Li, Xizheng Wang, Zhiyuan Wu, Yukai Miao, and Yu Bai. Transcend- ing cost-quality tradeoff in agent serving via session-awareness. In Advances in Neural Information Processing Systems, 2026
work page 2026
-
[16]
TTD-SQL: Tree-guided token decoding for efficient and schema-aware SQL gen- eration
Chetan Sharma, Ramasuri Narayanam, Soumyabrata Pal, Kalidas Yeturu, Shiv Kumar Saini, and Koyel Mukherjee. TTD-SQL: Tree-guided token decoding for efficient and schema-aware SQL gen- eration. In Proceedings of the Conference on Empirical Methods in Natural Language Processing: Industry T rack, 2025
work page 2025
-
[17]
Flow: Modu- larized agentic workflow automation
Boye Niu, Yiliao Song, Kai Lian, Yifan Shen, Yu Yao, Kun Zhang, and T ongliang Liu. Flow: Modu- larized agentic workflow automation. In International Conference on Learning Representations , 2025
work page 2025
-
[18]
AMAS: Adaptively determining communication topology for LLM-based multi-agent system
Hui Yi Leong, Yuheng Li, Yuqing Wu, Wenwen Ouyang, Wei Zhu, Jiechao Gao, and Wei Han. AMAS: Adaptively determining communication topology for LLM-based multi-agent system. InProceedings of the Conference on Empirical Methods in Natural Language Processing: Industry T rack, 2025
work page 2025
-
[19]
Cut the crap: An economical communication pipeline for llm-based multi-agent systems
Guibin Zhang, Yanwei Yue, Zhixun Li, Sukwon Yun, Guancheng Wan, Kun Wang, Dawei Cheng, Jeffrey Yu, and Tianlong Chen. Cut the crap: An economical communication pipeline for llm-based multi-agent systems. In International Conference on Learning Representations, 2025
work page 2025
-
[20]
A survey on llm-based multi-agent systems: workflow , infrastructure, and challenges
Xinyi Li, Sai Wang, Siqi Zeng, Yu Wu, and Yi Yang. A survey on llm-based multi-agent systems: workflow , infrastructure, and challenges. Vicinagearth, 1(1):9, 2024
work page 2024
-
[21]
The rise and potential of large language model based agents: A survey
Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongx- iang Weng, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, Qi Zhang, and Tao Gui. ...
work page 2025
-
[22]
T owards efficient generative large language model serving: A survey from algorithms to systems
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, and Zhihao Jia. T owards efficient generative large language model serving: A survey from algorithms to systems. ACM Computing Surveys, 58(1):1–37, 2025. 48 T oken Economics for LLM Agents: A Dual-View Study from Computing and Economics
work page 2025
-
[23]
Resource- efficient algorithms and systems of foundation models: A survey
Mengwei Xu, Dongqi Cai, Wangsong Yin, Shangguang Wang, Xin Jin, and Xuanzhe Liu. Resource- efficient algorithms and systems of foundation models: A survey. ACM Computing Surveys, 57(5):1– 39, 2025
work page 2025
-
[24]
AI agents under threat: A survey of key security challenges and future pathways
Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, and Yang Xiang. AI agents under threat: A survey of key security challenges and future pathways. ACM Computing Surveys, 57(7):1–36, 2025
work page 2025
-
[25]
The emerged security and privacy of llm agent: A survey with case studies
Feng He, Tianqing Zhu, Dayong Ye, Bo Liu, Wanlei Zhou, and Philip S Yu. The emerged security and privacy of llm agent: A survey with case studies. ACM Computing Surveys, 58(6):1–36, 2025
work page 2025
-
[26]
Jere R Behrman, C Simon Fan, Naijia Guo, Xiangdong Wei, Hongliang Zhang, and Junsen Zhang. T utoring efficacy , household substitution, and student achievement: Experimental evidence from an after-school tutoring program in rural china. International Economic Review, 65(1):149–189, 2024
work page 2024
-
[27]
Testing the production approach to markup estimation
Devesh Raval. Testing the production approach to markup estimation. Review of Economic Studies , 90(5):2592–2611, 2023
work page 2023
-
[28]
T oolformer: Language models can teach them- selves to use tools
Timo Schick, Jane Dwivedi- Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. T oolformer: Language models can teach them- selves to use tools. In Advances in Neural Information Processing Systems, 2023
work page 2023
-
[29]
The neoclassical theory of firm investment and taxes: A reassessment
Gabriel Chodorow-Reich. The neoclassical theory of firm investment and taxes: A reassessment. Technical report, National Bureau of Economic Research, 2025
work page 2025
-
[30]
Not a typical firm: Capital–labor substitution and firms’ labor shares
Joachim Hubmer and Pascual Restrepo. Not a typical firm: Capital–labor substitution and firms’ labor shares. American Economic Journal: Macroeconomics , 18(2):34–71, 2026
work page 2026
-
[31]
Kiran Patil, Vipul Garg, Janeth Gabaldon, Himali Patil, Suman Niranjan, and Timothy Hawkins. Firm performance in digitally integrated supply chains: a combined perspective of transaction cost economics and relational exchange theory. Journal of Enterprise Information Management , 37(2):381– 413, 2024
work page 2024
-
[32]
Ron Lavi and Elisheva S Shamash. Principal-agent vcg contracts. Journal of Economic Theory , 201:105443, 2022
work page 2022
-
[33]
Stablefees: A predictable fee market for cryptocurrencies
Soumya Basu, David Easley , Maureen OHara, and Emin Gün Sirer. Stablefees: A predictable fee market for cryptocurrencies. Management Science, 69(11):6508–6524, 2023
work page 2023
-
[34]
A theory of simplicity in games and mechanism design.Econometrica, 91(4):1495–1526, 2023
Marek Pycia and Peter Troyan. A theory of simplicity in games and mechanism design.Econometrica, 91(4):1495–1526, 2023
work page 2023
-
[35]
Variety-based congestion in online markets: Evidence from mobile apps
Daniel Ershov . Variety-based congestion in online markets: Evidence from mobile apps. American Economic Journal: Microeconomics , 16(2):180–203, 2024
work page 2024
-
[36]
A neural probabilistic language model
Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. A neural probabilistic language model. Journal of machine learning research , 3:1137–1155, 2003
work page 2003
-
[37]
Efficient estimation of word represen- tations in vector space
T omas Mikolov , Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word represen- tations in vector space. In International Conference on Learning Representations Workshop, 2013
work page 2013
-
[38]
Glove: Global vectors for word representation
Jeffrey Pennington, Richard Socher, and Christopher D Manning. Glove: Global vectors for word representation. In Proceedings of the conference on empirical methods in natural language processing, 2014. 49 T oken Economics for LLM Agents: A Dual-View Study from Computing and Economics
work page 2014
-
[39]
Neural machine translation of rare words with subword units
Rico Sennrich, Barry Haddow , and Alexandra Birch. Neural machine translation of rare words with subword units. In Proceedings of the annual meeting of the association for computational linguistics , 2016
work page 2016
-
[40]
Taku Kudo and John Richardson. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of the conference on empirical methods in natural language processing: System demonstrations , 2018
work page 2018
-
[41]
Neural discrete representation learn- ing
Aäron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learn- ing. In Advances in Neural Information Processing Systems, pages 6306–6315, 2017
work page 2017
-
[42]
Generating diverse high-fidelity images with vq-vae-2
Ali Razavi, Aaron van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images with vq-vae-2. In Advances in Neural Information Processing Systems, 2019
work page 2019
-
[43]
Jeffrey Cheng and Benjamin Van Durme. Compressed chain of thought: Efficient reasoning through dense representations. arXiv preprint arXiv:2412.13171, 2024
-
[44]
CODI: compress- ing chain-of-thought into continuous space via self-distillation
Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, and Yulan He. CODI: compress- ing chain-of-thought into continuous space via self-distillation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing , 2025
work page 2025
-
[45]
Yibo Wang, Haotian Luo, Huanjin Yao, Tiansheng Huang, Haiying He, Rui Liu, Naiqiang Tan, Jiax- ing Huang, Xiaochun Cao, Dacheng Tao, and Li Shen. R1-compress: Long chain-of-thought com- pression via chunk compression and search. arXiv preprint arXiv:2505.16838, 2025
-
[46]
T okenskip: Controllable chain-of-thought compression in llms
Heming Xia, Chak T ou Leong, Wenjie Wang, Yongqi Li, and Wenjie Li. T okenskip: Controllable chain-of-thought compression in llms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2025
work page 2025
-
[47]
Training large language models to reason in a continuous latent space
Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason E Weston, and Yuandong Tian. Training large language models to reason in a continuous latent space. In Second Conference on Language Modeling, 2025
work page 2025
-
[48]
Dynamic early exit in reasoning models
Chenxu Yang, Qingyi Si, Yongjie Duan, Zheliang Zhu, Chenyu Zhu, Qiaowei Li, Minghui Chen, Zheng Lin, and Weiping Wang. Dynamic early exit in reasoning models. In International Conference on Learning Representations, 2026
work page 2026
-
[49]
arXiv preprint arXiv:2505.13949 , year=
Guochao Jiang, Guofeng Quan, Zepeng Ding, Ziqin Luo, Dixuan Wang, and Zheng Hu. Flashthink: An early exit method for efficient reasoning. arXiv preprint arXiv:2505.13949, 2025
-
[50]
Flashattention: Fast and memory-efficient exact attention with io-awareness
Tri Dao, Daniel Y Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. Flashattention: Fast and memory-efficient exact attention with io-awareness. In Advances in Neural Information Processing Systems, 2022
work page 2022
-
[51]
Big bird: Trans- formers for longer sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey , Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed. Big bird: Trans- formers for longer sequences. In Advances in Neural Information Processing Systems, 2020
work page 2020
-
[52]
Transformers are RNNs: Fast autoregressive transformers with linear attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are RNNs: Fast autoregressive transformers with linear attention. In Proceedings of the International Conference on Machine Learning, 2020. 50 T oken Economics for LLM Agents: A Dual-View Study from Computing and Economics
work page 2020
-
[53]
Snapkv: Llm knows what you are looking for before generation
Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, and Deming Chen. Snapkv: Llm knows what you are looking for before generation. In Advances in Neural Information Processing Systems, 2024
work page 2024
-
[54]
H2o: Heavy-hitter oracle for efficient generative inference of large language models
Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang "Atlas" Wang, and Beidi Chen. H2o: Heavy-hitter oracle for efficient generative inference of large language models. In Advances in Neural Information Processing Systems, 2023
work page 2023
-
[55]
OPTQ: Accurate quantization for generative pre-trained transformers
Elias Frantar, Saleh Ashkboos, T orsten Hoefler, and Dan Alistarh. OPTQ: Accurate quantization for generative pre-trained transformers. In International Conference on Learning Representations, 2023
work page 2023
-
[56]
Victor Sanh, Thomas Wolf, and Alexander M. Rush. Movement pruning: Adaptive sparsity by fine-tuning. In Hugo Larochelle, Marc’ Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems, 2020
work page 2020
-
[57]
Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity
William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research , 23(120):1–39, 2022
work page 2022
-
[58]
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, and William Fedus. St-moe: Designing stable and transferable sparse expert models. arXiv preprint arXiv:2202.08906, 2022
work page internal anchor Pith review arXiv 2022
-
[59]
Fast inference from transformers via speculative decoding
Yaniv Leviathan, Matan Kalman, and Yossi Matias. Fast inference from transformers via speculative decoding. In Proceedings of the International Conference on Machine Learning , 2023
work page 2023
-
[60] Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Gang Chen, and Sharad Mehrotra. Draft & verify: Lossless large language model acceleration via self-speculative decoding. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024.
[61] Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. LLMLingua: Compressing prompts for accelerated inference of large language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2023.
[62] Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024.
[63] Jesse Mu, Xiang Li, and Noah Goodman. Learning to compress prompts with gist tokens. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 19327–19352. Curran Associates, Inc., 2023.
[64] Alexis Chevalier, Alexander Wettig, Anirudh Ajith, and Danqi Chen. Adapting language models to compress contexts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2023.
[65] Yucheng Li, Bo Dong, Frank Guerin, and Chenghua Lin. Compressing context to enhance inference efficiency of large language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2023.
[66] Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. MemGPT: Towards LLMs as operating systems. arXiv preprint arXiv:2310.08560, 2024.
[67]
[68] Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. In Advances in Neural Information Processing Systems, 2023.
[69] Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-Mem: Agentic memory for LLM agents. Advances in Neural Information Processing Systems, 2026.
[70] Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready AI agents with scalable long-term memory. In Proceedings of the European Conference on Artificial Intelligence, 2025.
[71] Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. Gorilla: Large language model connected with massive APIs. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2024.
[72] Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. In International Conference on Learning Representations, 2024.
[73] Yu Du, Fangyun Wei, and Hongyang Zhang. AnyTool: Self-reflective, hierarchical agents for large-scale API calls. In Proceedings of the International Conference on Machine Learning, 2024.
[74] Cheng Qian, Emre Can Acikgoz, Qi He, Hongru Wang, Xiusi Chen, Dilek Hakkani-Tur, Gokhan Tur, and Heng Ji. ToolRL: Reward is all tool learning needs. In Advances in Neural Information Processing Systems, 2025.
[75] Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. Model context protocol (MCP): Landscape, security threats, and future research directions. ACM Transactions on Software Engineering and Methodology, 2026.
[76] Akari Asai, Zeqiu Wu, Yizhong Wang, Avi Sil, and Hannaneh Hajishirzi. Self-RAG: Learning to retrieve, generate, and critique through self-reflection. In International Conference on Learning Representations, 2024.
[77] Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, and Zhen-Hua Ling. Corrective retrieval augmented generation. arXiv preprint arXiv:2401.15884, 2024.
[78] Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, and Jong Park. Adaptive-RAG: Learning to adapt retrieval-augmented large language models through question complexity. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2024.
[79] Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, and Jonathan Larson. From local to global: A graph RAG approach to query-focused summarization. arXiv preprint arXiv:2404.16130, 2024.
[80] Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher Manning. RAPTOR: Recursive abstractive processing for tree-organized retrieval. In International Conference on Learning Representations, 2024.