Latent Action Reparameterization for Efficient Agent Inference
Pith reviewed 2026-05-20 10:41 UTC · model grok-4.3
The pith
Reparameterizing LLM agent actions into learned latent units shortens the effective decision horizon while preserving expressiveness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose Latent Action Reparameterization (LAR), a framework that learns a compact latent action space in which each latent action corresponds to a multi-step semantic behavior. By reparameterizing agent actions into latent units, LAR enables decision making over a shorter effective horizon while preserving the expressiveness of the original action space. Unlike hand-crafted macros or hierarchical controllers, latent actions are learned from agent trajectories and integrated directly into the model, allowing both planning and execution to operate over abstract action representations.
What carries the argument
Latent Action Reparameterization (LAR), which learns compact latent actions from trajectories to encode multi-step semantic behaviors and reparameterizes the original action space for higher-level decision making.
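The page does not give LAR's learning objective, so the following is only a minimal sketch of the general idea of deriving multi-step units from trajectories. It swaps in a different, much simpler technique: greedy byte-pair-encoding-style merging of the most frequent adjacent pair of units. Here, a unit is just the tuple of primitive actions it covers. The names `merge_pair` and `learn_latents` and the toy action strings are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical stand-in for "learning latent actions from trajectories":
# greedy BPE-style merging of frequent adjacent units. Not the LAR objective.
Action = str
Unit = Tuple[Action, ...]  # a unit records the primitive actions it covers


def merge_pair(traj: List[Unit], a: Unit, b: Unit) -> List[Unit]:
    """Replace non-overlapping adjacent (a, b) occurrences, left to right, with a + b."""
    out: List[Unit] = []
    i = 0
    while i < len(traj):
        if i + 1 < len(traj) and traj[i] == a and traj[i + 1] == b:
            out.append(a + b)  # tuple concatenation forms a multi-step unit
            i += 2
        else:
            out.append(traj[i])
            i += 1
    return out


def learn_latents(trajectories: List[List[Action]], num_merges: int) -> List[List[Unit]]:
    """Re-encode trajectories after up to num_merges greedy pair merges."""
    encoded = [[(a,) for a in t] for t in trajectories]
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for t in encoded:
            pairs.update(zip(t, t[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:  # nothing recurs; merging further would only memorize
            break
        encoded = [merge_pair(t, a, b) for t in encoded]
    return encoded


if __name__ == "__main__":
    trajs = [
        ["search", "click", "read", "answer"],
        ["search", "click", "read", "search", "click", "read", "answer"],
    ]
    for t in learn_latents(trajs, num_merges=2):
        print(len(t), t)
```

With two merges, the toy trajectories shrink from 4 and 7 primitive decisions to 2 and 3 unit decisions. Flattening the units recovers each original sequence exactly. When two pairs are equally frequent, `Counter.most_common` returns the one encountered first, so this sketch merges that pair.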
If this is right
- The effective action horizon shortens because each latent action covers multiple original steps.
- Inference runs with fewer action tokens and lower wall-clock time under fixed compute budgets.
- Task success rates remain stable or rise across LLM-based agent benchmarks.
- Planning and execution both shift to operate over the abstract latent representations.
- Action representation learning emerges as a distinct factor that complements model and hardware improvements for scaling agent inference.
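The horizon claim in the first bullet is arithmetic. If each latent decision covers k primitive steps on average, an episode of T primitive steps needs roughly T/k latent decisions. The sketch below makes two assumptions: "effective horizon" means decisions per episode, and episodes are already encoded into units. The helper name `horizon_stats` is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Unit = Tuple[str, ...]  # a latent unit, stored as the primitive actions it covers


def horizon_stats(encoded: List[List[Unit]]) -> Dict[str, float]:
    """Compare primitive vs latent decision counts for already-encoded episodes.

    Assumes "effective horizon" = number of decisions per episode.
    """
    primitive = [sum(len(u) for u in ep) for ep in encoded]
    latent = [len(ep) for ep in encoded]
    return {
        "mean_primitive_horizon": mean(primitive),
        "mean_latent_horizon": mean(latent),
        # average number of primitive steps covered by one latent decision
        "steps_per_latent": sum(primitive) / sum(latent),
    }
```

Fewer decisions do not by themselves imply fewer generated tokens. Token savings depend on how latent actions are emitted, and the page does not describe that.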
Where Pith is reading between the lines
- The same trajectory-derived latent actions might transfer to related but unseen tasks, reducing the need for fresh learning on each new problem.
- Combining the reparameterization with existing prompt or system optimizations could produce additive reductions in inference cost.
- If latent units prove stable, future agents could discover and refine them continuously from ongoing interactions rather than from fixed training trajectories.
Load-bearing premise
Latent actions learned from observed trajectories will stay sufficiently expressive and generalize across new tasks without hand-crafted macros or external hierarchical controllers.
What would settle it
Measure whether task success rates fall significantly on held-out benchmarks when agents operate exclusively over the learned latent actions rather than the original low-level action sequences.
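"Significantly" needs a concrete test. The sketch below uses a standard two-proportion z-test and assumes three things: success is binary, agent A is latent-only, and agent B uses the original primitive actions, with each agent attempting the held-out tasks in independent episodes. If both agents run on the same task instances, a paired test such as McNemar's is the better choice. The function name is hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(succ_a: int, n_a: int, succ_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for H0: success rate of A equals that of B.

    Assumes independent episodes; returns (z, p_value).
    """
    p_a, p_b = succ_a / n_a, succ_b / n_b
    pooled = (succ_a + succ_b) / (n_a + n_b)  # success rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # both arms all-success or all-failure: no evidence of a difference
        return 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```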
Original abstract
Large language model (LLM) agents often rely on long sequences of low-level textual actions, resulting in large effective decision horizons and high inference cost. While prior work has focused on improving inference efficiency through system-level optimizations or prompt engineering, we argue that a key bottleneck lies in the representation of the action space itself. We propose Latent Action Reparameterization (LAR), a framework that learns a compact latent action space in which each latent action corresponds to a multi-step semantic behavior. By reparameterizing agent actions into latent units, LAR enables decision making over a shorter effective horizon while preserving the expressiveness of the original action space. Unlike hand-crafted macros or hierarchical controllers, latent actions are learned from agent trajectories and integrated directly into the model, allowing both planning and execution to operate over abstract action representations. Across a range of LLM-based agent benchmarks, LAR significantly reduces the effective action horizon and improves inference efficiency under fixed compute budgets. As a consequence, our approach achieves substantial reductions in action tokens and corresponding wall-clock inference time, while maintaining or improving task success rates. These results suggest that action representation learning is a critical and underexplored factor in scaling efficient LLM agent inference, complementary to advances in model architecture and hardware.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Latent Action Reparameterization (LAR) for LLM-based agents. It learns a compact latent action space from observed trajectories such that each latent unit encodes a multi-step semantic behavior. Reparameterizing the original action space into these latents allows planning and execution over a shorter effective horizon while aiming to retain the expressiveness of the full action space. The method is integrated directly into the model without hand-crafted macros or external hierarchies, yielding fewer action tokens, lower wall-clock inference time, and maintained or improved success rates on agent benchmarks.
Significance. If the central claims hold, the work is significant for identifying action representation learning as a complementary lever to system-level optimizations and prompt engineering in scaling LLM agent inference. Learning latents directly from trajectories avoids reliance on external controllers, and the reported results show concrete efficiency gains under fixed compute. Framing the action space as a learnable bottleneck is a useful perspective for the field.
major comments (2)
- [§3.2] Latent learning objective: the claim that latent actions preserve the full expressiveness of the original action space is load-bearing for the shorter-horizon benefit, yet the objective appears to optimize only reconstruction on training trajectories. Without an explicit coverage or diversity term, fine-grained distinctions required for novel tasks may be collapsed.
- [Experiments] Benchmark results: success rates are reported as maintained or improved, but no ablation isolates whether the gains arise from true semantic abstraction or from trajectory-specific correlations. If the latter, the method would not deliver the claimed generalization to new tasks without retraining.
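The collapse concern in the first comment can be checked directly on held-out data. The sketch below assumes a hypothetical encoder has already assigned a discrete latent id to each primitive segment. Any id that absorbs two or more distinct segments marks a lost distinction, because a deterministic decoder can reproduce at most one of them. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Segment = Tuple[str, ...]  # a primitive action sequence


def collapsed_latents(
    assignments: Iterable[Tuple[Segment, Hashable]]
) -> Dict[Hashable, List[Segment]]:
    """Map each latent id that absorbed >1 distinct segment to those segments."""
    seen: Dict[Hashable, Set[Segment]] = defaultdict(set)
    for segment, latent_id in assignments:
        seen[latent_id].add(segment)
    return {z: sorted(segs) for z, segs in seen.items() if len(segs) > 1}
```

An empty report on training trajectories says little, since reconstruction is optimized there. The informative run is on held-out task families.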
minor comments (2)
- [Abstract] The phrase 'substantial reductions' would be clearer if accompanied by concrete factors or percentages for token count and wall-clock time.
- [§3.1] Notation: the mapping from latent to low-level actions during execution should be stated explicitly to clarify any added decoding overhead.
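If the latent vocabulary is a lookup table, the latent-to-primitive mapping is a deterministic expansion. Its cost is linear in the number of primitives produced, and it needs no extra model calls. The sketch below assumes such a table; the vocabulary format and `expand` are hypothetical. If LAR instead decodes latents with a learned network, the overhead is that network's forward passes, which is what the comment asks the authors to state.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical vocabulary format: each latent id expands to children that are
# either primitive actions (str) or earlier latent ids (int). Assumed acyclic.
Child = Union[int, str]
Vocab = Dict[int, Tuple[Child, ...]]


def expand(latent_id: int, vocab: Vocab) -> List[str]:
    """Decode one latent action into its primitive action sequence."""
    out: List[str] = []
    stack: List[Child] = [latent_id]
    while stack:
        item = stack.pop()
        if isinstance(item, str):
            out.append(item)  # primitive action: emit as-is
        else:
            # push children reversed so they are emitted left to right
            stack.extend(reversed(vocab[item]))
    return out
```

For example, with `vocab = {0: ("search", "click"), 1: (0, "read"), 2: (1, "answer")}`, expanding latent `2` yields the four primitives in their original order.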
Simulated Author's Rebuttal
We thank the referee for the constructive and insightful comments on our manuscript. We address each major comment point by point below, providing clarifications and indicating revisions where the feedback identifies areas for improvement.
- Referee: [§3.2] Latent learning objective: the claim that latent actions preserve the full expressiveness of the original action space is load-bearing for the shorter-horizon benefit, yet the objective appears to optimize only reconstruction on training trajectories. Without an explicit coverage or diversity term, fine-grained distinctions required for novel tasks may be collapsed.
  Authors: We agree that the reconstruction objective in §3.2 does not include an explicit diversity or coverage regularizer, which leaves open the theoretical possibility of collapsed distinctions on out-of-distribution tasks. Our defense rests on the observation that the training trajectories are drawn from a broad distribution of successful agent executions across multiple environments and task types, which in practice encourages the latent space to encode semantically distinct behaviors rather than collapsing them. The decoder is trained to reconstruct the original action sequences, preserving the ability to express any observed behavior. That said, we acknowledge this is an assumption rather than a formally proven guarantee. In the revised manuscript we will expand §3.2 with a dedicated paragraph discussing this limitation, add a quantitative analysis of latent-space coverage (e.g., entropy of the latent distribution and nearest-neighbor distances between distinct original actions), and clarify that novel tasks are handled by composing multiple latent actions. We will also note the absence of an explicit diversity term as a direction for future work.
  Revision: partial.
- Referee: [Experiments] Benchmark results: success rates are reported as maintained or improved, but no ablation isolates whether the gains arise from true semantic abstraction or from trajectory-specific correlations. If the latter, the method would not deliver the claimed generalization to new tasks without retraining.
  Authors: This is a fair critique of our experimental design. While the reported benchmarks already include tasks held out from the latent-learning phase, we did not include an explicit ablation that contrasts semantic abstraction against mere trajectory-specific correlations. To address the concern directly, the revised version will add a new ablation subsection. We will (1) retrain LAR on a deliberately restricted trajectory set that lacks diversity and evaluate generalization on disjoint task families, and (2) compare against a non-semantic baseline that performs simple action clustering without the learned latent objective. The results of these controls will be reported transparently; if they support the semantic-abstraction interpretation we will highlight them, and if they reveal limitations we will discuss them as such.
  Revision: yes.
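The first response promises two coverage statistics but does not define them. The sketch below shows one reading. It assumes the encoder yields a latent id per decision and an embedding per distinct original action; all names are hypothetical. Interpreting the two statistics:

- **Usage entropy:** close to log2 K for K codes means usage is spread evenly. Well below log2 K means a few codes dominate.
- **Nearest-neighbor distance:** near zero means two distinct actions sit almost on top of each other in latent space.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence


def usage_entropy(latent_ids: Iterable[Hashable]) -> float:
    """Shannon entropy (bits) of how often each latent id is used."""
    counts = Counter(latent_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def nearest_neighbor_distances(
    embeddings: Dict[Hashable, Sequence[float]]
) -> Dict[Hashable, float]:
    """Euclidean distance from each distinct action's embedding to its closest
    other action; needs at least two entries. Near-zero values flag actions the
    latent space may not keep apart."""
    keys = list(embeddings)
    return {
        k: min(math.dist(embeddings[k], embeddings[j]) for j in keys if j != k)
        for k in keys
    }
```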
Circularity Check
No circularity: framework proposal with empirical claims, no derivation chain or self-referential reductions present
The provided abstract and description outline a proposed framework (LAR) that learns compact latent actions from trajectories to enable shorter-horizon planning while preserving expressiveness. No equations, derivations, or mathematical steps are shown. Claims rest on empirical outcomes (reduced tokens, maintained success rates across benchmarks) rather than any closed-form prediction or first-principles result that reduces to its inputs by construction. No self-citations, fitted parameters renamed as predictions, or uniqueness theorems are referenced in the text. The learning-from-trajectories step is presented as an input to the method, not as a tautological output. This is a standard non-circular empirical proposal.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "latent actions learned from agent trajectories... approximated through the next-token entropy of a candidate segment H(s)"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "reparameterizing agent actions into latent units... shorter effective horizon while preserving expressiveness"
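The first quoted passage mentions a next-token entropy H(s) for a candidate segment s, but the page does not say how per-token entropies are aggregated. The sketch below shows one plausible reading: H(s) as the mean Shannon entropy (in nats) of the model's next-token distributions over the segment's positions. A sum or a maximum would be equally consistent with the quote. The function names are hypothetical.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def segment_entropy(step_probs: Sequence[Sequence[float]]) -> float:
    """Assumed H(s): mean next-token entropy across the segment's positions."""
    if not step_probs:
        raise ValueError("empty segment")
    return sum(token_entropy(p) for p in step_probs) / len(step_probs)
```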
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.