TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation
Pith reviewed 2026-05-10 16:45 UTC · model grok-4.3
The pith
TSUBASA improves long-horizon personalization in language models by dynamically evolving its stored memory of user information and by self-learning with a context distillation objective that internalizes user experiences.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TSUBASA is a two-pronged method: it enhances memory writing through dynamic evolution of stored user information, and it improves memory reading through self-learning driven by a context distillation objective. Together these let the model internalize extensive user histories without external labels, closing the train-inference gap for long-horizon personalization.
What carries the argument
Dynamic memory evolution for updating stored experiences combined with a context distillation objective that drives self-supervised internalization of user history during reading.
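The review reproduces no equations, so for orientation here is what a context distillation objective of this kind typically looks like. A minimal sketch, assuming self-distillation with a Hugging Face-style causal LM: a no-gradient teacher pass conditions on the full user history, a student pass conditions only on the compact memory, and both sequences end with the same answer span. All names and the exact conditioning are editorial assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def context_distillation_loss(model, teacher_ids, student_ids, answer_len):
    """KL divergence between the model's next-token distributions when
    conditioned on the full history (teacher pass, no gradient) and when
    conditioned only on the compact memory (student pass). Both id tensors
    are assumed to end with the same `answer_len`-token answer span, so the
    positions predicting that span align across the two passes."""
    with torch.no_grad():
        t_logits = model(teacher_ids).logits[:, -answer_len - 1:-1, :]
    s_logits = model(student_ids).logits[:, -answer_len - 1:-1, :]
    return F.kl_div(
        F.log_softmax(s_logits, dim=-1),   # student log-probs
        F.softmax(t_logits, dim=-1),       # teacher probs (target)
        reduction="batchmean",
    )
```

Minimizing this loss needs no external labels: the supervision signal is the model's own behavior when it can see the full history, which is one plausible reading of how the train-inference gap is closed.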
If this is right
- TSUBASA outperforms competitive memory-augmented systems such as Mem0 and Memory-R1 on long-horizon personalization benchmarks.
- The approach achieves Pareto improvements by delivering higher fidelity personalization at a reduced token budget.
- Effectiveness holds across the Qwen-3 model family ranging from 4B to 32B parameters.
- Self-learning bridges the train-inference gap, enabling adaptation without additional labeled data.
Where Pith is reading between the lines
- The memory evolution component could be tested in domains requiring ongoing tracking, such as multi-session project assistance or longitudinal health coaching.
- Context distillation might be combined with retrieval-augmented generation to further lower token costs in very long contexts.
- If the self-learning step generalizes, it could reduce the frequency of explicit user feedback needed to maintain personalization over time.
Load-bearing premise
Self-learning via context distillation can reliably internalize evolving user experiences and close the train-inference gap without introducing inconsistencies or requiring labeled data.
What would settle it
On the same long-horizon benchmarks and model sizes, a version of TSUBASA without the context distillation self-learning component would either fail to exceed the performance of Mem0 or Memory-R1, or would require an equal or greater token budget for comparable fidelity. The Pareto criterion behind this test can be written down directly, as in the sketch below.
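A minimal sketch of that Pareto test, assuming each system is summarized by a (fidelity, token budget) pair measured on the same benchmark. The zeroed entries are placeholders for measured values, not results from the paper:

```python
def pareto_dominates(a, b):
    """True if system `a` is at least as good as `b` on both axes
    (higher fidelity, lower token budget) and strictly better on at least one."""
    fid_a, tok_a = a
    fid_b, tok_b = b
    return fid_a >= fid_b and tok_a <= tok_b and (fid_a > fid_b or tok_a < tok_b)

# Placeholder (fidelity, tokens) pairs -- substitute measured values.
systems = {
    "tsubasa":            (0.0, 0.0),
    "tsubasa_no_distill": (0.0, 0.0),  # ablation: self-learning removed
    "mem0":               (0.0, 0.0),
    "memory_r1":          (0.0, 0.0),
}

# The Pareto-improvement claim holds iff TSUBASA dominates both baselines;
# the proposed ablation is settled by whether the no-distill variant loses
# that dominance.
for name, stats in systems.items():
    if name != "tsubasa":
        print(f"{name}: dominated by TSUBASA = {pareto_dominates(systems['tsubasa'], stats)}")
```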
Original abstract
Personalized large language models (PLLMs) have garnered significant attention for their ability to align outputs with individuals' needs and preferences. However, they still struggle with long-horizon tasks, such as tracking a user's extensive history of conversations or activities. Existing memory mechanisms often fail to capture evolving behaviors, and RAG paradigms are trapped by a quality-efficiency tradeoff. Meanwhile, parametric adaptation is bottlenecked by the train-inference gap due to the scarcity of labeled data. To enhance the long-horizon capabilities of PLLMs, we introduce TSUBASA, a two-pronged approach designed to improve memory writing via dynamic memory evolution, and memory reading via self-learning with a context distillation objective to internalize user experiences. Extensive evaluations on long-horizon benchmarks using the Qwen-3 model family (4B to 32B) validate the effectiveness of TSUBASA, surpassing competitive memory-augmented systems that rely primarily on memory writing, such as Mem0 and Memory-R1. Our analyses further confirm that TSUBASA breaks the quality-efficiency barrier to achieve Pareto improvements, delivering robust, high-fidelity personalization with a reduced token budget.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TSUBASA, a two-pronged approach for long-horizon personalization in PLLMs. It improves memory writing via dynamic memory evolution and memory reading via self-learning with a context distillation objective to internalize user experiences without labeled data. Evaluations on long-horizon benchmarks with Qwen-3 models (4B-32B) are reported to surpass memory-augmented baselines such as Mem0 and Memory-R1 while achieving Pareto improvements that break the quality-efficiency tradeoff.
Significance. If the results hold under rigorous scrutiny, TSUBASA would represent a meaningful advance in personalized LLMs by closing the train-inference gap and handling evolving user histories more effectively than prior memory-writing systems, with potential for more efficient deployment in long-horizon conversational applications.
Major comments (2)
- [Method section on self-learning and context distillation] The central claim that self-learning with context distillation internalizes evolving user experiences and closes the train-inference gap without labeled data or new inconsistencies is load-bearing. The method description provides no explicit mechanism (consistency verification, uncertainty estimation, or iterative validation) to prevent drift or compounding errors across extended sequences—the precise regime where long-horizon gains are asserted.
- [Abstract and Experimental Evaluation] The abstract states that extensive evaluations on long-horizon benchmarks validate effectiveness and superiority over Mem0 and Memory-R1, yet no details appear on benchmark definitions, datasets, exact implementation, baseline configurations, statistical significance, or error bars. This absence prevents assessment of whether the data actually supports the surpassing and Pareto-improvement claims.
Minor comments (1)
- [Method] Notation for the two components (memory evolution and context distillation) could be clarified with a diagram or pseudocode to improve readability.
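As an illustration of the pseudocode this comment asks for, here is an editorial sketch of the two components, reusing the distillation loss sketched under "What carries the argument". The memory store, the LLM-as-judge calls, and the ADD/MERGE/DELETE vocabulary are assumed interfaces, not the paper's actual design:

```python
import torch

def evolve_memory(memory, new_event, llm):
    """Memory writing (dynamic memory evolution): reconcile a new user event
    with existing entries instead of appending blindly, so the store tracks
    evolving behavior. `memory` and `llm` are assumed interfaces."""
    related = memory.retrieve(new_event, top_k=5)     # assumed retrieval call
    decision = llm.decide(new_event, related)         # ADD / MERGE / DELETE / NOOP
    if decision.op == "ADD":
        memory.add(new_event)
    elif decision.op == "MERGE":                      # consolidate with an existing entry
        memory.replace(decision.target, llm.merge(decision.target, new_event))
    elif decision.op == "DELETE":                     # new event contradicts a stale entry
        memory.remove(decision.target)

def self_learn_read(model, optimizer, memory, query_ids, history_ids, answer_len):
    """Memory reading (self-learning): pull the model's answer-from-memory
    distribution toward its answer-from-full-history distribution via the
    context distillation loss sketched earlier. `query_ids` is assumed to
    end with the answer span being internalized."""
    teacher_ids = torch.cat([history_ids, query_ids], dim=-1)
    student_ids = torch.cat([memory.summary_ids(), query_ids], dim=-1)  # assumed API
    loss = context_distillation_loss(model, teacher_ids, student_ids, answer_len)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```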
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below, providing clarifications where possible and committing to revisions that strengthen the manuscript without altering its core contributions.
Point-by-point responses
- Referee: [Method section on self-learning and context distillation] The central claim that self-learning with context distillation internalizes evolving user experiences and closes the train-inference gap without labeled data or new inconsistencies is load-bearing. The method description provides no explicit mechanism (consistency verification, uncertainty estimation, or iterative validation) to prevent drift or compounding errors across extended sequences—the precise regime where long-horizon gains are asserted.
  Authors: We agree that an explicit discussion of safeguards would strengthen the presentation. The context distillation objective functions as the primary mechanism by training the model to generate and internalize compressed, high-fidelity representations of user experiences in a self-supervised manner; this process inherently prioritizes consistent patterns over transient noise, as the distillation loss penalizes deviations from the evolving context. Our long-horizon experiments empirically support stability, but we acknowledge the value of additional exposition. We will revise the method section to include a dedicated paragraph describing the iterative nature of the distillation loop and add supporting analysis on sequence-length scaling to demonstrate the absence of measurable drift (a sketch of such a scaling probe follows these responses). Revision: yes.
- Referee: [Abstract and Experimental Evaluation] The abstract states that extensive evaluations on long-horizon benchmarks validate effectiveness and superiority over Mem0 and Memory-R1, yet no details appear on benchmark definitions, datasets, exact implementation, baseline configurations, statistical significance, or error bars. This absence prevents assessment of whether the data actually supports the surpassing and Pareto-improvement claims.
  Authors: We appreciate this point on reporting completeness. The experimental section of the manuscript defines the long-horizon benchmarks as multi-turn interaction traces drawn from extended user histories, specifies the Qwen-3 model variants, and describes baseline adaptations for Mem0 and Memory-R1. To ensure full transparency and allow independent verification of the Pareto improvements, we will expand the experimental section with a new subsection containing precise benchmark definitions, dataset sources and preprocessing, exact hyperparameter and implementation details for all systems, and updated results tables that include error bars and statistical significance tests. Revision: yes.
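The sequence-length scaling analysis promised in the first response is straightforward to specify. A minimal sketch, where `build_memory` and `evaluate_fidelity` are hypothetical stand-ins for the paper's memory-construction and evaluation pipeline:

```python
def drift_curve(model, sessions, probe_queries, build_memory, evaluate_fidelity,
                horizons=(10, 50, 100, 200)):
    """Personalization fidelity as a function of how many sessions the memory
    has absorbed. A flat curve supports the rebuttal's 'no measurable drift'
    claim; a downward slope indicates the compounding errors the referee
    worries about."""
    scores = {}
    for h in horizons:
        memory = build_memory(sessions[:h])            # hypothetical helper
        scores[h] = evaluate_fidelity(model, memory, probe_queries)
    return scores
```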
Circularity Check
No circularity: empirical claims rest on external benchmarks, not self-referential definitions or fitted predictions.
Full rationale
The paper introduces TSUBASA as a two-component system (dynamic memory evolution for writing; self-learning via context distillation for reading) and validates it through empirical evaluations on long-horizon benchmarks using Qwen-3 models. No equations, derivations, or parameter-fitting steps are described that would reduce predictions to inputs by construction. Central claims of surpassing Mem0 and Memory-R1 and breaking the quality-efficiency barrier are supported by direct comparisons to independent external systems rather than self-citations or renamed patterns. Any references to prior memory work function as background, not load-bearing justifications for uniqueness or correctness. The empirical case is therefore grounded in external benchmarks rather than in self-referential definitions.
Reference graph
Works this paper leans on
- [1] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, and others. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774.
- [2] Chris Alberti, Daniel Andor, Emily Pitler, Jacob Devlin, and Michael Collins. 2019. Synthetic QA corpora generation with roundtrip consistency. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6168--6173, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1620
- [3] Giuseppe Amato and Umberto Straccia. 1999. User profile modeling and applications to digital libraries. In Research and Advanced Technology for Digital Libraries, Third European Conference, ECDL'99, Paris, France, September 22-24, 1999, Proceedings, volume 1696 of Lecture Notes in Computer Science, pages 184--197... https://doi.org/10.1007/3-540-48155-9_13
- [4] R. C. Atkinson and R. M. Shiffrin. 1968. Human memory: A proposed system and its control processes. In K. W. Spence and J. T. Spence, editors, The Psychology of Learning and Motivation, volume 2, pages 89--195. Academic Press, New York.
- [5] Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, and Wanxiang Che. 2025. Towards reasoning era: A survey of long chain-of-thought for reasoning large language models. arXiv preprint arXiv:2503.09567.
- [6] Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. 2025. Mem0: Building production-ready AI agents with scalable long-term memory. arXiv preprint arXiv:2504.19413.
- [7] Eunbi Choi, Yongrae Jo, Joel Jang, Joonwon Jang, and Minjoon Seo. 2023. Fixed input parameterization for efficient prompting. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8428--8441, Toronto, Canada. Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.533
- [8] Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, and others. 2025. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261.
- [9] Thomas M. Cover. 1999. Elements of Information Theory. John Wiley & Sons.
- [10] Naihao Deng, Xinliang Zhang, Siyang Liu, Winston Wu, Lu Wang, and Rada Mihalcea. 2023. You are what you annotate: Towards better models through annotator representations. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 12475--12498, Singapore. Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-emnlp.832
- [11] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)... https://doi.org/10.18653/v1/N19-1423
- [12] Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2024. The Faiss library. https://arxiv.org/abs/2401.08281
- [13] Yiming Du, Wenyu Huang, Danna Zheng, Zhaowei Wang, Sebastien Montella, Mirella Lapata, Kam-Fai Wong, and Jeff Z. Pan. 2025. Rethinking memory in AI: Taxonomy, operations, topics, and future directions. arXiv e-prints, arXiv--2505.
- [14]
- [15] Aamir Fareed, Saima Hassan, Samir Brahim Belhaouari, and Zahid Halim. 2023. A collaborative filtering recommendation framework utilizing social networks. Machine Learning with Applications, 14:100495. https://doi.org/10.1016/j.mlwa.2023.100495
- [16] Gerhard Fischer. 2001. User modeling in human--computer interaction. User Modeling and User-Adapted Interaction, 11(1):65--86.
- [17] Rui Gao, Bibo Hao, Shuotian Bai, Lin Li, Ang Li, and Tingshao Zhu. 2013. Improving user profile with personality traits predicted from social media content. In Seventh ACM Conference on Recommender Systems, RecSys '13, Hong Kong, China, October 12-16, 2013, pages 355--358. ACM. https://doi.org/10.1145/2507157.2507219
- [18] Liang Gou, Michelle X. Zhou, and Huahai Yang. 2014. KnowMe and ShareMe: Understanding automatically discovered personality traits from social media and user sharing preferences. In CHI Conference on Human Factors in Computing Systems, CHI '14, Toronto, ON, Canada, April 26 - May 01, 2014, pages 955--964. ACM. https://doi.org/10.1145/2556288.2557398
- [19] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, pages 173--182.
- [20] Stephen J. Hoch and George F. Loewenstein. 1991. Time-inconsistent preferences and consumer self-control. Journal of Consumer Research, 17(4):492--507.
- [21] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, and others. 2022. LoRA: Low-rank adaptation of large language models. ICLR, 1(2):3.
- [22] Dana Hughes, Akshat Agarwal, Yue Guo, and Katia Sycara. 2020. Inferring non-stationary human preferences for human-agent teams. In 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pages 1178--1185. IEEE.
- [23] Meng Jiang, Peng Cui, Fei Wang, Wenwu Zhu, and Shiqiang Yang. 2014. Scalable recommendation with social contextual information. IEEE Trans. Knowl. Data Eng., 26(11):2789--2802. https://doi.org/10.1109/TKDE.2014.2300487
- [24] Wang-Cheng Kang, Jianmo Ni, Nikhil Mehta, Maheswaran Sathiamoorthy, Lichan Hong, Ed H. Chi, and Derek Zhiyuan Cheng. 2023. Do LLMs understand user preferences? Evaluating LLMs on user rating prediction. CoRR, abs/2305.06474. https://doi.org/10.48550/ARXIV.2305.06474
- [25] Jieun Kim, Ahreum Lee, and Hokyoung Ryu. 2013. Personality and its effects on learning performance: Design guidelines for an adaptive e-learning system based on a user model. International Journal of Industrial Ergonomics, 43(5):450--461.
- [26]
- [27] Yehuda Koren, Robert M. Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer, 42(8):30--37. https://doi.org/10.1109/MC.2009.263
- [28] Jaehyeok Lee, Keisuke Sakaguchi, and JinYeong Bak. 2025. Self-training meets consistency: Improving LLMs' reasoning with consistency-driven rationale evaluation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies... https://doi.org/10.18653/v1/2025.naacl-long.528
- [29] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, and others. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459--9474.
- [30] Minchong Li, Feng Zhou, and Xiaohui Song. 2025. BiLD: Bi-directional logits difference loss for large language model distillation. In Proceedings of the 31st International Conference on Computational Linguistics, pages 1168--1182, Abu Dhabi, UAE. Association for Computational Linguistics. https://aclanthology.org/2025.coling-main.78/
- [31]
- [32] Junling Liu, Chao Liu, Renjie Lv, Kang Zhou, and Yan Zhang. 2023. Is ChatGPT a good recommender? A preliminary study. CoRR, abs/2304.10149. https://doi.org/10.48550/ARXIV.2304.10149
- [33] Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12:157--173. https://doi.org/10.1162/tacl_a_00638
- [34] Jinghao Luo, Yuchen Tian, Chuxue Cao, Ziyang Luo, Hongzhan Lin, Kaixin Li, Chuyi Kong, Ruichao Yang, and Jing Ma. 2026. From storage to experience: A survey on the evolution of LLM agent memory mechanisms.
- [35] Aman Madaan, Niket Tandon, Peter Clark, and Yiming Yang. 2022. Memory-assisted prompt editing to improve GPT-3 after deployment. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2833--2861, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.emnlp-main.183
- [36] Lucie Charlotte Magister, Katherine Metcalf, Yizhe Zhang, and Maartje ter Hoeve. 2024. On the way to LLM personalization: Learning to remember user conversations. CoRR, abs/2411.13405. https://doi.org/10.48550/ARXIV.2411.13405
- [37] Lucie Charlotte Magister, Katherine Metcalf, Yizhe Zhang, and Maartje Ter Hoeve. 2025. On the way to LLM personalization: Learning to remember user conversations. In Proceedings of the First Workshop on Large Language Model Memorization (L2M2), pages 61--77, Vienna, Austria. Association for Computational Linguistics. https://doi.org/10.18653/v1/2025.l2m2-1.5
- [38] Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. 2024. Evaluating very long-term conversational memory of LLM agents. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13851--13870, Bangkok... https://doi.org/10.18653/v1/2024.acl-long.747
- [39] Sheshera Mysore, Zhuoran Lu, Mengting Wan, Longqi Yang, Bahareh Sarrafzadeh, Steve Menezes, Tina Baghaee, Emmanuel Barajas Gonzalez, Jennifer Neville, and Tara Safavi. 2024. Pearl: Personalizing large language model writing assistants with generation-calibrated retrievers. In Proceedings of the 1st Workshop... https://doi.org/10.18653/v1/2024.customnlp4u-1.16
- [40] Jiayan Nan, Wenquan Ma, Wenlong Wu, and Yize Chen. 2025. Nemori: Self-organizing agent memory inspired by cognitive science. arXiv preprint arXiv:2508.03341.
- [41] Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pages 1--22.
- [42] Aleksandr V. Petrov and Craig Macdonald. 2023. Generative sequential recommendation with GPTRec. CoRR, abs/2306.11114. https://doi.org/10.48550/ARXIV.2306.11114
- [43]
- [44] Bhawna Piryani, Abdelrahman Abdullah, Jamshid Mozafari, Avishek Anand, and Adam Jatowt. 2025. It's high time: A survey of temporal information retrieval and question answering. arXiv e-prints, arXiv--2505.
- [45] Erasmo Purificato, Ludovico Boratto, and Ernesto William De Luca. 2024. User modeling and user profiling: A comprehensive survey. CoRR, abs/2402.09660. https://doi.org/10.48550/ARXIV.2402.09660
- [46] Ruiyang Qin, Jun Xia, Zhenge Jia, Meng Jiang, Ahmed Abbasi, Peipei Zhou, Jingtong Hu, and Yiyu Shi. 2024. Enabling on-device large language model personalization with self-supervised data selection and synthesis. In Proceedings of the 61st ACM/IEEE Design Automation Conference, pages 1--6.
- [47] Yilun Qiu, Xiaoyan Zhao, Yang Zhang, Yimeng Bai, Wenjie Wang, Hong Cheng, Fuli Feng, and Tat-Seng Chua. 2025. Measuring what makes you unique: Difference-aware user modeling for enhancing LLM personalization. In Findings of the Association for Computational Linguistics: ACL 2025, pages 21258--21277, Vienna... https://doi.org/10.18653/v1/2025.findings-acl.1095
- [48] Zhaopeng Qiu, Xian Wu, Jingyue Gao, and Wei Fan. 2021. U-BERT: Pre-training user representations for improved recommendation. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational... https://doi.org/10.1609/AAAI.V35I5.16557
- [49] Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, and Daniel Chalef. 2025. Zep: A temporal knowledge graph architecture for agent memory. arXiv preprint arXiv:2501.13956.
- [50] Christopher Richardson, Yao Zhang, Kellen Gillespie, Sudipta Kar, Arshdeep Singh, Zeynab Raeesy, Omar Zia Khan, and Abhinav Sethy. 2023. Integrating summarization and retrieval for enhanced personalization via large language models. CoRR, abs/2310.20081. https://doi.org/10.48550/ARXIV.2310.20081
- [51] Evan F. Risko and Sam J. Gilbert. 2016. Cognitive offloading. Trends in Cognitive Sciences, 20(9):676--688.
- [52] Stephen E. Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr., 3(4):333--389. https://doi.org/10.1561/1500000019
- [53] Rana Salama, Jason Cai, Michelle Yuan, Anna Currey, Monica Sunkara, Yi Zhang, and Yassine Benajiba. 2025. MemInsight: Autonomous memory augmentation for LLM agents. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 33136--33152, Suzhou, China. Association for Computational Linguistics. https://doi.org/10.18653/v1/2025.emnlp-main.1683
- [54] Alireza Salemi, Sheshera Mysore, Michael Bendersky, and Hamed Zamani. 2024. LaMP: When large language models meet personalization. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7370--7392, Bangkok, Thailand. Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.acl-long.399
- [55] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- [56] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Yang Wu, and others. 2024. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300.
- [57] Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, and others. 2025. OpenAI GPT-5 system card. arXiv preprint arXiv:2601.03267.
- [58]
- [59] John Sweller. 1988. Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2):257--285.
- [60] Haoran Tan, Zeyu Zhang, Chen Ma, Xu Chen, Quanyu Dai, and Zhenhua Dong. 2025. MemBench: Towards more comprehensive evaluation on the memory of LLM-based agents. In Findings of the Association for Computational Linguistics: ACL 2025, pages 19336--19352, Vienna, Austria. Association for Computational Linguistics. https://doi.org/10.18653/v1/2025.findings-acl.989
- [61] Qingyu Tan, Hwee Tou Ng, and Lidong Bing. 2023. Towards benchmarking and improving the temporal reasoning capability of large language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14820--14835, Toronto, Canada. Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-long.828
- [62] Zhaoxuan Tan, Zheyuan Liu, and Meng Jiang. 2024a. Personalized pieces: Efficient personalized large language models through collaborative efforts. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 6459--6475, Miami, Florida, USA. Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.emnlp-main.371
- [63] Zhaoxuan Tan, Qingkai Zeng, Yijun Tian, Zheyuan Liu, Bing Yin, and Meng Jiang. 2024b. Democratizing large language models via personalized parameter-efficient fine-tuning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 6476--6491, Miami, Florida, USA. Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.emnlp-main.372
- [64] Xiangru Tang, Anni Zou, Zhuosheng Zhang, Ziming Li, Yilun Zhao, Xingyao Zhang, Arman Cohan, and Mark Gerstein. 2024. MedAgents: Large language models as collaborators for zero-shot medical reasoning. In Findings of the Association for Computational Linguistics: ACL 2024, pages 599--621, Bangkok, Thailand. Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.findings-acl.33
- [65] Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, Wei-Lin Chen, Chao-Wei Huang, Yu Meng, and Yun-Nung Chen. 2024. Two tales of persona in LLMs: A survey of role-playing and personalization. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 16612--16631, Miami, Florida, USA. Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.findings-emnlp.969
- [66] Endel Tulving. 1985. How many memory systems are there? American Psychologist, 40(4):385.
- [67] Endel Tulving and others. 1972. Episodic and semantic memory. Organization of Memory, 1(381-403):1.
- [68] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems... https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
- [69] Haoming Wang, Boyuan Yang, Xiangyu Yin, and Wei Gao. 2025a. Never start from scratch: Expediting on-device LLM personalization via explainable model selection. In Proceedings of the 23rd Annual International Conference on Mobile Systems, Applications and Services, pages 154--168.
- [70] Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, and others. 2024. A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6):186345.
- [71]
- [72] Tianxin Wei, Noveen Sachdeva, Benjamin Coleman, Zhankui He, Yuanchen Bei, Xuying Ning, Mengting Ai, Yunzhe Li, Jingrui He, Ed H. Chi, and others. 2025. Evo-Memory: Benchmarking LLM agent test-time learning with self-evolving memory. arXiv preprint arXiv:2511.20857.
- [73] Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, and Dong Yu. 2025a. LongMemEval: Benchmarking chat assistants on long-term interactive memory. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net. https://openreview.net/forum?id=pZiyCaVuti
- [74]
- [75] Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. 2025. A-Mem: Agentic memory for LLM agents. arXiv preprint arXiv:2502.12110.
- [76] Sikuan Yan, Xiufeng Yang, Zuchao Huang, Ercong Nie, Zifeng Ding, Zonggen Li, Xiaowen Ma, Jinhe Bi, Kristian Kersting, Jeff Z. Pan, Hinrich Schuetze, Volker Tresp, and Yunpu Ma. 2025. Memory-R1: Enhancing large language model agents to manage and utilize memories via reinforcement learning. arXiv preprint arXiv:2508.19828.
- [77] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, and others. 2025. Qwen3 technical report. arXiv preprint arXiv:2505.09388.
- [78] An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, and 22 others. 2024. Qwen2.5 technical report. CoRR, abs/2412.15115. https://doi.org/10.48550/ARXIV.2412.15115
- [79] Kai Zhang, Yangyang Kang, Fubang Zhao, and Xiaozhong Liu. 2024a. LLM-based medical assistant personalization with short- and long-term memory coordination. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)... https://doi.org/10.18653/v1/2024.naacl-long.132
- [80] Kai Zhang, Lizhi Qing, Yangyang Kang, and Xiaozhong Liu. 2024b. Personalized LLM response generation with parameterized memory injection. CoRR, abs/2404.03565. https://doi.org/10.48550/ARXIV.2404.03565