Improve Large Language Model Systems with User Logs
Pith reviewed 2026-05-21 14:30 UTC · model grok-4.3
The pith
UNO turns noisy user logs into rules and preferences that let LLM systems adaptively improve responses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
UNO distills unstructured user logs into semi-structured rules and preference pairs, applies query-and-feedback-driven clustering to handle data heterogeneity, quantifies the cognitive gap between the model's prior knowledge and the log content, and uses that assessment to filter noisy feedback while constructing distinct modules for primary and reflective experiences extracted from the logs, thereby improving future LLM system responses.
What carries the argument
The UNO framework, which distills logs into rules and preferences, clusters them by query and feedback, and quantifies the cognitive gap to adaptively filter noise and build experience modules.
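To make the two distilled artifacts concrete, the sketch below shows one way a single log entry could map to a semi-structured rule and a preference pair. Everything in it is hypothetical: the field names, the Rule and PreferencePair schemas, and the distill heuristic. The heuristic only copies fields across, where UNO's real distillation step would do the interpretive work. The authoritative schema is whatever the open-sourced repository defines.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LogEntry:
    """One raw interaction pulled from a user log (hypothetical schema)."""
    query: str
    response: str
    feedback: Optional[str] = None    # free-text user reaction, if any
    correction: Optional[str] = None  # answer the user says they wanted, if any


@dataclass
class Rule:
    """Semi-structured rule: for queries like `condition`, follow `guidance`."""
    condition: str
    guidance: str


@dataclass
class PreferencePair:
    """Preference pair in the common (prompt, chosen, rejected) layout."""
    prompt: str
    chosen: str
    rejected: str


def distill(entry: LogEntry) -> Tuple[Optional[Rule], Optional[PreferencePair]]:
    """Toy stand-in for distillation: copies fields, does no real interpretation."""
    if entry.feedback is None:
        return None, None  # no feedback means no extractable signal here
    rule = Rule(condition=entry.query, guidance=entry.feedback)
    pair = None
    if entry.correction is not None and entry.correction != entry.response:
        # The user's corrected answer is treated as preferred over the logged one.
        pair = PreferencePair(prompt=entry.query, chosen=entry.correction,
                              rejected=entry.response)
    return rule, pair


rule, pair = distill(LogEntry(
    query="Summarize this contract",
    response="A long, clause-by-clause summary...",
    feedback="Too long; keep summaries under five bullet points.",
    correction="- Term: 12 months\n- Fee: fixed\n- Termination: 30 days' notice",
))
print(rule)
print(pair)
```

Entries without feedback yield nothing, and a correction identical to the logged response yields a rule but no pair, since there is no contrast to learn from.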
If this is right
- LLM systems using UNO achieve higher effectiveness and efficiency than both retrieval-augmented generation and memory-based methods on the tested tasks.
- Cognitive-gap measurement allows the system to discard portions of user feedback judged too far from the model's existing knowledge.
- Primary and reflective experience modules can be constructed separately from the same log stream to handle different types of user signals.
- The off-policy optimization problem between log collection and model updates is addressed through the distillation and clustering pipeline.
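The discard rule in the second bullet can be sketched as a per-cluster ceiling on gap scores. The ceiling used here (the cluster's mean gap plus k standard deviations) is an assumption made for illustration, as are the names adaptive_gap_filter and k. The summary says only that filtering is adaptive and gap-driven, not how the cutoff is set. It also does not say how items are routed into primary versus reflective modules, so that step is left out.

```python
import statistics
from collections import defaultdict
from typing import Dict, Set


def adaptive_gap_filter(gaps: Dict[str, float], clusters: Dict[str, str],
                        k: float = 1.0) -> Set[str]:
    """Return the ids of items kept after gap filtering.

    gaps:     item id -> cognitive-gap score (higher = farther from the model's prior)
    clusters: item id -> cluster id (e.g. from query-and-feedback clustering)
    An item is dropped when its gap exceeds mean + k * std of its own cluster;
    this cutoff rule is hypothetical, not necessarily UNO's.
    """
    members = defaultdict(list)
    for item_id, cluster_id in clusters.items():
        members[cluster_id].append(item_id)

    kept: Set[str] = set()
    for ids in members.values():
        values = [gaps[i] for i in ids]
        ceiling = statistics.fmean(values) + k * statistics.pstdev(values)
        kept.update(i for i in ids if gaps[i] <= ceiling)
    return kept


gaps = {"a": 0.1, "b": 0.1, "c": 0.1, "d": 0.9, "e": 5.0}
clusters = {"a": "x", "b": "x", "c": "x", "d": "x", "e": "y"}
print(sorted(adaptive_gap_filter(gaps, clusters)))
```

Note that item "e" survives despite its large gap: a singleton cluster has zero spread, so its ceiling equals its own score. This is a weakness of this particular rule. A global ceiling would be one way to cover it.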
Where Pith is reading between the lines
- Production LLM services could shift from periodic retraining to continuous, log-driven updates that require far less new human annotation.
- The same distillation-plus-gap approach might transfer to non-LLM agents that maintain long interaction histories with users.
- If the clustering step proves robust, similar log-processing pipelines could be applied to other noisy human-generated data streams such as customer support transcripts.
Load-bearing premise
User logs contain extractable, authentic human feedback signals that can be reliably distilled into rules and preference pairs without introducing new biases or noise that the clustering and cognitive-gap steps cannot handle.
What would settle it
Run the same set of user logs through UNO and through a standard RAG baseline while deliberately adding increasing levels of random or contradictory feedback entries, then measure whether UNO's accuracy and efficiency gains disappear once noise exceeds a measurable threshold.
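The falsification test above needs a controlled way to corrupt the logs. The minimal sketch below assumes feedback can be reduced to a binary positive/negative label, which is a simplification since real logs are free text. It treats a "contradictory" entry as a record whose label has been flipped and sweeps the flip rate. The record type and helper name are hypothetical, and the UNO-versus-RAG evaluation at each noise level is not shown.

```python
import random
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class FeedbackRecord:
    query: str
    response: str
    positive: bool  # simplified binary feedback (assumption)


def inject_contradictory_noise(records: List[FeedbackRecord], rate: float,
                               seed: int = 0) -> List[FeedbackRecord]:
    """Return a copy of `records` with the label flipped on round(rate * n) of them."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so each noise level is reproducible
    n_flip = round(rate * len(records))
    flipped = set(rng.sample(range(len(records)), n_flip))
    return [replace(r, positive=not r.positive) if i in flipped else r
            for i, r in enumerate(records)]


logs = [FeedbackRecord(f"q{i}", f"a{i}", positive=True) for i in range(10)]
for rate in (0.0, 0.2, 0.5):
    noisy = inject_contradictory_noise(logs, rate)
    # Here one would run UNO and the RAG baseline on `noisy` and record accuracy/latency.
    print(rate, sum(not r.positive for r in noisy))
```

Only the contradictory case is covered. Random entries, meaning unrelated feedback text, would need a second helper.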
Original abstract
Scaling training data and model parameters has long driven progress in large language models (LLMs), but this paradigm is increasingly constrained by the scarcity of high-quality data and diminishing returns from rising computational costs. As a result, recent work has increasingly focused on continual learning from real-world deployment, where user interaction logs provide a rich source of authentic human feedback and procedural knowledge. However, learning from user logs is challenging due to their unstructured and noisy nature. Vanilla LLM systems often struggle to distinguish useful feedback signals from noisy user behavior, and the disparity between user log collection and model optimization (e.g., the off-policy optimization problem) further exacerbates this challenge. To this end, we propose UNO (User log-driveN Optimization), a unified framework for improving LLM systems (LLMsys) with user logs. UNO first distills logs into semi-structured rules and preference pairs, then employs query-and-feedback-driven clustering to manage data heterogeneity, and finally quantifies the cognitive gap between the model's prior knowledge and the log data. This assessment guides the LLMsys to adaptively filter out noisy feedback and construct different modules for primary and reflective experiences extracted from user logs, thereby improving future responses. Extensive experiments show that UNO achieves state-of-the-art effectiveness and efficiency, significantly outperforming Retrieval Augmented Generation (RAG) and memory-based baselines. We have open-sourced our code at https://github.com/bebr2/UNO .
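The abstract names query-and-feedback-driven clustering but does not give the algorithm. As a stand-in, the sketch below uses simple leader clustering. An item joins an existing cluster only if its feedback label matches and its query embedding is within a cosine-similarity threshold of that cluster's first member. The leader-clustering choice, the discrete feedback labels, the precomputed query vectors, and the 0.8 threshold are all assumptions, not UNO's published method.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_by_query_and_feedback(items: List[Tuple[Sequence[float], str]],
                                  sim_threshold: float = 0.8) -> List[int]:
    """Assign a cluster id to each (query_vector, feedback_label) pair.

    Leader clustering: each cluster is represented by its first member, and a new
    item joins the first cluster whose leader has the same feedback label and a
    query vector with cosine similarity >= sim_threshold; otherwise it starts a
    new cluster.
    """
    leaders: List[Tuple[Sequence[float], str]] = []
    assignment: List[int] = []
    for qvec, label in items:
        for cid, (lvec, llabel) in enumerate(leaders):
            if llabel == label and cosine(qvec, lvec) >= sim_threshold:
                assignment.append(cid)
                break
        else:  # no compatible leader found, so this item starts a new cluster
            leaders.append((qvec, label))
            assignment.append(len(leaders) - 1)
    return assignment


items = [([1.0, 0.0], "negative"), ([0.9, 0.1], "negative"),
         ([1.0, 0.0], "positive"), ([0.0, 1.0], "negative")]
print(cluster_by_query_and_feedback(items))
```

Leader clustering is cheap but depends on input order. An agglomerative method would remove that dependence at higher cost.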
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes UNO (User log-driveN Optimization), a unified framework for improving LLM systems via user interaction logs. UNO distills unstructured logs into semi-structured rules and preference pairs, applies query-and-feedback-driven clustering to address data heterogeneity, and quantifies the cognitive gap between the model's prior knowledge and the log data. This gap assessment is used to adaptively filter noisy feedback and construct separate modules for primary and reflective experiences, with the goal of improving future responses. The authors report that extensive experiments demonstrate SOTA effectiveness and efficiency, significantly outperforming RAG and memory-based baselines, and have open-sourced the code.
Significance. If the central claims hold after addressing validation concerns, the work could meaningfully advance continual learning from real-world deployment logs by turning noisy user interactions into usable signals for LLM optimization. The open-sourced code supports reproducibility, which strengthens the contribution in an empirical field.
major comments (2)
- §3.3 (Cognitive Gap Quantification): The pipeline relies on the cognitive-gap metric to guide adaptive filtering and module construction, yet the manuscript provides no independent validation (e.g., human annotation, held-out oracle, or correlation with downstream usefulness) that this metric extracts genuine signal rather than model self-consistency. If the gap is derived from the target LLM's own embeddings or outputs, the filtering step risks circularity, which would undermine the reported gains over RAG and memory baselines.
- §4 (Experiments): The SOTA claim is load-bearing for the paper's contribution, but the experimental section does not include ablation studies isolating the contribution of the cognitive-gap step versus the distillation and clustering stages alone. Without these controls, it is unclear whether the outperformance is attributable to the proposed framework or to other implementation details.
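One detail matters for the control requested above. If gap filtering also shrinks the pool of experiences, removing it changes both which items survive and how many. Comparing against a random filter that keeps the same number of items separates those two effects. The arm names and the fixed max_gap cutoff below are hypothetical, and scoring each kept set on a downstream benchmark is not shown.

```python
import random
from typing import Dict, Sequence, Set


def gap_filter(gaps: Sequence[float], max_gap: float) -> Set[int]:
    """Full variant: keep indices whose cognitive-gap score is at most max_gap."""
    return {i for i, g in enumerate(gaps) if g <= max_gap}


def random_filter(n_items: int, keep_count: int, seed: int = 0) -> Set[int]:
    """Control: keep the same number of items, chosen uniformly at random."""
    rng = random.Random(seed)
    return set(rng.sample(range(n_items), keep_count))


def ablation_variants(gaps: Sequence[float], max_gap: float,
                      seed: int = 0) -> Dict[str, Set[int]]:
    """Kept-index sets for each ablation arm (hypothetical arm names)."""
    full = gap_filter(gaps, max_gap)
    return {
        "gap_filter": full,
        "random_filter_same_size": random_filter(len(gaps), len(full), seed),
        "no_filter": set(range(len(gaps))),
    }


arms = ablation_variants([0.1, 0.5, 0.9, 0.2], max_gap=0.4)
for name, kept in arms.items():
    # Each kept set would then build the experience modules and be scored downstream.
    print(name, sorted(kept))
```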
minor comments (2)
- Abstract: The claim of 'significantly outperforming' RAG and memory baselines would be strengthened by reporting concrete metrics (e.g., accuracy deltas, latency reductions) rather than qualitative language.
- §2 (Related Work): The discussion of off-policy optimization challenges could more explicitly contrast UNO with prior log-based continual learning methods to clarify novelty.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. The comments highlight important aspects of validation and experimental rigor that we address below. We have revised the manuscript to incorporate additional analyses and clarifications.
Point-by-point responses
- Referee: §3.3 (Cognitive Gap Quantification): The pipeline relies on the cognitive-gap metric to guide adaptive filtering and module construction, yet the manuscript provides no independent validation (e.g., human annotation, held-out oracle, or correlation with downstream usefulness) that this metric extracts genuine signal rather than model self-consistency. If the gap is derived from the target LLM's own embeddings or outputs, the filtering step risks circularity, which would undermine the reported gains over RAG and memory baselines.
Authors: We acknowledge the validity of this concern about potential circularity and the absence of explicit independent validation in the original submission. The cognitive gap is computed as the divergence between the base model's prior knowledge representations and the knowledge encoded in the distilled log data, using embedding similarity from the target LLM. To strengthen this, the revised manuscript adds a new subsection in §3.3 reporting Pearson correlations between cognitive-gap scores and downstream task improvements on held-out queries. We also include a small-scale human annotation study (n=200 samples) where annotators rate the usefulness of filtered vs. unfiltered logs, showing statistically significant alignment with the metric. To further reduce any self-consistency risk, we now compute the gap using a separate, frozen embedding model distinct from the target LLM. These additions demonstrate that the metric captures genuine signal beyond model-internal consistency.
Revision: yes.
- Referee: §4 (Experiments): The SOTA claim is load-bearing for the paper's contribution, but the experimental section does not include ablation studies isolating the contribution of the cognitive-gap step versus the distillation and clustering stages alone. Without these controls, it is unclear whether the outperformance is attributable to the proposed framework or to other implementation details.
Authors: We agree that isolating the contribution of the cognitive-gap quantification is necessary to substantiate the framework's gains. The revised §4 now includes dedicated ablation experiments: (1) UNO without cognitive-gap filtering (replaced by fixed-threshold or random filtering), (2) distillation + clustering only, and (3) full UNO. Results on the primary benchmarks show that removing the cognitive-gap step reduces performance by 4-7% relative to full UNO while still outperforming RAG and memory baselines, confirming its additive value. We also report efficiency metrics for each ablation to address the efficiency claims. These controls clarify that the reported SOTA results stem from the integrated framework rather than isolated implementation choices.
Revision: yes.
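The correlation check described in the first response pairs each held-out query's cognitive-gap score with the improvement observed on that query. A dependency-free sketch of the Pearson coefficient follows. The pairing scheme and variable names are assumptions, and the toy numbers are invented for illustration, not taken from the paper. The sign to expect depends on how improvement is defined, which the response does not specify.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between paired samples xs[i], ys[i]."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (sx * sy)


# Toy values for illustration only; not results from the paper.
gap_scores = [0.05, 0.20, 0.40, 0.75, 0.90]
improvements = [0.06, 0.05, 0.02, -0.01, -0.03]
print(round(pearson(gap_scores, improvements), 3))
```

On Python 3.10 and later, statistics.correlation computes the same coefficient. Reporting a p-value alongside r, for example with scipy.stats.pearsonr, would make the claimed alignment checkable.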
Circularity Check
Empirical pipeline with no load-bearing circular derivation or self-referential reduction
Full rationale
The paper describes UNO as a three-stage empirical processing pipeline (distillation of logs into rules/preference pairs, query-feedback clustering, and cognitive-gap quantification to guide adaptive filtering and module construction). No equations, fitted parameters, or derivations are presented that reduce the claimed SOTA gains to a self-referential definition or construction. The central claims rest on experimental comparisons to RAG and memory baselines rather than a mathematical chain that collapses to its inputs. Any potential self-bias in the gap metric would require explicit confirmation from the full text that the quantification uses only the target LLM's own outputs in a closed loop; absent such a quoted reduction, the framework remains self-contained as an applied method.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: reality_from_one_distinction
Tag: unclear (relation between the paper passage and the cited Recognition theorem).
UNO first distills logs into semi-structured rules and preference pairs, then employs query-and-feedback-driven clustering to manage data heterogeneity, and finally quantifies the cognitive gap between the model’s prior knowledge and the log data.
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
Tag: unclear (relation between the paper passage and the cited Recognition theorem).
We quantify the cognitive gap through three steps: Rule Prediction... Gap Quantification: ... $g_i = \mathrm{Dist}(R_i^{\mathrm{LLM}}, R_i)$
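The quoted formula leaves Dist unspecified. The sketch below takes Dist to be cosine distance between embeddings, in line with the rebuttal's description of embedding similarity computed by a separate frozen encoder. The embed callable, the cosine choice, and the toy letter-count embedding in the usage lines are assumptions. A real run would plug in an actual embedding model.

```python
import math
from typing import Callable, List, Sequence

Embed = Callable[[str], Sequence[float]]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for vectors pointing the same way, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("cannot compare a zero vector")
    return 1.0 - dot / (nu * nv)


def cognitive_gaps(predicted_rules: List[str], log_rules: List[str],
                   embed: Embed) -> List[float]:
    """g_i = Dist(R_i^LLM, R_i): distance between the rule the model predicts on
    its own and the rule distilled from the log, for each aligned index i."""
    if len(predicted_rules) != len(log_rules):
        raise ValueError("rule lists must be aligned one-to-one")
    return [cosine_distance(embed(p), embed(r))
            for p, r in zip(predicted_rules, log_rules)]


def toy_embed(text: str) -> List[float]:
    """Letter counts over a tiny alphabet; a placeholder for a real encoder."""
    return [float(text.count(c)) for c in "abc"]


print(cognitive_gaps(["aab", "aa"], ["aab", "bb"], toy_embed))
```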
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Skill Retrieval Augmentation for Agentic AI: Agents improve when they retrieve skills on demand from large corpora, yet current models cannot selectively decide when to load or ignore a retrieved skill.