Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
Pith reviewed 2026-05-09 19:07 UTC · model grok-4.3
The pith
Cognition-inspired two-stage optimization first learns a memory guideline, then a guideline-following update policy, for evolving LLM personalization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce MemCoE, a cognition-inspired two-stage optimization framework that learns how memory should be organized and what information to update. In the first stage, Memory Guideline Induction optimizes a global guideline via contrastive feedback interpreted as textual gradients; in the second stage, Guideline-Aligned Memory Policy Optimization uses the induced guideline to define structured process rewards and performs multi-turn RL to learn a guideline-following memory evolution policy. We evaluate on three personalization memory benchmarks, covering explicit/implicit preference and different sizes and noise, and observe consistent improvements over strong baselines with favorable robustness, transferability, and efficiency.
What carries the argument
MemCoE's two-stage framework: Memory Guideline Induction, which derives a global memory organization rule from contrastive textual feedback, followed by Guideline-Aligned Memory Policy Optimization, which converts the guideline into process rewards to guide RL-based learning of memory update actions.
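The two-stage structure can be sketched in code. This is a hypothetical, heavily simplified illustration inferred from the abstract: the function names, the string-based "textual gradient," and the keyword-overlap reward are all assumptions standing in for LLM calls the paper would actually use, not the authors' implementation.

```python
# Hypothetical sketch of MemCoE's two stages. All names and the toy
# reward proxy are assumptions; real systems would query an LLM here.

def textual_gradient(guideline: str, good: str, bad: str) -> str:
    """Stage 1 stub: turn one contrastive pair into a textual critique.
    A real system would prompt an LLM; we return a fixed edit hint."""
    return f"Prefer updates like {good!r} over {bad!r}."

def induce_guideline(pairs, init="Store stable user preferences."):
    """Memory Guideline Induction: fold contrastive feedback into a
    single global guideline string."""
    guideline = init
    for good, bad in pairs:
        guideline += " " + textual_gradient(guideline, good, bad)
    return guideline

def process_reward(guideline: str, action: str) -> float:
    """Stage 2 stub: dense reward for how well a memory-update action
    follows the frozen guideline (word overlap as a toy proxy)."""
    keys = set(guideline.lower().split())
    toks = set(action.lower().split())
    return len(keys & toks) / max(len(toks), 1)

# One-way flow: the guideline is induced once, then frozen for RL.
pairs = [("keep the user's dietary preference", "drop the preference")]
g = induce_guideline(pairs)
r = process_reward(g, "keep the user's dietary preference in memory")
```

The key structural point is that `induce_guideline` runs to completion before any reward is computed, so the RL stage only ever reads the guideline.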
Load-bearing premise
That contrastive feedback interpreted as textual gradients can reliably induce an optimal global memory guideline, and that this guideline then supplies sufficiently informative process rewards to stabilize multi-turn RL for the memory evolution policy.
What would settle it
An ablation experiment in which the guideline induction stage is removed or replaced with random guidelines, resulting in no performance gain or greater instability during RL training on the same benchmarks, would falsify the claim that the two-stage separation is necessary for the observed improvements.
Original abstract
Large language model (LLM) agents require long-term user memory for consistent personalization, but limited context windows hinder tracking evolving preferences over long interactions. Existing memory systems mainly rely on static, hand-crafted update rules; although reinforcement learning (RL)-based agents learn memory updates, sparse outcome rewards provide weak supervision, resulting in unstable long-horizon optimization. Drawing on memory schema theory and the functional division between prefrontal regions and hippocampus regions, we introduce MemCoE, a cognition-inspired two-stage optimization framework that learns how memory should be organized and what information to update. In the first stage, we propose Memory Guideline Induction to optimize a global guideline via contrastive feedback interpreted as textual gradients; in the second stage, Guideline-Aligned Memory Policy Optimization uses the induced guideline to define structured process rewards and performs multi-turn RL to learn a guideline-following memory evolution policy. We evaluate on three personalization memory benchmarks, covering explicit/implicit preference and different sizes and noise, and observe consistent improvements over strong baselines with favorable robustness, transferability, and efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes MemCoE, a cognition-inspired two-stage optimization framework for evolving memory in LLM agents. Stage 1 induces a global memory guideline from contrastive feedback interpreted as textual gradients. Stage 2 uses the guideline to supply structured process rewards for multi-turn RL, training a guideline-following memory evolution policy. The approach is evaluated on three personalization memory benchmarks covering explicit/implicit preferences and varying sizes/noise levels, with claims of consistent improvements over strong baselines along with favorable robustness, transferability, and efficiency.
Significance. If the two-stage separation reliably converts contrastive feedback into stable, dense process rewards that resolve the sparse-outcome instability identified in prior RL memory work, the framework could advance long-horizon memory management for personalized LLM agents by providing both interpretability (via the explicit guideline) and empirical gains. The cognitive analogy and explicit decoupling of guideline induction from policy optimization are conceptually attractive strengths.
major comments (3)
- [Abstract] Abstract: the central claims of 'consistent improvements over strong baselines with favorable robustness, transferability, and efficiency' are stated without any quantitative metrics, error bars, baseline descriptions, or ablation results, preventing assessment of effect sizes or whether the two-stage procedure actually outperforms direct RL or static rules.
- [Method] Description of the two-stage framework: the guideline produced by contrastive feedback in stage 1 is used directly to define the process rewards in stage 2, creating a circular dependency in which the RL optimization target is shaped by the same learned component without an independent external benchmark or validation that the guideline is optimal or general.
- [Experiments] Evaluation section: no ablation, convergence argument, or analysis is supplied demonstrating that the induced guideline measurably reduces reward sparsity or variance in the multi-turn RL stage, despite the abstract explicitly identifying sparse outcome rewards as the source of instability in prior work.
minor comments (1)
- The term 'textual gradients' is used without a formal definition or worked example showing how contrastive pairs are converted into an update rule for the guideline.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and positive recognition of the conceptual contributions. We address each major comment point by point below, clarifying our approach where needed and committing to revisions that strengthen the presentation without altering the core claims.
read point-by-point responses
Referee: [Abstract] Abstract: the central claims of 'consistent improvements over strong baselines with favorable robustness, transferability, and efficiency' are stated without any quantitative metrics, error bars, baseline descriptions, or ablation results, preventing assessment of effect sizes or whether the two-stage procedure actually outperforms direct RL or static rules.
Authors: We agree that the abstract would benefit from greater specificity to convey effect sizes. In the revised version we will insert concise quantitative highlights drawn from the experimental results (e.g., average accuracy gains across the three benchmarks and reference to statistical robustness), while remaining within the abstract length constraint. Baseline families and the two-stage versus direct-RL comparison will be mentioned at a high level. revision: yes
Referee: [Method] Description of the two-stage framework: the guideline produced by contrastive feedback in stage 1 is used directly to define the process rewards in stage 2, creating a circular dependency in which the RL optimization target is shaped by the same learned component without an independent external benchmark or validation that the guideline is optimal or general.
Authors: The two stages are strictly sequential and non-circular. Stage 1 performs an independent optimization of the guideline using only contrastive textual feedback; once induced, the guideline is frozen and supplied as a fixed reward-shaping function to Stage 2. The RL policy is then optimized against this fixed external signal. We will add explicit wording and a diagram annotation in the method section to emphasize the one-way information flow and note that the guideline itself can be inspected or validated on held-out data independently of the RL stage. revision: partial
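The one-way information flow the rebuttal describes can be made concrete with a closure: once Stage 1 ends, the guideline is captured in a fixed reward function that Stage 2 can read but never modify. This is a minimal sketch under assumed names and a toy word-overlap reward, not the paper's interface.

```python
# Sketch of the non-circular flow claimed in the rebuttal: the induced
# guideline is frozen before RL begins. Names and reward are assumptions.

def freeze_reward_fn(guideline: str):
    """Capture the Stage-1 guideline in a closure. The RL stage receives
    only this callable and cannot rewrite the guideline text."""
    frozen = guideline  # Stage 2 only reads this value
    def reward(action: str) -> float:
        # toy proxy: fraction of guideline words the action mentions
        g = set(frozen.lower().split())
        a = set(action.lower().split())
        return len(g & a) / max(len(g), 1)
    return reward

reward_fn = freeze_reward_fn("retain stable preferences and discard noise")
r1 = reward_fn("retain stable preferences about music")  # guideline-aligned
r2 = reward_fn("store everything verbatim")              # guideline-violating
```

Because `frozen` is never reassigned, any circularity would have to enter through the Stage-1 data itself, which is the part the referee's comment asks to validate on held-out data.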
Referee: [Experiments] Evaluation section: no ablation, convergence argument, or analysis is supplied demonstrating that the induced guideline measurably reduces reward sparsity or variance in the multi-turn RL stage, despite the abstract explicitly identifying sparse outcome rewards as the source of instability in prior work.
Authors: We acknowledge that a direct quantitative demonstration of reduced reward sparsity or variance would strengthen the link to the motivating problem. Although overall performance gains are reported, we did not include a dedicated ablation isolating this mechanism. In the revision we will add a short analysis (main text or appendix) comparing reward variance and convergence behavior between the guideline-aligned RL and a direct-outcome-reward baseline, using the same multi-turn setup. revision: yes
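The variance comparison the authors commit to can be illustrated with a toy simulation: a sparse outcome signal concentrates all reward mass at the final turn, while a dense per-turn process signal spreads bounded reward across the episode. This is purely illustrative under assumed reward ranges, not the paper's experiment.

```python
# Toy illustration of why dense process rewards can have lower per-turn
# variance than sparse outcome rewards. Ranges are assumptions.
import random
import statistics

random.seed(0)
TURNS = 20

def sparse_outcome_rewards():
    """Outcome-only signal: zero every turn, +/-1 only at episode end."""
    return [0.0] * (TURNS - 1) + [random.choice([-1.0, 1.0])]

def dense_process_rewards():
    """Guideline-shaped signal: small bounded reward at every turn."""
    return [random.uniform(0.0, 0.2) for _ in range(TURNS)]

sparse_var = statistics.pvariance(sparse_outcome_rewards())
dense_var = statistics.pvariance(dense_process_rewards())
```

With rewards bounded in [0, 0.2], the dense signal's per-turn variance can never exceed 0.01, whereas the sparse signal's single terminal spike yields a variance near 0.05 here, which is the kind of gap the proposed analysis would measure on real training runs.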
Circularity Check
No significant circularity: two-stage framework validated on external benchmarks
full rationale
The paper proposes MemCoE as a two-stage process where stage 1 induces a global guideline from contrastive feedback and stage 2 uses it to shape process rewards for RL-based policy learning. No equations, definitions, or self-citations in the provided text reduce the final performance claims to the inputs by construction. The central results rest on empirical improvements over baselines across three independent personalization memory benchmarks with varying preference types, sizes, and noise levels, providing external falsifiability outside the fitted guideline itself.
Axiom & Free-Parameter Ledger
free parameters (2)
- guideline induction parameters
- RL process reward scaling
axioms (1)
- domain assumption: memory schema theory and the functional division between prefrontal and hippocampal regions
invented entities (1)
- MemCoE two-stage framework (no independent evidence)