pith. machine review for the scientific record.

arxiv: 2604.06845 · v1 · submitted 2026-04-08 · 💻 cs.CL · cs.AI

Recognition: 3 Lean theorem links

HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:50 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords long-term memory · dialogue systems · event segmentation · query-adaptive retrieval · boundary-guided indexing · scalable dialogues · memory efficiency

The pith

HingeMem builds long-term dialogue memory by drawing boundaries at changes in person, time, location, or topic and adapts retrieval to the query type.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes HingeMem as a way to manage long-term memory in dialogue systems that handle ongoing conversations. Existing approaches either summarize continuously or build graphs with fixed retrieval, which limits flexibility and increases costs. HingeMem instead uses boundaries triggered by shifts in four elements to segment memory into interpretable parts. It then routes queries adaptively to decide what and how much to retrieve. This leads to better performance on benchmarks with lower computational demands, making sustained personalized interactions more feasible.

Core claim

HingeMem operationalizes event segmentation theory to construct an interpretable indexing interface using boundary-triggered hyperedges over four elements—person, time, location, and topic. Boundaries are drawn when any element changes, writing the current segment to reduce redundancy while preserving salient context. Query-adaptive retrieval then determines routing over the element-indexed memory and controls depth based on estimated query type, enabling efficient handling of diverse information needs.
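The boundary rule can be sketched in a few lines. The turn representation, the `Segment` structure, and the equality-based change test below are illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass, field

ELEMENTS = ("person", "time", "location", "topic")

@dataclass
class Segment:
    turns: list = field(default_factory=list)
    elements: dict = field(default_factory=dict)  # last-seen value per element

def segment_dialogue(turns):
    """Write out the current segment whenever any of the four elements changes.

    Each turn is a dict with a 'text' key plus whatever element values an
    upstream extractor produced, e.g. {'text': ..., 'person': 'Alice', ...}.
    """
    segments, current = [], Segment()
    for turn in turns:
        changed = any(
            element in turn
            and element in current.elements
            and turn[element] != current.elements[element]
            for element in ELEMENTS
        )
        if changed and current.turns:
            segments.append(current)      # boundary: write the current segment
            current = Segment()
        current.turns.append(turn)
        for element in ELEMENTS:          # update last-seen element values
            if element in turn:
                current.elements[element] = turn[element]
    if current.turns:
        segments.append(current)
    return segments
```

A turn that introduces a new element value without contradicting a previous one does not trigger a boundary here; the paper's actual rule (including explicit transition words) is richer than this equality check.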

What carries the argument

Boundary-triggered hyperedges indexed by changes in person, time, location, and topic, combined with query-conditioned routing and depth control for adaptive retrieval.
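The two retrieval decisions can be illustrated with a toy element-to-value inverted index and fixed per-type depths. Both are hypothetical stand-ins: in the paper, the query type is estimated (e.g. by an LLM), not supplied, and depth control is adaptive rather than a lookup table:

```python
ELEMENTS = ("person", "time", "location", "topic")

# Illustrative depth caps per estimated query type (assumed values).
DEPTH_BY_TYPE = {
    "precision": 2,   # pinpoint queries: few, high-confidence segments
    "recall": 8,      # broad queries: retrieve deeper before stopping
}

def route(query_elements):
    """What to retrieve: the element indexes the query actually mentions."""
    return [e for e in ELEMENTS if e in query_elements]

def retrieve(index, query_elements, query_type):
    """How much to retrieve: depth controlled by the estimated query type.

    `index` maps element -> value -> list of segment ids.
    """
    depth = DEPTH_BY_TYPE.get(query_type, 4)
    hits = []
    for element in route(query_elements):
        hits.extend(index.get(element, {}).get(query_elements[element], []))
    seen, out = set(), []
    for h in hits:                # deduplicate, preserve order
        if h not in seen:
            seen.add(h)
            out.append(h)
    return out[:depth]            # truncate to the type-dependent depth
```

The point of the sketch is the separation of concerns: routing restricts *which* element indexes are consulted, and the query type bounds *how many* segments come back.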

Load-bearing premise

That changes in the four elements reliably mark salient boundaries that preserve necessary context, and that query-type estimation can be performed robustly enough to control retrieval depth without missing critical information.

What would settle it

A controlled experiment on dialogue sequences where an element change occurs but later queries require information from before the boundary, checking if retrieval accuracy drops compared to non-boundary methods.
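Such a probe could be scripted as below. The two toy retrievers, the planted dialogue, and the substring matching are all hypothetical stand-ins for the real systems under test:

```python
# Plant a fact before a forced boundary, then ask a question that needs it
# afterwards, and check whether a retriever confined to the current segment
# misses what a cross-boundary retriever finds.

def current_segment_only(segments, query):
    """A deliberately weak retriever: searches only the latest segment."""
    return [t for t in segments[-1] if query in t]

def cross_boundary_retriever(segments, query):
    """A retriever allowed to search across all segments."""
    return [t for seg in segments for t in seg if query in t]

# Fact stated before a topic change; the later query still needs it.
segments = [
    ["Alice said her flight lands at 9pm"],   # segment before the boundary
    ["They moved on to discussing dinner"],   # segment after the boundary
]
query = "flight"
```

Scaling this up over many planted facts and boundary types would give the retrieval-accuracy comparison the settling experiment calls for.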

Figures

Figures reproduced from arXiv: 2604.06845 by Haofen Wang, Yijie Zhong, Yunfan Gao.

Figure 1: Existing methods face two challenges when facing …
Figure 2: Overall Boundary-guided Long Term Memory construction and Query Adaptive Retrieval process of HingeMem.
Figure 3: Diagram of adaptive stop for different query types.
Figure 4: A comparative analysis of efficiency across different …
Figure 5: Analysis of the token count and its distribution …
Figure 6: The performance comparison of different model …
Figure 7: Performance of different values of λ_knee for recall-priority queries.
Figure 8: Impact of the scale for precision-priority query.
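Figures 3, 7, and 8 describe an adaptive stop whose cutoff is tuned by λ_knee. A generic knee heuristic over sorted relevance scores might look like the sketch below; the function name, the stopping rule, and the default λ value are assumptions, not the paper's specification:

```python
def adaptive_stop(scores, lam_knee=0.5):
    """Stop retrieval at the 'knee' of a descending relevance-score curve.

    Candidates are kept until the drop to the next score reaches lam_knee
    times the largest drop in the curve. This is a generic knee heuristic;
    the paper's actual lambda_knee rule may differ.
    """
    if len(scores) <= 1:
        return len(scores)
    drops = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    biggest = max(drops)
    if biggest <= 0:
        return len(scores)      # flat curve: no knee, keep everything
    for i, d in enumerate(drops):
        if d >= lam_knee * biggest:
            return i + 1        # cut after the knee position
    return len(scores)
```

On a curve like [0.9, 0.85, 0.4, 0.35, 0.3] the large drop after the second score marks the knee, so retrieval stops at depth 2; a flat curve yields no cut, matching the recall-priority behavior the figures contrast with precision-priority queries.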
read the original abstract

Long-term memory is critical for dialogue systems that support continuous, sustainable, and personalized interactions. However, existing methods rely on continuous summarization or OpenIE-based graph construction paired with fixed Top-\textit{k} retrieval, leading to limited adaptability across query categories and high computational overhead. In this paper, we propose HingeMem, a boundary-guided long-term memory that operationalizes event segmentation theory to build an interpretable indexing interface via boundary-triggered hyperedges over four elements: person, time, location, and topic. When any such element changes, HingeMem draws a boundary and writes the current segment, thereby reducing redundant operations and preserving salient context. To enable robust and efficient retrieval under diverse information needs, HingeMem introduces query-adaptive retrieval mechanisms that jointly decide (a) \textit{what to retrieve}: determine the query-conditioned routing over the element-indexed memory; (b) \textit{how much to retrieve}: control the retrieval depth based on the estimated query type. Extensive experiments across LLM scales (from 0.6B to production-tier models; \textit{e.g.}, Qwen3-0.6B to Qwen-Flash) on LOCOMO show that HingeMem achieves approximately $20\%$ relative improvement over strong baselines without query categories specification, while reducing computational cost (68\%$\downarrow$ question answering token cost compared to HippoRAG2). Beyond advancing memory modeling, HingeMem's adaptive retrieval makes it a strong fit for web applications requiring efficient and trustworthy memory over extended interactions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes HingeMem, a boundary-guided long-term memory architecture for dialogue systems. It segments conversations by drawing boundaries on changes in any of four elements (person, time, location, topic) to form hyperedges, then applies query-adaptive retrieval that routes over element-indexed memory and controls depth via estimated query type. Experiments on LOCOMO across LLM scales (0.6B to production models) report ~20% relative gains over strong baselines and 68% lower QA token cost versus HippoRAG2.

Significance. If the empirical results prove robust, HingeMem would advance scalable long-term memory modeling by grounding segmentation in event theory and replacing fixed top-k retrieval with query-conditioned mechanisms, yielding both performance and efficiency benefits for extended interactions.

major comments (3)
  1. [§3.1] §3.1 (Boundary construction): The operationalization that a boundary is drawn exactly when any of the four elements changes is presented without human agreement studies or error analysis on cases of topic drift without element change; this assumption is load-bearing for the claim that salient context is preserved without loss.
  2. [§4] §4 (Experiments): The headline claims of ~20% relative improvement and 68% token reduction are reported without error bars, ablation controls on the query-type classifier, or data-exclusion rules, preventing assessment of whether the gains are reliable or sensitive to implementation details.
  3. [§3.2] §3.2 (Query-adaptive retrieval): No robustness checks or error analysis on query-type misclassification and resulting under-retrieval are provided; this directly affects the validity of the efficiency claims versus HippoRAG2.
minor comments (1)
  1. [Abstract] The abstract states gains 'without query categories specification' yet the method relies on LLM-based query-type estimation; a brief clarification of how types are derived without predefined categories would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive feedback and the recommendation for major revision. We address each major comment point by point below, outlining our responses and the revisions we will incorporate to improve the manuscript.

read point-by-point responses
  1. Referee: [§3.1] §3.1 (Boundary construction): The operationalization that a boundary is drawn exactly when any of the four elements changes is presented without human agreement studies or error analysis on cases of topic drift without element change; this assumption is load-bearing for the claim that salient context is preserved without loss.

    Authors: We thank the referee for this observation. The boundary construction is explicitly motivated by event segmentation theory, where shifts in person, time, location, or topic serve as natural delimiters for dialogue events. While the manuscript does not include new human agreement studies, the four-element set is drawn from established dialogue and cognitive science literature. To directly address concerns about topic drift without element changes, we will add an error analysis subsection to §3.1 in the revision. This will examine instances from the LOCOMO dataset, quantify their occurrence, and assess impact on segmentation quality and downstream performance. We believe the reported gains provide supporting evidence, but the added analysis will strengthen the justification. revision: partial

  2. Referee: [§4] §4 (Experiments): The headline claims of ~20% relative improvement and 68% token reduction are reported without error bars, ablation controls on the query-type classifier, or data-exclusion rules, preventing assessment of whether the gains are reliable or sensitive to implementation details.

    Authors: We agree that error bars, ablations, and clearer data rules would improve assessment of reliability. In the revised manuscript, we will add error bars to the primary results (computed over multiple runs with varied seeds) and include a dedicated ablation on the query-type classifier to isolate its effect on both accuracy and token efficiency. We will also expand the experimental setup to explicitly state the data preprocessing steps and any exclusion criteria applied to LOCOMO. These changes will allow readers to better evaluate the robustness of the ~20% gains and 68% cost reduction. revision: yes

  3. Referee: [§3.2] §3.2 (Query-adaptive retrieval): No robustness checks or error analysis on query-type misclassification and resulting under-retrieval are provided; this directly affects the validity of the efficiency claims versus HippoRAG2.

    Authors: We acknowledge the value of analyzing query-type misclassification effects on retrieval. In the revision, we will add robustness checks and error analysis to §3.2, reporting the classifier's accuracy on held-out queries, characterizing misclassification patterns, and quantifying their influence on retrieval depth and token consumption. We will also include targeted comparisons showing performance under misclassified queries relative to HippoRAG2. This material will directly support the efficiency claims while highlighting any limitations of the adaptive mechanism. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on experimental outcomes from a heuristically defined method

full rationale

The paper defines HingeMem via an operationalization of event segmentation theory that draws boundaries on changes to four elements (person/time/location/topic) and adds query-type-based adaptive retrieval depth. No equations, derivations, or self-citations appear that reduce the reported 20% gains or 68% token reductions to fitted parameters or prior results by construction. Performance numbers are presented strictly as measured outcomes on the LOCOMO benchmark against baselines, leaving the core argument self-contained and externally falsifiable.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The method rests on the domain assumption that event segmentation theory supplies usable boundaries, and it introduces the new hyperedge structure without falsifiable evidence external to the proposed system itself.

axioms (1)
  • domain assumption Event segmentation theory can be operationalized by drawing boundaries on changes in person, time, location, or topic to create interpretable memory segments.
    Directly invoked to justify the indexing interface and segment writing process.
invented entities (1)
  • Boundary-triggered hyperedges no independent evidence
    purpose: To index and store dialogue segments over the four elements for efficient retrieval.
    New data structure introduced by the paper; no independent evidence outside the proposed method is supplied.

pith-pipeline@v0.9.0 · 5584 in / 1232 out tokens · 33611 ms · 2026-05-10T18:50:33.856274+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

66 extracted references · 19 canonical work pages · 7 internal anchors

  1. [1]

    Layla El Asri, Hannes Schulz, Shikhar Sharma, Jeremie Zumer, Justin Harris, Emery Fine, Rahul Mehrotra, and Kaheer Suleman. 2017. Frames: a corpus for adding memory to goal-oriented dialogue systems. InSIGDIAL Conference. Association for Computational Linguistics, 207–219

  2. [2]

    Sanghwan Bae, Dong-Hyun Kwak, Soyoung Kang, Min Young Lee, Sungdong Kim, Yuin Jeong, Hyeri Kim, Sang-Woo Lee, Woo-Myoung Park, and Nako Sung

  3. [3]

Keep Me Updated! Memory Management in Long-term Conversations. In EMNLP (Findings). Association for Computational Linguistics, 3769–3787

  4. [4]

    Christopher Baldassano, Janice Chen, Asieh Zadbood, Jonathan W Pillow, Uri Hasson, and Kenneth A Norman. 2017. Discovering event structure in continuous narrative perception and memory.Neuron95, 3 (2017), 709–721

  5. [5]

    Alexander J Barnett, Mitchell Nguyen, James Spargo, Reesha Yadav, Brendan I Cohn-Sheehy, and Charan Ranganath. 2024. Hippocampal-cortical interactions during event boundaries support retention of complex narrative events.Neuron 112, 2 (2024), 319–330

  6. [6]

    Yuxi Bi, Yunfan Gao, and Haofen Wang. 2025. StePO-Rec: Towards Personalized Outfit Styling Assistant via Knowledge-Guided Multi-Step Reasoning.CoRR abs/2504.09915 (2025)

  7. [7]

    Irving Biederman. 1987. Recognition-by-components: a theory of human image understanding.Psychological review94, 2 (1987), 115

  8. [8]

    Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav

  9. [9]

    Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory.CoRRabs/2504.19413 (2025)

  10. [10]

    Finetune Corp. 2024. Memary: The Open Source Memory Layer For Autonomous Agents. https://github.com/kingjulio8238/memary

  11. [11]

    Yiming Du, Wenyu Huang, Danna Zheng, Zhaowei Wang, Sébastien Montella, Mirella Lapata, Kam-Fai Wong, and Jeff Z. Pan. 2025. Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions.CoRRabs/2505.00675 (2025)

  12. [12]

Yiming Du, Hongru Wang, Zhengyi Zhao, Bin Liang, Baojun Wang, Wanjun Zhong, Zezhong Wang, and Kam-Fai Wong. 2024. PerLTQA: A Personal Long-Term Memory Dataset for Memory Classification, Retrieval, and Synthesis in Question Answering. CoRR abs/2402.16288 (2024)

  13. [13]

    Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, and Jonathan Larson. 2024. From Local to Global: A Graph RAG Approach to Query-Focused Summarization.CoRRabs/2404.16130 (2024)

  14. [14]

    Institute for Basic Science. 2023. AI’s memory-forming mechanism found to be strikingly similar to that of the brain.ScienceDaily(2023)

  15. [15]

    Nicholas T Franklin, Kenneth A Norman, Charan Ranganath, Jeffrey M Zacks, and Samuel J Gershman. 2020. Structured Event Memory: A neuro-symbolic model of event cognition.Psychological review127, 3 (2020), 327

  16. [16]

    Yunfan Gao, Tao Sheng, Youlin Xiang, Yun Xiong, Haofen Wang, and Jiawei Zhang. 2023. Chat-REC: Towards Interactive and Explainable LLMs-Augmented Recommender System.CoRRabs/2303.14524 (2023)

  17. [17]

Yubin Ge, Salvatore Romeo, Jason Cai, Raphael Shu, Yassine Benajiba, Monica Sunkara, and Yi Zhang. 2025. TReMu: Towards Neuro-Symbolic Temporal Reasoning for LLM-Agents with Memory in Multi-Session Dialogues. In ACL (Findings). Association for Computational Linguistics, 18974–18988

  18. [18]

    Linda Geerligs, Dora Gözükara, Djamari Oetringer, Karen L Campbell, Marcel van Gerven, and Umut Güçlü. 2022. A partially nested cortical hierarchy of neural states underlies event segmentation in the human brain.elife11 (2022), e77430

  19. [19]

Gaurav Goswami. 2025. Dissecting the metrics: How different evaluation approaches yield diverse results for conversational ai. Authorea Preprints (2025). TechRiv:26407 https://www.techrxiv.org/inst/26407

  20. [20]

    Bernal Jimenez Gutierrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, and Yu Su

  21. [21]

HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models. In NeurIPS

  22. [22]

    Bernal Jiménez Gutiérrez, Yiheng Shu, Weijian Qi, Sizhe Zhou, and Yu Su. 2025. From RAG to Memory: Non-Parametric Continual Learning for Large Language Models.CoRRabs/2502.14802 (2025)

  23. [23]

Junqing He, Liang Zhu, Rui Wang, Xi Wang, Gholamreza Haffari, and Jiaxing Zhang. 2025. MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue Generation. In NAACL (Long Papers). Association for Computational Linguistics, 9902–9921

  24. [24]

Jihyoung Jang, Minseong Boo, and Hyounghun Kim. 2023. Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations. In EMNLP. Association for Computational Linguistics, 13584–13606

  25. [25]

Eunwon Kim, Chanho Park, and Buru Chang. 2025. SHARE: Shared Memory-Aware Open-Domain Long-Term Dialogue Dataset Constructed from Movie Script. In ACL (1). Association for Computational Linguistics, 14474–14498

  26. [26]

    Seo Hyun Kim, Keummin Ka, Yohan Jo, Seung-won Hwang, Dongha Lee, and Jinyoung Yeo. 2024. Ever-Evolving Memory by Blending and Refining the Past. CoRRabs/2403.04787 (2024)

  27. [27]

Kuang-Huei Lee, Xinyun Chen, Hiroki Furuta, John F. Canny, and Ian Fischer

  28. [28]

    A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts. InICML. OpenReview.net

  29. [29]

    Zhiyu Li, Shichao Song, Chenyang Xi, Hanyu Wang, Chen Tang, Simin Niu, Ding Chen, Jiawei Yang, Chunyu Li, Qingchen Yu, Jihao Zhao, Yezhaohui Wang, Peng Liu, Zehao Lin, Pengyuan Wang, Jiahao Huo, Tianyi Chen, Kai Chen, Kehang Li, Zhen Tao, Junpeng Ren, Huayi Lai, Hao Wu, Bo Tang, Zhenren Wang, Zhaoxin Fan, Ningyu Zhang, Linfeng Zhang, Junchi Yan, Mingchuan...

  30. [30]

    Lei Liu, Xiaoyan Yang, Yue Shen, Binbin Hu, Zhiqiang Zhang, Jinjie Gu, and Guannan Zhang. 2023. Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory.CoRRabs/2311.08719 (2023)

  31. [31]

    Junru Lu, Siyu An, Mingbao Lin, Gabriele Pergola, Yulan He, Di Yin, Xing Sun, and Yunsheng Wu. 2023. MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation.CoRRabs/2308.08239 (2023)

  32. [32]

    Qihong Lu, Uri Hasson, and Kenneth A Norman. 2022. A neural network model of when to retrieve and encode episodic memories.elife11 (2022), e74445

  33. [33]

    Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. 2024. Evaluating Very Long-Term Conversational Memory of LLM Agents. InACL (1). Association for Computational Linguistics, 13851–13870

  34. [34]

    Jisoo Mok, Ik-hwan Kim, Sangkwon Park, and Sungroh Yoon. 2025. Exploring the Potential of LLMs as Personalized Assistants: Dataset, Evaluation, and Analysis. InACL (1). Association for Computational Linguistics, 10212–10239

  35. [35]

    Kai Tzu-iunn Ong, Namyoung Kim, Minju Gwak, Hyungjoo Chae, Taeyoon Kwon, Yohan Jo, Seung-won Hwang, Dongha Lee, and Jinyoung Yeo. 2025. Towards Lifelong Dialogue Agents via Timeline-based Memory Management. InNAACL (Long Papers). Association for Computational Linguistics, 8631–8661

  36. [36]

Charles Packer, Vivian Fang, Shishir G. Patil, Kevin Lin, Sarah Wooders, and Joseph E. Gonzalez. 2023. MemGPT: Towards LLMs as Operating Systems. CoRR abs/2310.08560 (2023)

  37. [37]

Advait Paliwal. 2024. Reminisc: Memory for LLMs. https://github.com/advaitpaliwal/reminisc

  38. [38]

    Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, and Daniel Chalef. 2025. Zep: A Temporal Knowledge Graph Architecture for Agent Memory. CoRRabs/2501.13956 (2025)

  39. [39]

Zachariah M Reagh, Angelique I Delarazan, Alexander Garber, and Charan Ranganath. 2020. Aging alters neural activity at event boundaries in the hippocampus and Posterior Medial network. Nature Communications 11, 1 (2020), 3980

  40. [40]

Alireza Salemi, Sheshera Mysore, Michael Bendersky, and Hamed Zamani. 2024. LaMP: When Large Language Models Meet Personalization. In ACL (1). Association for Computational Linguistics, 7370–7392

  41. [41]

Gregor Sieber and Brigitte Krenn. 2010. Towards an Episodic Memory for Companion Dialogue. In IVA (Lecture Notes in Computer Science, Vol. 6356). Springer, 322–328

  42. [42]

Arpita Soni, Rajeev Arora, Anoop Kumar, and Dheerendra Panwar. 2024. Evaluating Domain Coverage in Low-Resource Generative Chatbots: A Comparative Study of Open-Domain and Closed-Domain Approaches Using BLEU Scores. In 2024 International Conference on Electrical Electronics and Computing Technologies (ICEECT), Vol. 1. 1–6. doi:10.1109/ICEECT61758.2024.10738994

  43. [43]

    Hao Sun, Hengyi Cai, Bo Wang, Yingyan Hou, Xiaochi Wei, Shuaiqiang Wang, Yan Zhang, and Dawei Yin. 2024. Towards Verifiable Text Generation with Evolving Memory and Self-Reflection. InEMNLP. Association for Computational Linguistics, 8211–8227

  44. [44]

Jingwei Sun, Zhixu Du, and Yiran Chen. 2024. Knowledge Graph Tuning: Real-time Large Language Model Personalization based on Human Feedback. CoRR abs/2405.19686 (2024)

  45. [45]

Wang-Chiew Tan, Jane Dwivedi-Yu, Yuliang Li, Lambert Mathias, Marzieh Saeidi, Jing Nathan Yan, and Alon Y. Halevy. 2023. TimelineQA: A Benchmark for Question Answering over Timelines. In ACL (Findings). Association for Computational Linguistics, 77–91

  46. [46]

Zhen Tan, Jun Yan, I-Hung Hsu, Rujun Han, Zifeng Wang, Long T. Le, Yiwen Song, Yanfei Chen, Hamid Palangi, George Lee, Anand Rajan Iyer, Tianlong Chen, Huan Liu, Chen-Yu Lee, and Tomas Pfister. 2025. In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents. In ACL (1). Association for Computational Linguistics, 8416–8439

  47. [47]

Bing Wang, Xinnian Liang, Jian Yang, Hui Huang, Shuangzhi Wu, Peihao Wu, Lu Lu, Zejun Ma, and Zhoujun Li. 2025. SCM: Enhancing Large Language Model with Self-Controlled Memory Framework. In DASFAA. https://arxiv.org/abs/2304.13343

  48. [48]

    Qingyue Wang, Yanhe Fu, Yanan Cao, Shuai Wang, Zhiliang Tian, and Liang Ding. 2025. Recursively summarizing enables long-term dialogue memory in large language models.Neurocomputing639 (2025), 130193

  49. [49]

Zheng Wang, Zhongyang Li, Zeren Jiang, Dandan Tu, and Wei Shi. 2024. Crafting Personalized Agents through Retrieval-Augmented Generation on Editable Memory Graphs. In EMNLP. Association for Computational Linguistics, 4891–4906

  50. [50]

    Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, and Dong Yu

  51. [51]

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory. In ICLR. OpenReview.net

  52. [52]

    Yaxiong Wu, Sheng Liang, Chen Zhang, Yichao Wang, Yongyue Zhang, Huifeng Guo, Ruiming Tang, and Yong Liu. 2025. From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs.CoRRabs/2504.15965 (2025)

  53. [53]

Jing Xu, Arthur Szlam, and Jason Weston. 2022. Beyond Goldfish Memory: Long-Term Open-Domain Conversation. In ACL (1). Association for Computational Linguistics, 5180–5197

  54. [54]

    Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang

  55. [55]

    A-MEM: Agentic Memory for LLM Agents.CoRRabs/2502.12110 (2025)

  56. [56]

    Xinchao Xu, Zhibin Gou, Wenquan Wu, Zheng-Yu Niu, Hua Wu, Haifeng Wang, and Shihang Wang. 2022. Long Time No See! Open-Domain Conversation with Long-Term Persona Memory. InACL (Findings). Association for Computational Linguistics, 2639–2650

  57. [57]

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jian Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, Le Yu, Liangha...

  58. [58]

    Jeffrey M Zacks and Khena M Swallow. 2007. Event segmentation.Current directions in psychological science16, 2 (2007), 80–84

  59. [59]

    Jeffrey M Zacks, Barbara Tversky, and Gowri Iyer. 2001. Perceiving, remembering, and communicating structure in events.Journal of experimental psychology: General130, 1 (2001), 29

  60. [60]

Zheng Zhang, Minlie Huang, Zhongzhou Zhao, Feng Ji, Haiqing Chen, and Xiaoyan Zhu. 2019. Memory-Augmented Dialogue Management for Task-Oriented Dialogue Systems. ACM Trans. Inf. Syst. 37, 3 (2019), 34:1–34:30

  61. [61]

    Jie Zheng, Andrea GP Schjetnan, Mar Yebra, Bernard A Gomes, Clayton P Mosher, Suneil K Kalia, Taufik A Valiante, Adam N Mamelak, Gabriel Kreiman, and Ueli Rutishauser. 2022. Neurons detect cognitive boundaries to structure episodic memories in humans.Nature neuroscience25, 3 (2022), 358–368

  62. [62]

Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. 2024. MemoryBank: Enhancing Large Language Models with Long-Term Memory. In AAAI. AAAI Press, 19724–19731

  63. [63]

    Yijie Zhong, Yunfan Gao, Xiaolian Zhang, and Haofen Wang. 2025. ODDA: An OODA-Driven Diverse Data Augmentation Framework for Low-Resource Relation Extraction. InFindings of the Association for Computational Linguistics: ACL 2025. 267–285

  64. [64]

Yijie Zhong, Feifan Wu, Mengying Guo, Xiaolian Zhang, Meng Wang, and Haofen Wang. 2025. Meta-PKE: Memory-Enhanced Task-Adaptive Personal Knowledge Extraction in Daily Life. Inf. Process. Manag. 62, 4 (2025), 104097
