SaliMory: Orchestrating Cognitive Memory for Conversational Agents

arxiv: 2606.04120 · v1 · pith:B575CJGAnew · submitted 2026-06-02 · 💻 cs.CL · cs.AI

Kai Zhang, Xinyuan Zhang, Hongda Jiang, Shiun-Zu Kuo, Hyokun Yun, Ejaz Ahmed, Shereen Oraby, Ziyun Li, Sanat Sharma, Ann Lee, Ahmed A Aly, Anuj Kumar, Raffay Hamid, Xin Luna Dong

Pith reviewed 2026-06-28 10:07 UTC · model grok-4.3

keywords conversational agents · memory management · language models · reinforcement learning · personalization · cognitive memory · process reward

The pith

SALIMORY trains one language model to filter, consolidate, and recall user facts using hierarchical stage-wise rewards.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SALIMORY as a way to give conversational agents persistent memory without the usual problems of long context or standard reinforcement learning. It trains a single language model on a cognitively structured memory that covers user facts, preferences, and working memory. The key step is adding a hierarchical stage-wise process reward plus reward-decomposed contrastive refinement so that supervision reaches the separate operations of selective filtering, consolidation, and cue-driven recall without credit-assignment collapse. If the method works, agents would maintain accurate personal details across many turns and produce measurably higher end-to-end accuracy and personalization quality.

Core claim

SALIMORY trains a single language model to manage cognitively-structured memory by supplying isolated supervision for selective filtering, consolidation, and cue-driven recall through a hierarchical stage-wise process reward and reward-decomposed contrastive refinement, which reduces memory-attributed failures by one-third, raises end-to-end accuracy more than 10 percent above prior systems, and more than doubles the Good Personalization rate.

What carries the argument

Hierarchical stage-wise process reward combined with reward-decomposed contrastive refinement, which isolates supervision signals for the three memory operations inside one model.
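
The abstract gives no formula for the stage-wise reward, so the following is only a minimal sketch of what "hierarchical" could mean here. It assumes a hypothetical grader that scores each stage in [0, 1] and a hypothetical gating rule in which a stage keeps its own score only when every upstream stage passed. The names `StageScores`, `hierarchical_stage_rewards`, and `gate` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass

# Stage names follow the abstract; the order is an assumption.
STAGES = ("filtering", "consolidation", "recall")


@dataclass
class StageScores:
    """Hypothetical per-stage scores in [0, 1] produced by some grader."""
    filtering: float
    consolidation: float
    recall: float


def hierarchical_stage_rewards(scores: StageScores, gate: float = 0.5) -> dict[str, float]:
    """Return one reward per stage instead of a single end-to-end scalar.

    Assumed gating rule: a stage receives its own score only if every
    upstream stage scored at least `gate`; otherwise its reward is masked
    to 0 so that it is not credited for work built on a failed upstream step.
    """
    rewards: dict[str, float] = {}
    upstream_ok = True
    for name in STAGES:
        score = getattr(scores, name)
        rewards[name] = score if upstream_ok else 0.0
        upstream_ok = upstream_ok and score >= gate
    return rewards


# Filtering passes, consolidation fails, so recall is masked.
example = hierarchical_stage_rewards(StageScores(0.9, 0.2, 0.8))
```

Under this assumed rule the recall score of 0.8 is discarded because consolidation fell below the gate, which is one way a reward could keep a downstream stage from being trained on an upstream failure. Whether SALIMORY gates, weights, or sums its stage rewards is not stated in the abstract.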

If this is right

Memory-attributed failures drop by roughly one-third compared with prior memory agents.
End-to-end task accuracy rises by more than 10 percent over existing state-of-the-art systems.
The rate of good personalization more than doubles.
A single model can replace separate modules for the distinct memory stages while still receiving usable learning signals.
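
To make the size of these claims concrete, the sketch below applies them to hypothetical baseline numbers. None of the baselines are from the paper, and the abstract does not say whether "over 10%" means percentage points or a relative lift, so both readings are computed at the stated floor. The function name `headline_scenarios` and its keys are illustrative.

```python
def headline_scenarios(accuracy: float, failures: int, good_rate: float) -> dict[str, float]:
    """Apply the abstract's three headline changes to hypothetical baselines.

    Each value is the minimum implied by the wording ("over 10%",
    "one-third", "more than doubles"), not a reported result.
    """
    return {
        "accuracy_if_absolute": accuracy + 0.10,       # +10 percentage points
        "accuracy_if_relative": accuracy * 1.10,       # +10% of the baseline
        "failures_after_cut": failures * (1 - 1 / 3),  # one-third fewer failures
        "good_rate_if_doubled": good_rate * 2,         # floor of "more than doubles"
    }


# Hypothetical baselines, NOT taken from the paper.
scenario = headline_scenarios(accuracy=0.60, failures=300, good_rate=0.15)
```

With a 60% baseline the two readings of the accuracy claim differ by four points (70% versus 66%), which is why the missing definition matters when comparing against prior systems.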

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decomposition approach could be tested on other multi-stage language-model pipelines that suffer credit assignment problems.
If the isolated signals prove robust, agent designs might shift away from modular memory components toward unified models trained with staged rewards.
The method suggests a route for applying process supervision to any sequential decision task inside large language models.

Load-bearing premise

The stage-wise rewards and contrastive refinement actually give separate, non-confounded training signals for filtering, consolidation, and recall.
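
One way to read "non-confounded" is that a training pair should differ in exactly one stage. The sketch below illustrates that idea and is not the paper's algorithm: it assumes each rollout carries a hypothetical dictionary of per-stage rewards, and it keeps only pairs whose rewards match on every stage except the target one. The names `isolated_pairs`, `margin`, and `tol` are assumptions.

```python
from itertools import combinations


def isolated_pairs(rollouts, stage, margin=0.2, tol=1e-6):
    """Return (better_id, worse_id) pairs that differ only in `stage`.

    A pair qualifies when all other stage rewards agree within `tol`
    and the target-stage rewards differ by at least `margin`.
    """
    pairs = []
    for a, b in combinations(rollouts, 2):
        others_match = all(
            abs(a["rewards"][s] - b["rewards"][s]) <= tol
            for s in a["rewards"]
            if s != stage
        )
        gap = a["rewards"][stage] - b["rewards"][stage]
        if others_match and abs(gap) >= margin:
            pairs.append((a["id"], b["id"]) if gap > 0 else (b["id"], a["id"]))
    return pairs


# Hypothetical rollouts with made-up stage rewards.
rollouts = [
    {"id": "r1", "rewards": {"filtering": 1.0, "consolidation": 1.0, "recall": 1.0}},
    {"id": "r2", "rewards": {"filtering": 1.0, "consolidation": 1.0, "recall": 0.0}},
    {"id": "r3", "rewards": {"filtering": 0.0, "consolidation": 1.0, "recall": 0.0}},
]
recall_pairs = isolated_pairs(rollouts, "recall")
filtering_pairs = isolated_pairs(rollouts, "filtering")
```

Pairs such as r1 versus r3 are rejected because two stages change at once, so a preference between them could not be attributed to a single operation. If the paper's contrastive refinement admits such pairs, the premise above would need separate evidence.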

What would settle it

A controlled run in which the performance gains vanish when the reward decomposition is removed and replaced by a single end-to-end reward.

Original abstract

Conversational agents that serve as lifelong companions must maintain persistent memory across all interactions. However, simply expanding context windows with raw retrieval degrades reasoning quality, while training memory agents via standard reinforcement learning creates a severe credit assignment bottleneck in a multi-stage pipeline. To solve this, we introduce SALIMORY, a framework that trains a single language model to manage a cognitively-structured memory spanning user facts, preferences, and working memory. By introducing a hierarchical stage-wise process reward and reward-decomposed contrastive refinement, SALIMORY provides isolated supervision for distinct memory operations (selective filtering, consolidation, and cue-driven recall) end-to-end. SALIMORY cuts memory-attributed failures by one-third, outperforms the state-of-the-art by over 10% in end-to-end accuracy, and more than doubles the Good Personalization rate.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

SALIMORY proposes staged rewards to isolate memory operations in chat agents, but the abstract gives no way to check if that isolation actually happens.

The core idea is a single LM trained with hierarchical stage-wise process rewards plus reward-decomposed contrastive refinement so that filtering, consolidation, and recall each get their own supervision signal.

That combination is the main new piece. It directly targets the credit-assignment mess that comes from treating memory as one long RL pipeline.

The framing of the problem is clear and the proposed decomposition makes sense on paper. Persistent memory without context bloat is a practical pain point, and breaking the reward by stage is a reasonable attempt to fix it.

The problem is that the abstract supplies none of the usual checks: no datasets, no baselines, no ablation tables, no per-stage accuracy numbers. The claimed one-third drop in memory failures and 10%+ accuracy lift therefore cannot be attributed to the isolation mechanism rather than to ordinary RL stabilization. Without disaggregated results it is impossible to know whether the rewards leak across stages or introduce new artifacts.

This is aimed at people already working on memory-augmented conversational systems. They might pick up the reward design if the full paper shows the controls.

Send it to review so the experiments can be examined properly; the abstract alone does not justify the performance claims.

Referee Report

2 major / 1 minor

Summary. The manuscript presents SALIMORY, a framework for training a single language model to manage cognitively-structured memory (user facts, preferences, working memory) in conversational agents. It introduces a hierarchical stage-wise process reward and reward-decomposed contrastive refinement to supply isolated supervision for distinct operations (selective filtering, consolidation, cue-driven recall) in an end-to-end manner, claiming this resolves credit-assignment issues in multi-stage RL pipelines. The central empirical claims are a one-third reduction in memory-attributed failures, over 10% improvement in end-to-end accuracy over state-of-the-art, and more than doubling the Good Personalization rate.

Significance. If the isolation of supervision is demonstrated and the quantitative gains are reproducible, the work would meaningfully advance reliable long-term memory management for conversational agents by decomposing RL signals across cognitive stages. The emphasis on operation-specific gradients without leakage addresses a recognized bottleneck in agent memory systems and could influence subsequent RLHF-style training for structured memory.

major comments (2)

[Abstract] The performance claims (one-third failure reduction, >10% end-to-end accuracy lift, doubled Good Personalization rate) are presented without any description of datasets, baselines, statistical tests, ablation results, or per-operation metrics. This absence is load-bearing because the central thesis, that the two reward mechanisms deliver non-confounded, isolated supervision for filtering, consolidation, and recall, cannot be evaluated from the given text.
[Abstract] No equations, pseudocode, or algorithmic specification is supplied for the 'hierarchical stage-wise process reward' or 'reward-decomposed contrastive refinement.' Without such formalization, or accompanying ablation tables showing that a change to the filtering reward affects only filtering accuracy (and not consolidation or recall), the claim of operation-specific gradients, which is central to the credit-assignment solution, remains unverified.
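
The ablation requested in the second comment can be phrased as a simple leakage check. The sketch below is an assumption about how such a check might be scored, not something the manuscript is known to contain: given per-stage accuracy before and after perturbing one stage's reward, it lists every other stage whose accuracy moved by more than a tolerance. The function name `leaked_stages`, the `threshold` value, and all numbers are hypothetical.

```python
def leaked_stages(before, after, perturbed, threshold=0.02):
    """List stages other than `perturbed` whose accuracy shifted.

    `before` and `after` map stage name -> accuracy in [0, 1].
    A shift larger than `threshold` in a non-perturbed stage is
    treated as evidence that the reward signal is not isolated.
    """
    return sorted(
        stage
        for stage in before
        if stage != perturbed and abs(after[stage] - before[stage]) > threshold
    )


# Hypothetical accuracies, NOT results from the paper.
before = {"filtering": 0.80, "consolidation": 0.70, "recall": 0.60}
after = {"filtering": 0.60, "consolidation": 0.70, "recall": 0.50}
leaks = leaked_stages(before, after, perturbed="filtering")
```

In this made-up case recall drops when only the filtering reward is changed, which is the pattern that would count against the isolation claim. An empty list across all three perturbations would support it, subject to run-to-run noise that a real analysis would need to estimate.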

minor comments (1)

[Abstract] The metric 'Good Personalization rate' is introduced without definition or reference, which hinders immediate assessment of the reported doubling.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important aspects of how the abstract presents our contributions. We address each major comment point by point below, clarifying the role of the abstract versus the full manuscript and indicating where revisions will be made.

Referee: [Abstract] The performance claims (one-third failure reduction, >10% end-to-end accuracy lift, doubled Good Personalization rate) are presented without any description of datasets, baselines, statistical tests, ablation results, or per-operation metrics. This absence is load-bearing because the central thesis, that the two reward mechanisms deliver non-confounded, isolated supervision for filtering, consolidation, and recall, cannot be evaluated from the given text.

Authors: The abstract is intentionally concise to summarize the core problem, solution, and headline results within typical length limits. The full manuscript provides all requested details: Section 4 describes the datasets and conversational benchmarks used, the state-of-the-art baselines, and the statistical tests applied; Section 5 presents ablation results on the two reward mechanisms and per-operation metrics for selective filtering, consolidation, and cue-driven recall. These sections directly support the central thesis by showing non-confounded, isolated supervision effects. To address the referee's concern about self-contained evaluation from the abstract, we will revise the abstract to include a brief clause referencing the evaluation framework and the supporting per-operation metrics.

Revision: yes

Referee: [Abstract] No equations, pseudocode, or algorithmic specification is supplied for the 'hierarchical stage-wise process reward' or 'reward-decomposed contrastive refinement.' Without such formalization, or accompanying ablation tables showing that a change to the filtering reward affects only filtering accuracy (and not consolidation or recall), the claim of operation-specific gradients, which is central to the credit-assignment solution, remains unverified.

Authors: Abstracts in this field standardly omit equations, pseudocode, and full algorithmic specifications to preserve readability; these are supplied in the main text (Section 3), which includes the formal definitions, equations, and pseudocode for the hierarchical stage-wise process reward and reward-decomposed contrastive refinement. The Experiments section (Section 5) contains the ablation tables demonstrating that modifying the filtering reward impacts only filtering accuracy without leakage to consolidation or recall, thereby verifying operation-specific gradients and resolving the credit-assignment bottleneck. We therefore disagree that the claim remains unverified, as the full manuscript supplies the required formalization and evidence. We will, however, add a short high-level phrase in the revised abstract to reference the reward design.

Revision: partial

Circularity Check

0 steps flagged

No circularity detected in derivation chain

The abstract and description contain no equations, derivations, or first-principles claims that reduce to inputs by construction. Claims rest on empirical performance lifts from a new framework (hierarchical stage-wise process reward and reward-decomposed contrastive refinement) without any self-definitional mappings, fitted parameters renamed as predictions, or load-bearing self-citations. Because there is no mathematical chain to inspect for equivalence to inputs, the result stands or falls as an empirical proposal evaluated against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, background axioms, or newly postulated entities.

pith-pipeline@v0.9.1-grok · 5714 in / 1123 out tokens · 28359 ms · 2026-06-28T10:07:18.364561+00:00 · methodology
