Memory-R2: Fair Credit Assignment for Long-Horizon Memory-Augmented LLM Agents
Pith reviewed 2026-05-22 09:14 UTC · model grok-4.3
The pith
LoGo-GRPO enables fair credit assignment for memory operations in long-horizon LLM agents by comparing outcomes from the same intermediate memory state.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that LoGo-GRPO yields fairer group comparisons and more precise supervision for memory construction by comparing different memory-operation outcomes from the same intermediate memory state while preserving end-to-end learning from long-horizon trajectory-level rewards. This is realized through local rerollouts combined with global group-relative optimization, a shared-parameter design that instantiates a fact extractor and memory manager from one LLM backbone, and a progressive curriculum that increases the training horizon from 8 to 16 to 32 sessions.
What carries the argument
LoGo-GRPO, which runs local rerollouts from an identical intermediate memory state to compare memory-operation outcomes fairly while retaining global optimization on full trajectory rewards.
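The local-rerollout idea can be illustrated with a minimal sketch. It assumes the usual GRPO-style normalization (reward minus group mean, divided by group standard deviation) and a toy flat dictionary as the memory store. The names `group_relative_advantages`, `local_group_rewards`, and `toy_rollout`, as well as the toy reward, are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def local_group_rewards(memory_state, candidate_ops, rollout_fn):
    """Score each candidate operation from a copy of the SAME memory state."""
    rewards = []
    for op in candidate_ops:
        branched = dict(memory_state)  # shallow copy; enough for flat string values
        rewards.append(rollout_fn(branched, op))
    return rewards


def toy_rollout(memory, op):
    """Hypothetical stand-in for a rerollout: apply the op, then score the result."""
    kind = op[0]
    if kind == "insert":
        memory[op[1]] = op[2]
    elif kind == "delete":
        memory.pop(op[1], None)
    # Toy reward: 1.0 if the memory can answer "what pet does the user have?"
    return 1.0 if memory.get("pet") == "dog" else 0.0


shared_state = {"city": "Paris"}
ops = [("insert", "pet", "dog"), ("insert", "pet", "cat"), ("noop",), ("delete", "city")]
local_rewards = local_group_rewards(shared_state, ops, toy_rollout)
local_advantages = group_relative_advantages(local_rewards)
print(local_rewards)
print([round(a, 3) for a in local_advantages])
```

Because every branch starts from a copy of `shared_state`, reward differences within the group come from the operation and whatever follows it, not from differing memory histories.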
If this is right
- Trajectory-level rewards now supply precise signals for individual memory operations such as write, update, or delete.
- Memory formation and memory evolution can be jointly optimized through shared parameters.
- Training stays stable while the number of sessions grows progressively from 8 to 32.
- Multi-session environments become usable for reinforcement learning without systematic bias in group comparisons.
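The progressive horizon mentioned above (8, then 16, then 32 sessions) could be expressed as a simple step-based schedule. This is a sketch only: the function name `horizon_for_step` and the stage boundaries at steps 100 and 200 are hypothetical, since the source does not state when each stage begins.

```python
def horizon_for_step(step, stages=((0, 8), (100, 16), (200, 32))):
    """Return the number of sessions per training trajectory at a given step.

    `stages` holds (first_step, sessions) pairs in ascending order. The step
    boundaries used here are illustrative placeholders, not reported values.
    """
    horizon = stages[0][1]
    for first_step, sessions in stages:
        if step >= first_step:
            horizon = sessions
    return horizon


for step in (0, 99, 100, 250):
    print(step, horizon_for_step(step))
```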
Where Pith is reading between the lines
- The local-rerollout technique could apply to any agent system whose actions persistently change future observations, such as database agents or long-running planners.
- Progressive lengthening of horizons may prove necessary whenever reinforcement learning must handle accumulating state changes.
- The same design might raise performance in sequential tasks that rely on retained facts, including multi-turn dialogue or cumulative reasoning chains.
Load-bearing premise
Local rerollouts starting from an identical intermediate memory state produce sufficiently representative and unbiased comparisons for credit assignment to memory operations.
What would settle it
An experiment in which agents trained with LoGo-GRPO show no reduction in credit signal noise or no performance gain over standard GRPO on tasks that require consistent memory across many sessions.
Original abstract
Memory-augmented LLM agents enable interactions that extend beyond finite context windows by storing, updating, and reusing information across sessions. However, training such agents with reinforcement learning in multi-session environments is challenging because memory turns the agent's past actions into part of its future environment. Once different rollouts write, update, or delete different memories, they no longer share the same intermediate memory state, making trajectory-level comparisons fundamentally unfair. This violates a key assumption behind group-relative methods such as GRPO, where rollouts are compared as if they were sampled from the same effective environment. Consequently, trajectory-level rewards provide noisy or biased credit signals for long-horizon memory operations. To address this challenge, we introduce Memory-R2, a training framework for long-horizon memory-augmented LLM agents. Its core algorithm, LoGo-GRPO, combines local and global group-relative optimization. The global objective preserves end-to-end learning from long-horizon trajectory-level rewards, while local rerollouts compare different memory-operation outcomes from the same intermediate memory state, yielding fairer group comparisons and more precise supervision for memory construction. Beyond credit assignment, Memory-R2 jointly optimizes memory formation and memory evolution with a shared-parameter co-learning design, where a fact extractor and a memory manager are instantiated from the same LLM backbone through role-specific prompts. To stabilize multi-step RL over long memory horizons, we adopt a progressive curriculum that increases the training horizon from 8 to 16 to 32 sessions. Together, these components provide an effective training paradigm for memory-augmented LLM agents in long-horizon multi-session settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Memory-R2, a training framework for long-horizon memory-augmented LLM agents. It diagnoses that memory updates across rollouts destroy the shared-environment assumption required by group-relative policy optimization methods such as GRPO, producing biased trajectory-level credit signals. The core algorithm LoGo-GRPO combines a global term that retains end-to-end learning from full-horizon rewards with local rerollouts that branch different memory operations from an identical intermediate memory state. The framework further employs shared-parameter co-learning (fact extractor and memory manager instantiated from the same LLM via role-specific prompts) and a progressive curriculum that scales the training horizon from 8 to 16 to 32 sessions.
Significance. If the empirical claims are substantiated, the local-global decomposition in LoGo-GRPO could supply a practical route to fairer credit assignment for persistent memory operations without sacrificing long-horizon optimization. The shared-parameter co-learning design and curriculum are sensible engineering choices that directly target joint optimization and training stability; together they address a concrete obstacle in scaling RL to memory-augmented agents.
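The shared-parameter idea, in which one backbone serves two roles through role-specific prompts, can be sketched as below. The prompt strings, `make_role`, and `first_line_backbone` are hypothetical placeholders. A real system would call an LLM where the stand-in backbone is used.

```python
from typing import Callable

# Hypothetical role prompts; the paper's actual templates are not reproduced here.
ROLE_PROMPTS = {
    "fact_extractor": "Extract durable facts from the dialogue turns.",
    "memory_manager": "Decide memory edits for the extracted facts.",
}


def make_role(backbone: Callable[[str], str], role: str) -> Callable[[str], str]:
    """Wrap a single shared backbone with a role-specific prompt prefix."""
    prefix = ROLE_PROMPTS[role]

    def run(user_input: str) -> str:
        return backbone(f"{prefix}\n\n{user_input}")

    return run


def first_line_backbone(prompt: str) -> str:
    """Stand-in for an LLM call: returns the first prompt line it received."""
    return prompt.splitlines()[0]


# Both roles share the same underlying callable, so they share "parameters".
extractor = make_role(first_line_backbone, "fact_extractor")
manager = make_role(first_line_backbone, "memory_manager")
print(extractor("A: I adopted a dog last week."))
print(manager("fact: speaker A adopted a dog"))
```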
major comments (1)
- [Abstract (LoGo-GRPO paragraph)] The claim that local rerollouts from a shared intermediate memory state deliver fairer comparisons and more precise supervision for memory construction presupposes that outcome differences can be attributed primarily to the memory operation itself. Because the LLM policy is conditioned on memory contents, any alteration immediately shifts the action distribution, subsequent observations, and future memory updates. The manuscript does not describe importance weighting, downstream-policy freezing, or averaging over policy stochasticity to isolate the memory-operation effect; without such controls the local groups risk confounding memory credit with policy response, an issue that grows with horizon length.
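One of the controls named in this comment, averaging over policy stochasticity, can be sketched as a Monte Carlo estimate per memory-operation branch. This is not something the manuscript is reported to do. The function names, the success probabilities, and the sample count are all hypothetical.

```python
import random
from statistics import mean


def averaged_branch_reward(memory_state, op, rollout_fn, n_samples, rng):
    """Average the reward of one memory operation over several stochastic rollouts."""
    samples = [rollout_fn(dict(memory_state), op, rng) for _ in range(n_samples)]
    return mean(samples)


def noisy_rollout(memory, op, rng):
    """Toy downstream policy: succeeds with a probability that depends on the op."""
    success_prob = 0.8 if op == "insert" else 0.2  # illustrative values only
    return 1.0 if rng.random() < success_prob else 0.0


rng = random.Random(0)  # fixed seed so the sketch is repeatable
state = {"city": "Paris"}
est_insert = averaged_branch_reward(state, "insert", noisy_rollout, 200, rng)
est_noop = averaged_branch_reward(state, "noop", noisy_rollout, 200, rng)
print(est_insert, est_noop)
```

Averaging reduces the variance contributed by downstream sampling, but it multiplies rollout cost by `n_samples`. It also does not remove the dependence of the downstream policy on the edited memory.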
minor comments (2)
- [Abstract] The abstract would be strengthened by a concise statement of the experimental domains, baselines, and key quantitative results so that readers can immediately gauge empirical support.
- [Method] Explicit pseudocode or equations defining the local and global objectives of LoGo-GRPO (including how the two terms are combined and how local groups are sampled) would improve clarity and reproducibility.
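As an illustration of the kind of definition this comment asks for, one possible form is given below. Only the group-standardized advantage is standard GRPO practice. The local advantage over $K$ branches from a shared memory state at step $t$, the additive combination, and the weight $\lambda$ are assumptions made here for illustration and are not taken from the manuscript.

```latex
\begin{align}
A^{\mathrm{glob}}_i &= \frac{R_i - \operatorname{mean}(R_1,\dots,R_G)}{\operatorname{std}(R_1,\dots,R_G)}, \\
A^{\mathrm{loc}}_{t,j} &= \frac{r_{t,j} - \operatorname{mean}(r_{t,1},\dots,r_{t,K})}{\operatorname{std}(r_{t,1},\dots,r_{t,K})}, \\
\mathcal{J}(\theta) &= \mathcal{J}_{\mathrm{glob}}(\theta) + \lambda\,\mathcal{J}_{\mathrm{loc}}(\theta).
\end{align}
```

Here $R_i$ is the trajectory-level reward of full rollout $i$ in a group of $G$, and $r_{t,j}$ is the reward of the $j$-th local rerollout branched from the same intermediate memory state.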
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback on our manuscript. The major comment raises a valid point regarding potential confounding in the local rerollouts, which we address below with a revision to improve clarity.
- Referee: [Abstract (LoGo-GRPO paragraph)] The claim that local rerollouts from a shared intermediate memory state deliver fairer comparisons and more precise supervision for memory construction presupposes that outcome differences can be attributed primarily to the memory operation itself. Because the LLM policy is conditioned on memory contents, any alteration immediately shifts the action distribution, subsequent observations, and future memory updates. The manuscript does not describe importance weighting, downstream-policy freezing, or averaging over policy stochasticity to isolate the memory-operation effect; without such controls the local groups risk confounding memory credit with policy response, an issue that grows with horizon length.
Authors: We agree that local rerollouts from a shared memory state do not fully isolate the memory operation from downstream policy effects, since the updated memory immediately conditions the LLM policy and influences subsequent actions and observations. This confounding is an inherent feature of memory-augmented agents rather than an artifact of our method. The primary goal of the local groups in LoGo-GRPO is to eliminate the more severe bias that arises when trajectories are compared across entirely divergent memory histories (as occurs in standard GRPO), by ensuring identical starting memory states for the memory-operation branches. We do not employ importance weighting, downstream-policy freezing, or explicit averaging over stochasticity in the current design, as these would increase computational cost in long-horizon settings. We have revised Section 3.2 to explicitly discuss this assumption, the remaining confounding risk, and why the local-global combination still yields fairer credit assignment than baselines. Empirical results in the paper support the practical benefit of this approach.
Revision: yes
Circularity Check
No significant circularity detected; algorithmic proposal is self-contained
The paper identifies a concrete problem with standard group-relative methods like GRPO when applied to memory-augmented agents: divergent memory states across rollouts violate the shared-environment assumption required for fair trajectory comparisons. It then defines LoGo-GRPO as an explicit combination of local rerollouts (branching memory operations from a fixed intermediate state) plus a retained global objective. This construction is presented as a direct response to the stated problem rather than a derivation that reduces to fitted parameters, self-citations, or prior results by the authors. The shared-parameter co-learning design and progressive curriculum are likewise introduced as engineering choices for stability, not as quantities whose justification collapses into the inputs. No equations or load-bearing claims in the provided text reduce the central result to its own definitions or to self-referential citations.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Different rollouts that modify memory no longer share the same intermediate state, violating the equal-environment assumption of group-relative methods.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: absolute_floor_iff_bare_distinguishability (echoes)
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: "local rerollouts compare different memory-operation outcomes from the same intermediate memory state, yielding fairer group comparisons"
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking (echoes)
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: "progressive curriculum that increases the training horizon from 8 to 16 to 32 sessions"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.