MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
Pith reviewed 2026-06-30 21:00 UTC · model grok-4.3
The pith
MemEye shows multimodal agents fail to preserve fine-grained visual details for state-change reasoning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MemEye measures memory along visual-evidence granularity and synthesis requirements; when applied to current architectures it shows they struggle to preserve pixel-level details and to reason about evolutionary state changes over time.
What carries the argument
MemEye framework, which scores memory by the granularity of decisive visual evidence (scene-level to pixel-level) and by the type of evidence synthesis required (single evidence to evolutionary synthesis).
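One way to picture the two-axis scoring is as a grid of accuracy cells, one per combination of evidence granularity and synthesis type. The sketch below is illustrative only and is not the paper's implementation: `ItemResult` and `accuracy_grid` are hypothetical names, only the endpoint labels come from the abstract, and the paper may define intermediate levels that are not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemResult:
    """One benchmark question placed on the two axes (hypothetical schema)."""
    granularity: str  # e.g. "scene-level" or "pixel-level"
    synthesis: str    # e.g. "single evidence" or "evolutionary synthesis"
    correct: bool


def accuracy_grid(results):
    """Return {(granularity, synthesis): accuracy} over the given results."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in results:
        cell = (r.granularity, r.synthesis)
        totals[cell] += 1
        hits[cell] += int(r.correct)
    return {cell: hits[cell] / totals[cell] for cell in totals}


# Toy data, invented for illustration only (not results from the paper).
toy = [
    ItemResult("scene-level", "single evidence", True),
    ItemResult("scene-level", "single evidence", True),
    ItemResult("pixel-level", "evolutionary synthesis", False),
    ItemResult("pixel-level", "evolutionary synthesis", True),
]
grid = accuracy_grid(toy)
```

Reporting per-cell accuracy rather than a single aggregate is what would let a reader see whether failures concentrate at the fine-grained or evolutionary end of either axis.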
If this is right
- Effective long-term multimodal memory requires explicit mechanisms for routing fine visual evidence.
- Temporal tracking of visual state changes must be strengthened in memory architectures.
- Detail extraction from stored visuals remains a primary performance bottleneck.
- Future benchmarks should adopt similar gates to block textual shortcuts.
Where Pith is reading between the lines
- Architectures that compress and index visual patches at multiple scales may reduce the observed detail loss.
- The same evaluation structure could be applied to video or embodied-agent tasks to test generalization.
- Training objectives that explicitly penalize loss of pixel-level information could be derived from the framework's axes.
Load-bearing premise
The ablation-driven validation gates in the benchmark correctly force questions to require stored visual evidence rather than allowing answers from captions or textual traces.
What would settle it
If agents achieve the same accuracy on the benchmark questions when visuals are withheld or replaced by captions alone, the claim that fine-grained visual preservation is necessary would be falsified.
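The test described above amounts to comparing accuracy on the same questions under two input conditions. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the outcome lists are toy values, and the paper's own protocol and metrics may differ.

```python
def accuracy(outcomes):
    """Fraction of True values in a list of per-question outcomes."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(outcomes) / len(outcomes)


def visual_necessity_gap(with_visuals, captions_only):
    """Accuracy with stored visuals minus accuracy with captions alone.

    Both lists must cover the same questions in the same order.
    A gap near zero would count against the visual-necessity claim.
    """
    if len(with_visuals) != len(captions_only):
        raise ValueError("conditions must cover the same questions")
    return accuracy(with_visuals) - accuracy(captions_only)


# Toy outcomes, invented for illustration only (not results from the paper).
full = [True, True, True, False]
caps = [True, False, False, False]
gap = visual_necessity_gap(full, caps)
```

In practice a raw gap would need a significance test or confidence interval before it could be read as confirming or falsifying the claim; that step is omitted here.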
Original abstract
Long-term agent memory is increasingly multimodal, yet existing evaluations rarely test whether agents preserve the visual evidence needed for later reasoning. In prior work, many visually grounded questions can be answered using only captions or textual traces, allowing answers to be inferred without preserving the fine-grained visual evidence. Meanwhile, harder cases that require reasoning over changing visual states are largely absent. Therefore, we introduce MemEye, a framework that evaluates memory capabilities from two dimensions: one measures the granularity of decisive visual evidence (from scene-level to pixel-level evidence), and the other measures how retrieved evidence must be used (from single evidence to evolutionary synthesis). Under this framework, we construct a new benchmark across 8 life-scenario tasks, with ablation-driven validation gates for assessing answerability, shortcut resistance, visual necessity, and reasoning structure. By evaluating 13 memory methods across 4 VLM backbones, we show that current architectures still struggle to preserve fine-grained visual details and reason about state changes over time. Our findings show that long-term multimodal memory depends on evidence routing, temporal tracking, and detail extraction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MemEye, a visual-centric evaluation framework for multimodal agent memory. It defines two dimensions: granularity of decisive visual evidence (scene-level to pixel-level) and how retrieved evidence is used (single evidence to evolutionary synthesis). The authors construct a benchmark with 8 life-scenario tasks incorporating ablation-driven validation gates to ensure visual necessity and shortcut resistance. They evaluate 13 memory methods across 4 VLM backbones and conclude that current architectures struggle to preserve fine-grained visual details and reason about state changes over time.
Significance. If the benchmark's validation gates successfully enforce that questions require visual evidence rather than textual shortcuts, this work would provide a valuable new standard for assessing long-term memory in multimodal agents. It highlights specific architectural limitations in evidence routing, temporal tracking, and detail extraction, which could guide future research in the field.
major comments (2)
- [Abstract] Abstract: The abstract claims that the ablation-driven validation gates assess answerability, shortcut resistance, visual necessity, and reasoning structure, but supplies no quantitative pass rates, explicit ablation protocol details (such as caption-only, trace-only, or full removal conditions), or validation evidence. This is load-bearing for the central claim, as poor performance on fine-grained or evolutionary items could stem from incomplete resistance to textual shortcuts rather than true memory failure.
- [Evaluation (implied from abstract)] The results from evaluating 13 methods on 4 VLM backbones are presented without methodological details on data construction steps, how the memory methods are adapted, or the specific metrics used, making it impossible to assess the support for the findings.
minor comments (1)
- [Abstract] The phrasing 'life-scenario tasks' is vague; a more precise description of the 8 tasks would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need for greater transparency in validation evidence and methodological details. We will revise the manuscript accordingly to strengthen these aspects.
- Referee: [Abstract] Abstract: The abstract claims that the ablation-driven validation gates assess answerability, shortcut resistance, visual necessity, and reasoning structure, but supplies no quantitative pass rates, explicit ablation protocol details (such as caption-only, trace-only, or full removal conditions), or validation evidence. This is load-bearing for the central claim, as poor performance on fine-grained or evolutionary items could stem from incomplete resistance to textual shortcuts rather than true memory failure.
  Authors: We agree that the abstract should provide quantitative validation evidence to support the central claims. In the revision, we will expand the abstract to include pass rates for the ablation conditions (caption-only, trace-only, and full removal) along with a concise description of the protocol. We will also ensure the main text includes the full validation results and evidence.
  Revision: yes
- Referee: [Evaluation (implied from abstract)] The results from evaluating 13 methods on 4 VLM backbones are presented without methodological details on data construction steps, how the memory methods are adapted, or the specific metrics used, making it impossible to assess the support for the findings.
  Authors: We acknowledge that additional methodological details are required for reproducibility and assessment of the findings. In the revised manuscript, we will expand the relevant sections to describe the data construction steps in detail, how each of the 13 memory methods was adapted to the four VLM backbones, and the exact metrics employed.
  Revision: yes
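The pass rates promised in the first response could be tabulated along the following lines. This is a hedged sketch rather than the authors' protocol: the condition names come from the referee's comment, while `gate_pass_rates`, the input layout, and the rule that a question "passes" when the ablated input is not enough to answer it are all assumptions made for illustration.

```python
ABLATIONS = ("caption-only", "trace-only", "full removal")


def gate_pass_rates(ablated_correct):
    """Per-condition share of questions that resist the shortcut.

    ablated_correct maps a condition name to a list of booleans, one per
    question, True when the question was still answered correctly under
    that ablation. Assumption: a question passes a gate when the ablated
    input is NOT sufficient to answer it.
    """
    rates = {}
    for condition in ABLATIONS:
        outcomes = ablated_correct[condition]
        if not outcomes:
            raise ValueError(f"no outcomes for {condition}")
        resisted = sum(1 for ok in outcomes if not ok)
        rates[condition] = resisted / len(outcomes)
    return rates


# Toy outcomes, invented for illustration only (not results from the paper).
toy = {
    "caption-only": [False, False, True, False],
    "trace-only": [False, True, True, False],
    "full removal": [False, False, False, False],
}
rates = gate_pass_rates(toy)
```

A low pass rate under the caption-only or trace-only condition would indicate that textual shortcuts remain, which is the alternative explanation the referee raises.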
Circularity Check
No circularity: empirical benchmark evaluation with no self-referential derivations
The paper presents an empirical evaluation framework and benchmark for multimodal agent memory. It constructs tasks, applies ablation-driven validation gates, and reports performance of 13 methods across 4 backbones. No equations, fitted parameters, predictions derived from inputs, or self-citation chains are present in the provided text. The central claims rest on external benchmark results rather than reducing to self-defined quantities or prior author work by construction. This matches the default expectation of no significant circularity for evaluation papers.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 2 Pith papers
- How do Humans Process AI-generated Hallucination Contents: a Neuroimaging Study
  EEG study of 27 participants reveals distinct neural patterns for AI-generated hallucinations, with misjudged ones failing to trigger standard fact verification pathways.
- How do Humans Process AI-generated Hallucination Contents: a Neuroimaging Study
  EEG study reveals distinct ERP patterns for AI hallucinations, with misjudged ones failing to trigger standard neurocognitive verification pathways.