Evaluating long-horizon memory for multi-party collaborative dialogues.arXiv preprint arXiv:2602.01313, 2026

Chuanrui Hu, Tong Li, Xingze Gao, Hongda Chen, Yi Bai, Dannong Xu, Tianwei Lin, Xiaohong Li, Yunyun Han, Jian Pei, Yafeng Deng · 2026 · arXiv 2602.01313

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

representative citing papers

H2HMem: A Multimodal Memory Benchmark for Agents in Human-Human Interactions

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

H2HMem is a multimodal memory benchmark evaluating LLM agents on recall, reasoning, and application in dyadic and multi-party human-human conversations with phenomena such as anaphora and deixis.

MemConflict: Evaluating Long-Term Memory Systems Under Memory Conflicts

cs.IR · 2026-05-20 · unverdicted · novelty 7.0

MemConflict provides a benchmark for testing LLM long-term memory systems under dynamic, static, and conditional conflicts involving temporal validity, factual correctness, and contextual applicability.

GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations

cs.CL · 2026-05-14 · unverdicted · novelty 7.0 · 2 refs

GroupMemBench is a new benchmark exposing that LLM agent memory systems fail on group conversation properties like speaker-grounded tracking and audience-adapted responses, with top systems at 46% accuracy.

MemSyco-Bench: Benchmarking Sycophancy in Agent Memory

cs.IR · 2026-07-01

citing papers explorer

Showing 4 of 4 citing papers after filters.

H2HMem: A Multimodal Memory Benchmark for Agents in Human-Human Interactions cs.CL · 2026-06-08 · unverdicted · none · ref 28
H2HMem is a multimodal memory benchmark evaluating LLM agents on recall, reasoning, and application in dyadic and multi-party human-human conversations with phenomena such as anaphora and deixis.
MemConflict: Evaluating Long-Term Memory Systems Under Memory Conflicts cs.IR · 2026-05-20 · unverdicted · none · ref 7
MemConflict provides a benchmark for testing LLM long-term memory systems under dynamic, static, and conditional conflicts involving temporal validity, factual correctness, and contextual applicability.
GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations cs.CL · 2026-05-14 · unverdicted · none · ref 22 · 2 links
GroupMemBench is a new benchmark exposing that LLM agent memory systems fail on group conversation properties like speaker-grounded tracking and audience-adapted responses, with top systems at 46% accuracy.
MemSyco-Bench: Benchmarking Sycophancy in Agent Memory cs.IR · 2026-07-01 · unreviewed · ref 62

Evaluating long-horizon memory for multi-party collaborative dialogues.arXiv preprint arXiv:2602.01313, 2026

fields

years

verdicts

representative citing papers

citing papers explorer