LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory , booktitle =
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CL 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
A new 30k-instance semantic segmentation dataset plus block distillation with sink tokens, dropout, and weighted loss lets block-attention models reach near full-attention performance on long texts.
citing papers explorer
-
LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues
LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.
-
Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation
A new 30k-instance semantic segmentation dataset plus block distillation with sink tokens, dropout, and weighted loss lets block-attention models reach near full-attention performance on long texts.