hub Mixed citations

Evaluating Very Long-Term Conversational Memory of

Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang · 2024 · Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) · DOI 10.18653/v1/2024.acl-long.747

Mixed citation behavior. Most common role is background (57%).

32 Pith papers citing it

36 external citations · Crossref

Background 57% of classified citations

open at publisher browse 32 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 5 baseline 1 method 1

citation-polarity summary

background 4 baseline 1 unclear 1 use method 1

representative citing papers

Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models

cs.AI · 2026-06-09 · conditional · novelty 8.0

Memory augmentation in LLMs amplifies sycophancy up to 25x compared to in-context baselines due to lossy memory extraction, with two lightweight mitigations that reduce the effect while preserving recall.

GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

GateMem benchmark shows no existing memory method for LLM agents achieves strong utility, access control, and reliable forgetting simultaneously in multi-principal shared settings.

Tangram: Unlocking Non-Uniform KV Cache for Efficient Multi-turn LLM Serving

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

Tangram makes non-uniform KV cache compression practical for LLM serving with deterministic budget allocation, head group paging, and ahead-of-time load balancing, achieving up to 2.6x throughput gains.

Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction

cs.CR · 2026-05-28 · unverdicted · novelty 7.0

MemPoison enables stealthy memory poisoning in LLM agents via dialogue by using semantic relational bridges, entity masquerading, and joint embedding optimization to bypass selective extraction and rewriting, achieving up to 0.95 attack success rate.

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.

Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.

Stateful Agent Backdoor

cs.CR · 2026-05-07 · unverdicted · novelty 7.0

A stateful backdoor for LLM agents, modeled as a Mealy machine with a decomposition framework, enables incremental malicious actions across sessions and achieves 80-95% attack success rate on four models.

ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

cs.AI · 2025-09-29 · conditional · novelty 7.0

ReasoningBank distills generalizable reasoning strategies from agent successes and failures to enable self-evolution, with memory-aware test-time scaling amplifying gains over raw-trajectory or success-only memory on web and software benchmarks.

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

cs.CL · 2024-10-14 · unverdicted · novelty 7.0

LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.

Latent Personal Memory: Represent personal memory as dynamic soft prompts

cs.CL · 2026-06-18 · unverdicted · novelty 6.0

LPM encodes personal history as N latent slots projected by cross-attention into input-conditioned soft prompts for frozen LLMs, reporting up to 8.8% higher accuracy than LoRA and 64x lower KV-cache on PersonaMem v1 plus matching LoRA accuracy with 120x fewer parameters on LoCoMo.

Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

MRAgent combines a Cue-Tag-Content associative graph with active reconstruction to enable dynamic memory access in LLM agents, reporting up to 23% gains on long-memory benchmarks with lower token costs.

AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

AURA improves implicit-need coverage by 0.07 over ReAct baselines on a 100-query benchmark by inserting an intent inference step controlled by a gap score, while cutting probes 82% on factual tasks.

Eywa: Provenance-Grounded Long-Term Memory for AI Agents

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

Eywa introduces a provenance-grounded memory system for persistent AI agents featuring evidence-first storage, typed validation, and deterministic multi-route retrieval, reporting 90.19% accuracy on LoCoMo and 88.2% on LongMemEval-S.

What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis

cs.AI · 2026-05-05 · unverdicted · novelty 6.0 · 2 refs

In LLM agents, memory routing circuits emerge at 0.6B scale while content circuits appear only at 4B, and write/read operations recruit a pre-existing late-layer context hub instead of creating a new one, enabling a 76% accurate unsupervised failure diagnostic.

MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

A lightweight supervised router using frozen-LLM embeddings for memory admission decisions outperforms LLM-based memory managers in both F1 score and latency on the LoCoMo benchmark.

CL-bench Life: Can Language Models Learn from Real-Life Context?

cs.CL · 2026-04-29 · unverdicted · novelty 6.0

CL-bench Life shows frontier language models achieve only 13.8% average success on real-life context tasks, with the best model at 19.3%.

Trust Your Memory: Verifiable Control of Smart Homes through Reinforcement Learning with Multi-dimensional Rewards

cs.AI · 2026-04-11 · unverdicted · novelty 6.0

Introduces MemHome benchmark and RL with multi-dimensional rewards for memory-driven smart home device control.

TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

TSUBASA improves long-horizon personalization in LLMs via dynamic memory evolution for writing and context-distillation self-learning for reading, outperforming Mem0 and Memory-R1 on Qwen-3 benchmarks while reducing token use.

EpiCache: Episodic KV Cache Management for Long-Term Conversation on Resource-Constrained Environments

cs.CL · 2025-09-22 · unverdicted · novelty 6.0

EpiCache clusters long conversation history into coherent episodes for per-episode KV cache eviction, delivering up to 30% accuracy gains and 3.7x peak memory reduction on LongConvQA tasks under fixed budgets.

CoreMem: Riemannian Retrieval and Fisher-Guided Distillation for Long-Term Memory in Dialogue Agents

cs.CL · 2026-06-16 · unverdicted · novelty 5.0

CoreMem replaces cosine retrieval with Fisher-Rao Riemannian matching and introduces Fisher-guided discrete token distillation for syntax-aware compression, reporting +4.51 pp open-domain and +4.17 pp temporal gains on LOCOMO and LongMemEval-S while staying inside an 8 GB VRAM budget.

Decoupling Thought from Speech: Knowledge-Grounded Counterfactual Reasoning for Resilient Multi-Agent Argumentation

cs.MA · 2026-06-09 · unverdicted · novelty 5.0

KG-CFR decouples planning from execution via knowledge-grounded counterfactual reasoning, preventing critical degradation in over 95% of perturbed runs and raising argument quality from 0.694 to 0.822 in a 1v1v1 simulation.

How LoRA Remembers? A Parametric Memory Law for LLM Finetuning

cs.CL · 2026-05-28 · unverdicted · novelty 5.0

Introduces Parametric Memory Law as power law for LoRA memory capacity and MemFT threshold-guided optimization for better memory fidelity.

UserGPT Technical Report

cs.IR · 2026-05-09 · unverdicted · novelty 5.0

UserGPT introduces a generative LLM framework with a behavior simulation engine, semantization module, and DF-GRPO post-training that scores 0.7325 on tag prediction and 0.7528 on summary generation on HPR-Bench while compressing records by up to 97.9%.

Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly

cs.CR · 2026-05-02 · unverdicted · novelty 5.0 · 2 refs

The paper measures policy-carriage failures during LLM context assembly and evaluates SafeContext as a partial mitigation on Llama, Qwen, and Mistral models.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models cs.AI · 2026-06-09 · conditional · none · ref 17
Memory augmentation in LLMs amplifies sycophancy up to 25x compared to in-context baselines due to lossy memory extraction, with two lightweight mitigations that reduce the effect while preserving recall.
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory cs.AI · 2025-09-29 · conditional · none · ref 42
ReasoningBank distills generalizable reasoning strategies from agent successes and failures to enable self-evolution, with memory-aware test-time scaling amplifying gains over raw-trajectory or success-only memory on web and software benchmarks.

Evaluating Very Long-Term Conversational Memory of

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer