hub

Walking down the memory maze: Beyond context limit through interactive reading

Howard Chen, Ramakanth Pasunuru, Jason Weston, Asli Celikyilmaz · 2023 · arXiv 2310.05029

15 Pith papers cite this work. Polarity classification is still indexing.

15 Pith papers citing it

read on arXiv browse 15 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

Retrieval as Reasoning: Self-Evolving Agent-Native Retrieval via LLM-Wiki

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

LLM-Wiki structures external knowledge as compilable wiki pages with links and persistent self-correction, achieving SOTA results on HotpotQA, MuSiQue, and 2WikiMultiHopQA by 2.0-8.1 F1 points over prior RAG systems.

Same Ranking, Different Winner: How Scoring Targets Shape LLM Memory Benchmarks

cs.IR · 2026-05-22 · unverdicted · novelty 7.0

Switching the credited target among Raw, Source, and Canonical changes nDCG on 83.4-94.0% of queries, flips system orderings, and reverses parser-density recommendations on LoCoMo and LongMemEval-S.

SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.

OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory

cs.CL · 2026-04-29 · unverdicted · novelty 7.0

OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.

LSTM-MAS: A Long Short-Term Memory Inspired Multi-Agent System for Long-Context Understanding

cs.CL · 2026-01-17 · unverdicted · novelty 7.0

LSTM-MAS uses a chained multi-agent architecture modeled on LSTM input, forget, and output gates to improve long-context QA performance and reduce hallucinations compared with prior multi-agent baselines.

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

cs.CL · 2024-10-14 · unverdicted · novelty 7.0

LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.

Trustworthiness in Retrieval-Augmented Generation Systems: A Survey

cs.IR · 2024-09-16 · unverdicted · novelty 7.0

Introduces Trust-RAG Compass framework and TRC Bench benchmark to assess RAG trustworthiness across factuality, robustness, fairness, transparency, accountability, and privacy, with evaluations showing performance gaps between LLMs.

When Does Overlap Help? OSU-Mem and a Cell-Conditional Analysis of Trajectory Memory for LLM Agents

cs.IR · 2026-06-19 · unverdicted · novelty 5.0

OSU-Mem shows overlapping memory helps retrieval when evidence shares tools or entities but hurts when steps are heterogeneous, with benefits on synthetic benchmarks vanishing on mixed real ones due to query mixing.

Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents

cs.CL · 2026-06-03 · unverdicted · novelty 5.0

SegTreeMem organizes agent conversation history as a temporally ordered segment tree and shows improved answer quality on long-horizon benchmarks when chronological order is preserved during insertion and retrieval.

DocRetriever: A Plug-and-Play Framework for Multimodal Document Retrieval with Comprehensive Benchmark

cs.CV · 2026-05-28 · unverdicted · novelty 5.0

DocRetriever introduces a framework using layout-aware sparse embeddings for hybrid encoding without OCR and a generalizable reasoning-augmented reranker for few-shot settings, plus the MultiDocR benchmark for evaluation.

CALMem : Application-Layer Dual Memory for Conversational AI

cs.IR · 2026-05-20 · unverdicted · novelty 5.0

CALMem delivers virtually unbounded effective context for LLM conversations via an application-layer dual memory architecture with intra-session retrieval and token-adaptive injection.

ARIA: Adaptive Retrieval Intelligence Assistant -- A Multimodal RAG Framework for Domain-Specific Engineering Education

cs.IR · 2026-02-04 · conditional · novelty 5.0

ARIA is a multimodal RAG framework that filters domain-specific questions with 97.5% accuracy and outperforms ChatGPT-5 on pedagogical quality for a university civil engineering course.

MemOS: A Memory OS for AI System

cs.CL · 2025-07-04 · unverdicted · novelty 5.0

MemOS introduces a unified memory management framework for LLMs using MemCubes to handle and evolve different memory types for improved controllability and evolvability.

A Survey on Retrieval-Augmented Text Generation for Large Language Models

cs.IR · 2024-04-17 · unverdicted · novelty 2.0

A survey that categorizes RAG methods for LLMs into four retrieval-centric stages, reviews their evolution and evaluation, and outlines challenges and future directions.

citing papers explorer

Showing 14 of 14 citing papers after filters.

Evaluating Very Long-Term Conversational Memory of LLM Agents cs.CL · 2024-02-27 · unverdicted · none · ref 40
Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.
Retrieval as Reasoning: Self-Evolving Agent-Native Retrieval via LLM-Wiki cs.CL · 2026-05-25 · unverdicted · none · ref 4
LLM-Wiki structures external knowledge as compilable wiki pages with links and persistent self-correction, achieving SOTA results on HotpotQA, MuSiQue, and 2WikiMultiHopQA by 2.0-8.1 F1 points over prior RAG systems.
Same Ranking, Different Winner: How Scoring Targets Shape LLM Memory Benchmarks cs.IR · 2026-05-22 · unverdicted · none · ref 1
Switching the credited target among Raw, Source, and Canonical changes nDCG on 83.4-94.0% of queries, flips system orderings, and reverses parser-density recommendations on LoCoMo and LongMemEval-S.
SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States cs.CL · 2026-05-06 · unverdicted · none · ref 51
SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.
OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory cs.CL · 2026-04-29 · unverdicted · none · ref 3
OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.
LSTM-MAS: A Long Short-Term Memory Inspired Multi-Agent System for Long-Context Understanding cs.CL · 2026-01-17 · unverdicted · none · ref 40
LSTM-MAS uses a chained multi-agent architecture modeled on LSTM input, forget, and output gates to improve long-context QA performance and reduce hallucinations compared with prior multi-agent baselines.
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory cs.CL · 2024-10-14 · unverdicted · none · ref 62
LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.
Trustworthiness in Retrieval-Augmented Generation Systems: A Survey cs.IR · 2024-09-16 · unverdicted · none · ref 39
Introduces Trust-RAG Compass framework and TRC Bench benchmark to assess RAG trustworthiness across factuality, robustness, fairness, transparency, accountability, and privacy, with evaluations showing performance gaps between LLMs.
When Does Overlap Help? OSU-Mem and a Cell-Conditional Analysis of Trajectory Memory for LLM Agents cs.IR · 2026-06-19 · unverdicted · none · ref 1
OSU-Mem shows overlapping memory helps retrieval when evidence shares tools or entities but hurts when steps are heterogeneous, with benefits on synthetic benchmarks vanishing on mixed real ones due to query mixing.
Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents cs.CL · 2026-06-03 · unverdicted · none · ref 4
SegTreeMem organizes agent conversation history as a temporally ordered segment tree and shows improved answer quality on long-horizon benchmarks when chronological order is preserved during insertion and retrieval.
DocRetriever: A Plug-and-Play Framework for Multimodal Document Retrieval with Comprehensive Benchmark cs.CV · 2026-05-28 · unverdicted · none · ref 9
DocRetriever introduces a framework using layout-aware sparse embeddings for hybrid encoding without OCR and a generalizable reasoning-augmented reranker for few-shot settings, plus the MultiDocR benchmark for evaluation.
CALMem : Application-Layer Dual Memory for Conversational AI cs.IR · 2026-05-20 · unverdicted · none · ref 3
CALMem delivers virtually unbounded effective context for LLM conversations via an application-layer dual memory architecture with intra-session retrieval and token-adaptive injection.
MemOS: A Memory OS for AI System cs.CL · 2025-07-04 · unverdicted · none · ref 9
MemOS introduces a unified memory management framework for LLMs using MemCubes to handle and evolve different memory types for improved controllability and evolvability.
A Survey on Retrieval-Augmented Text Generation for Large Language Models cs.IR · 2024-04-17 · unverdicted · none · ref 13
A survey that categorizes RAG methods for LLMs into four retrieval-centric stages, reviews their evolution and evaluation, and outlines challenges and future directions.

Walking down the memory maze: Beyond context limit through interactive reading

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer