Walking down the memory maze: Beyond context limit through interactive reading

Minimal test collections for retrieval evaluation · 2023 · arXiv 2310.05029

14 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

Retrieval as Reasoning: Self-Evolving Agent-Native Retrieval via LLM-Wiki

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

LLM-Wiki structures external knowledge as compilable wiki pages with links and persistent self-correction, achieving SOTA results on HotpotQA, MuSiQue, and 2WikiMultiHopQA by 2.0-8.1 F1 points over prior RAG systems.

Same Ranking, Different Winner: How Scoring Targets Shape LLM Memory Benchmarks

cs.IR · 2026-05-22 · unverdicted · novelty 7.0

Switching the credited target among Raw, Source, and Canonical changes nDCG on 83.4-94.0% of queries, flips system orderings, and reverses parser-density recommendations on LoCoMo and LongMemEval-S.

SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.

OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory

cs.CL · 2026-04-29 · unverdicted · novelty 7.0

OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.

LSTM-MAS: A Long Short-Term Memory Inspired Multi-Agent System for Long-Context Understanding

cs.CL · 2026-01-17 · unverdicted · novelty 7.0

LSTM-MAS uses a chained multi-agent architecture modeled on LSTM input, forget, and output gates to improve long-context QA performance and reduce hallucinations compared with prior multi-agent baselines.

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

cs.CL · 2024-10-14 · unverdicted · novelty 7.0

LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.

Trustworthiness in Retrieval-Augmented Generation Systems: A Survey

cs.IR · 2024-09-16 · unverdicted · novelty 7.0

Introduces Trust-RAG Compass framework and TRC Bench benchmark to assess RAG trustworthiness across factuality, robustness, fairness, transparency, accountability, and privacy, with evaluations showing performance gaps between LLMs.

When Does Overlap Help? OSU-Mem and a Cell-Conditional Analysis of Trajectory Memory for LLM Agents

cs.IR · 2026-06-19 · unverdicted · novelty 5.0

OSU-Mem shows overlapping memory helps retrieval when evidence shares tools or entities but hurts when steps are heterogeneous, with benefits on synthetic benchmarks vanishing on mixed real ones due to query mixing.

DocRetriever: A Plug-and-Play Framework for Multimodal Document Retrieval with Comprehensive Benchmark

cs.CV · 2026-05-28 · unverdicted · novelty 5.0

DocRetriever introduces a framework using layout-aware sparse embeddings for hybrid encoding without OCR and a generalizable reasoning-augmented reranker for few-shot settings, plus the MultiDocR benchmark for evaluation.

CALMem: Application-Layer Dual Memory for Conversational AI

cs.IR · 2026-05-20 · unverdicted · novelty 5.0

CALMem delivers virtually unbounded effective context for LLM conversations via an application-layer dual memory architecture with intra-session retrieval and token-adaptive injection.

ARIA: Adaptive Retrieval Intelligence Assistant -- A Multimodal RAG Framework for Domain-Specific Engineering Education

cs.IR · 2026-02-04 · conditional · novelty 5.0

ARIA is a multimodal RAG framework that filters domain-specific questions with 97.5% accuracy and outperforms ChatGPT-5 on pedagogical quality for a university civil engineering course.

MemOS: A Memory OS for AI System

cs.CL · 2025-07-04 · unverdicted · novelty 5.0

MemOS introduces a unified memory management framework for LLMs using MemCubes to handle and evolve different memory types for improved controllability and evolvability.

A Survey on Retrieval-Augmented Text Generation for Large Language Models

cs.IR · 2024-04-17 · unverdicted · novelty 2.0

A survey that categorizes RAG methods for LLMs into four retrieval-centric stages, reviews their evolution and evaluation, and outlines challenges and future directions.
