Needle in the haystack for memory based large language models

2024 · arXiv 2407.01437

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

MemFail: Stress-Testing Failure Modes of LLM Memory Systems

cs.AI · 2026-05-26 · unverdicted · novelty 7.0

MemFail introduces diagnostic datasets that isolate failure modes in LLM memory systems by testing summarization, storage, and retrieval operations separately.

From Facts to Insights: A Persona-Driven Dual Memory Framework and Dataset for Role-Playing Agents

cs.CL · 2026-05-25 · unverdicted · novelty 6.0

The RoleMemo dataset and the DualMem dual-memory framework let role-playing agents interpret facts through personas; a 4B model beats larger zero-shot systems on fidelity.

M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling

cs.LG · 2026-03-15 · unverdicted · novelty 6.0

M²RNN achieves perfect state tracking at unseen lengths and outperforms Gated DeltaNet hybrids by 0.4-0.5 perplexity on 7B models with 3x smaller recurrent states.

Benchmarking EngGPT2-16B-A3B against Comparable Italian and International Open-source LLMs

cs.CL · 2026-05-08 · conditional · novelty 5.0 · 2 refs

EngGPT2MoE-16B-A3B matches or exceeds other Italian open-source LLMs on most international benchmarks while remaining competitive on ITALIC, though it trails some top international models.

Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents

cs.AI · 2025-10-03

citing papers explorer

Showing 5 of 5 citing papers.

MemFail: Stress-Testing Failure Modes of LLM Memory Systems · cs.AI · 2026-05-26 · unverdicted · none · ref 2
From Facts to Insights: A Persona-Driven Dual Memory Framework and Dataset for Role-Playing Agents · cs.CL · 2026-05-25 · unverdicted · none · ref 81
M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling · cs.LG · 2026-03-15 · unverdicted · none · ref 24
Benchmarking EngGPT2-16B-A3B against Comparable Italian and International Open-source LLMs · cs.CL · 2026-05-08 · conditional · none · ref 90 · 2 links
Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents · cs.AI · 2025-10-03 · unreviewed · ref 11
