Pith Number

pith:CAC5GPPS

pith:2026:CAC5GPPSU3MDSN6DOIJTUUVON7
not attested · not anchored · not stored · refs resolved

Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory

Chengshuai Shi, Cong Shen, Jundong Li, Peng Wang, Songwei Dong, Zihan Chen

Aggregate accuracy scores can mask forgetting and negative transfer in sequentially evolving LLM memory.

arxiv:2605.15384 v1 · 2026-05-14 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{CAC5GPPSU3MDSN6DOIJTUUVON7}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Higher final or cumulative accuracy does not necessarily imply better memory quality: many methods exhibit strong performance gains while suffering from substantial forgetting or negative transfer.

C2 · weakest assumption

The four proposed metrics (online utility, hold-out generalization, backward transfer, and forgetting) provide a meaningfully finer-grained and more informative assessment of memory quality than aggregate metrics in the external prompt-mediated test-time setting.

C3 · one-line summary

SeqMem-Eval reveals that high final accuracy in sequential LLM memory tasks often coexists with substantial forgetting and negative transfer, exposing stability-adaptability trade-offs hidden by standard aggregate metrics.
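To make the gap between aggregate accuracy and memory quality concrete, here is a minimal sketch using definitions that are common in the continual-learning literature. This record does not give the paper's exact formulas, so the definitions below are assumptions rather than the SeqMem-Eval metrics, and online utility and hold-out generalization are not sketched at all. The accuracy matrices are made-up toy numbers, and the function names are hypothetical. `acc[i][j]` is read as the accuracy on task `j` after the memory has processed tasks `0..i`.

```python
from typing import List

Matrix = List[List[float]]


def final_average(acc: Matrix) -> float:
    """Aggregate score: mean accuracy over all tasks after the last update."""
    last = acc[-1]
    return sum(last) / len(last)


def backward_transfer(acc: Matrix) -> float:
    """Mean change on earlier tasks between first learning them and the end.

    Negative values mean later updates hurt earlier tasks.
    """
    t = len(acc)
    return sum(acc[t - 1][j] - acc[j][j] for j in range(t - 1)) / (t - 1)


def forgetting(acc: Matrix) -> float:
    """Mean drop from the best earlier accuracy on each task to its final accuracy."""
    t = len(acc)
    drops = []
    for j in range(t - 1):
        best_earlier = max(acc[i][j] for i in range(j, t - 1))
        drops.append(best_earlier - acc[t - 1][j])
    return sum(drops) / (t - 1)


# Toy data (illustrative only, not results from the paper).
# Entries above the diagonal are unused placeholders for unseen tasks.
method_a = [
    [0.90, 0.00, 0.00],
    [0.60, 0.90, 0.00],
    [0.50, 0.70, 0.90],
]
method_b = [
    [0.65, 0.00, 0.00],
    [0.70, 0.65, 0.00],
    [0.70, 0.70, 0.70],
]

for name, acc in (("A", method_a), ("B", method_b)):
    print(
        name,
        f"final={final_average(acc):.3f}",
        f"bwt={backward_transfer(acc):+.3f}",
        f"forgetting={forgetting(acc):+.3f}",
    )
```

Both toy methods end at the same final average, yet method A loses accuracy on its earlier tasks while method B does not. That is the kind of difference a single aggregate score cannot show.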

References

53 extracted · 53 resolved · 15 Pith anchors

[1] GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning 2025 · arXiv:2507.19457
[2] M. Biesialska, K. Biesialska, and M. R. Costa-Jussa. Continual lifelong learning in natural language processing: A survey. In Proceedings of the 28th international conference on computational linguis 2020
[3] Efficient lifelong learning with A-GEM. CoRR, abs/1812.00420 2018 · arXiv:1812.00420
[4] Chaudhry et al. 2019 · arXiv:1902.10486
[5] M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. de Oliveira Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, B. Cha 2021

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-20T00:00:55.692719Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

1005d33df2a6d83937c372133a52ae6fc77e4874a292e064294ab8cb955bc55c

Aliases

arxiv: 2605.15384 · arxiv_version: 2605.15384v1 · doi: 10.48550/arxiv.2605.15384 · pith_short_12: CAC5GPPSU3MD · pith_short_16: CAC5GPPSU3MDSN6D · pith_short_8: CAC5GPPS
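In this record, each `pith_short_N` alias is the first N characters of the full Pith Number. The sketch below reproduces the three listed aliases on that basis. The prefix rule is an observation from this one record, not a documented guarantee, and `short_alias` is a hypothetical helper.

```python
PITH_NUMBER = "CAC5GPPSU3MDSN6DOIJTUUVON7"


def short_alias(pith_number: str, length: int) -> str:
    """Return the leading `length` characters (assumed prefix rule)."""
    return pith_number[:length]


aliases = {f"pith_short_{n}": short_alias(PITH_NUMBER, n) for n in (8, 12, 16)}
for key, value in aliases.items():
    print(f"{key}: {value}")
```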
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/CAC5GPPSU3MDSN6DOIJTUUVON7 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1005d33df2a6d83937c372133a52ae6fc77e4874a292e064294ab8cb955bc55c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ca2ba03628ac9b5e5f095729398e5284937376cd7c2a8debc3d6f741e0e63d53",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T20:15:22Z",
    "title_canon_sha256": "364097b5cce5aabdeb141acedf8cdf72d41fcb661b32bd48920619558314aa32"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15384",
    "kind": "arxiv",
    "version": 1
  }
}
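The curl pipeline above can also be mirrored offline. The sketch below applies the same canonicalization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) to the record shown here and compares the digest with the published canonical hash. The function name `canonical_sha256` is hypothetical, and whether this pasted copy reproduces the published hash has not been checked here, so the script reports the comparison rather than assuming it.

```python
import hashlib
import json

EXPECTED = "1005d33df2a6d83937c372133a52ae6fc77e4874a292e064294ab8cb955bc55c"

RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "ca2ba03628ac9b5e5f095729398e5284937376cd7c2a8debc3d6f741e0e63d53",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T20:15:22Z",
    "title_canon_sha256": "364097b5cce5aabdeb141acedf8cdf72d41fcb661b32bd48920619558314aa32"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.15384", "kind": "arxiv", "version": 1}
}
"""


def canonical_sha256(record: dict) -> str:
    """Hash the record with the same serialization as the shell one-liner."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print("computed:", digest)
print("expected:", EXPECTED)
print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted and whitespace is dropped before hashing, the key order and indentation of the input do not affect the digest.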