Pith Number

pith:CAC5GPPS

pith:2026:CAC5GPPSU3MDSN6DOIJTUUVON7

not attested · not anchored · not stored · refs resolved

Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory

Chengshuai Shi, Cong Shen, Jundong Li, Peng Wang, Songwei Dong, Zihan Chen

Aggregate accuracy scores can mask forgetting and negative transfer in sequentially evolving LLM memory.

arxiv:2605.15384 v1 · 2026-05-14 · cs.LG · cs.AI

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{CAC5GPPSU3MDSN6DOIJTUUVON7}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp

2. Internet Archive

3. Author claim: open

4. Citations: open

5. Replications: open

✓ Portable graph bundle: live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Higher final or cumulative accuracy does not necessarily imply better memory quality: many methods exhibit strong performance gains while suffering from substantial forgetting or negative transfer.

C2 · weakest assumption

The four proposed metrics (online utility, hold-out generalization, backward transfer, and forgetting) provide a meaningfully finer-grained and more informative assessment of memory quality than aggregate metrics in the external prompt-mediated test-time setting.

C3 · one-line summary

SeqMem-Eval reveals that high final accuracy in sequential LLM memory tasks often coexists with substantial forgetting and negative transfer, exposing stability-adaptability trade-offs hidden by standard aggregate metrics.
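To make the gap between aggregate accuracy and memory quality concrete, here is a minimal sketch of backward transfer and forgetting computed from an accuracy matrix. It assumes the definitions commonly used in continual learning, which may differ from the paper's exact formulations (this record does not reproduce them). `acc[i][j]` is a hypothetical accuracy on task `j` measured after the memory has been updated through task `i`, and the two matrices are invented purely for illustration; they are not results from the paper.

```python
from typing import List

Matrix = List[List[float]]


def final_accuracy(acc: Matrix) -> float:
    """Mean accuracy over all tasks after the last memory update."""
    last = acc[-1]
    return sum(last) / len(last)


def backward_transfer(acc: Matrix) -> float:
    """Average change on earlier tasks between just-learned and final state.

    Negative values mean later updates hurt earlier tasks.
    """
    t = len(acc)
    return sum(acc[t - 1][j] - acc[j][j] for j in range(t - 1)) / (t - 1)


def forgetting(acc: Matrix) -> float:
    """Average drop from the best accuracy ever reached on each earlier task."""
    t = len(acc)
    drops = []
    for j in range(t - 1):
        best = max(acc[i][j] for i in range(j, t - 1))
        drops.append(best - acc[t - 1][j])
    return sum(drops) / (t - 1)


# Illustrative (made-up) matrices; entries above the diagonal are unused.
STABLE: Matrix = [
    [0.6, 0.0, 0.0],
    [0.6, 0.6, 0.0],
    [0.6, 0.6, 0.6],
]
FORGETFUL: Matrix = [
    [0.9, 0.0, 0.0],
    [0.7, 0.9, 0.0],
    [0.5, 0.5, 0.8],
]

for name, m in (("stable", STABLE), ("forgetful", FORGETFUL)):
    print(name, round(final_accuracy(m), 3),
          round(backward_transfer(m), 3), round(forgetting(m), 3))
```

Both toy matrices end at the same final average accuracy, yet only the second shows negative backward transfer and non-zero forgetting. That is the kind of difference a single aggregate score cannot expose.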

References

53 extracted · 53 resolved · 15 Pith anchors

[1] GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning 2025 · arXiv:2507.19457

[2] M. Biesialska, K. Biesialska, and M. R. Costa-Jussa. Continual lifelong learning in natural language processing: A survey. In Proceedings of the 28th international conference on computational linguis 2020

[3] Efficient lifelong learning with A-GEM. CoRR, abs/1812.00420 2018 · arXiv:1812.00420

[4] Chaudhry et al. 2019 · arXiv:1902.10486

[5] M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. de Oliveira Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, B. Cha 2021

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-20T00:00:55.692719Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

1005d33df2a6d83937c372133a52ae6fc77e4874a292e064294ab8cb955bc55c

Aliases

arxiv: 2605.15384 · arxiv_version: 2605.15384v1 · doi: 10.48550/arxiv.2605.15384 · pith_short_12: CAC5GPPSU3MD · pith_short_16: CAC5GPPSU3MDSN6D · pith_short_8: CAC5GPPS
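In this record, each `pith_short_N` alias is simply the first N characters of the full identifier. The sketch below reproduces that relationship. It is an observation from this one record, not a documented guarantee of the schema, and `short_aliases` is a hypothetical helper name.

```python
from typing import Dict, Iterable


def short_aliases(pith_id: str, lengths: Iterable[int] = (8, 12, 16)) -> Dict[str, str]:
    """Derive short aliases as prefixes of the full identifier (assumed rule)."""
    return {f"pith_short_{n}": pith_id[:n] for n in lengths}


aliases = short_aliases("CAC5GPPSU3MDSN6DOIJTUUVON7")
for key, value in aliases.items():
    print(key, value)
```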

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

# Fetch the record, extract .canonical_record, re-serialize it with sorted keys
# and compact separators, then print the SHA-256 of the resulting UTF-8 bytes.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/CAC5GPPSU3MDSN6DOIJTUUVON7 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1005d33df2a6d83937c372133a52ae6fc77e4874a292e064294ab8cb955bc55c

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "ca2ba03628ac9b5e5f095729398e5284937376cd7c2a8debc3d6f741e0e63d53",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T20:15:22Z",
    "title_canon_sha256": "364097b5cce5aabdeb141acedf8cdf72d41fcb661b32bd48920619558314aa32"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15384",
    "kind": "arxiv",
    "version": 1
  }
}
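The same check can be run offline against the record printed above. This is a minimal sketch that mirrors the serialization options used in the one-liner (sorted keys, compact separators, no ASCII escaping). It assumes the JSON shown here is byte-for-byte the record the resolver returns, which is not guaranteed; `canonical_bytes` and `record_hash` are hypothetical helper names.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "ca2ba03628ac9b5e5f095729398e5284937376cd7c2a8debc3d6f741e0e63d53",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-14T20:15:22Z",
        "title_canon_sha256": "364097b5cce5aabdeb141acedf8cdf72d41fcb661b32bd48920619558314aa32",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15384", "kind": "arxiv", "version": 1},
}

# Hash stated on this page under "Canonical hash".
EXPECTED = "1005d33df2a6d83937c372133a52ae6fc77e4874a292e064294ab8cb955bc55c"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and compact separators, as in the one-liner."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def record_hash(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


digest = record_hash(RECORD)
print(digest)
print("matches stated hash:", digest == EXPECTED)
```

If the comparison prints `False`, the likelier cause is a difference between the copied JSON and the served record rather than a tampered record, so re-run the check against the live resolver output before drawing conclusions.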