Pith Number

pith:RZ7367QX

pith:2025:RZ7367QXH25YUOKYP43PJPHK2O
Status: not attested · not anchored · not stored · refs resolved

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

Benjamin Coleman, Chi Wang, Derek Zhiyuan Cheng, Ed H. Chi, Fernando Pereira, Jingrui He, Mengting Ai, Noveen Sachdeva, Shuo Chen, Tianxin Wei, Wang-Cheng Kang, Xuying Ning, Yuanchen Bei, Yunzhe Li, Zhankui He

LLM agents achieve continual improvement on streaming tasks by using the ReMem pipeline to integrate reasoning, actions, and memory updates.

arxiv:2511.20857 v1 · 2025-11-25 · cs.CL · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{RZ7367QXH25YUOKYP43PJPHK2O}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
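The sentence above says any mirror can recompute the same current state from the bundle. The sketch below illustrates why a deterministic merge makes that possible: deduplicate events, replay them in a total order, and every host ends with the same state. The `Event` shape, the single "set" operation, and the ordering rule (timestamp, then content hash) are hypothetical assumptions, not Pith's bundle schema or its merge algorithm, and signature verification is omitted.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape for illustration; NOT Pith's bundle schema.
    timestamp: str  # ISO-8601 UTC, same format everywhere, so string order == time order
    kind: str       # only "set" is modeled here
    key: str
    value: str

    def event_id(self) -> str:
        # Content hash: identical events get identical ids, and it breaks timestamp ties.
        payload = json.dumps([self.timestamp, self.kind, self.key, self.value],
                             separators=(",", ":"), ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def merge(canonical: dict, events: list[Event]) -> dict:
    """Replay events over a copy of the canonical record in a fixed total order."""
    state = dict(canonical)  # never mutate the canonical record itself
    unique = {e.event_id(): e for e in events}  # drop duplicate events
    for e in sorted(unique.values(), key=lambda e: (e.timestamp, e.event_id())):
        if e.kind == "set":
            state[e.key] = e.value
    return state
```

Because the replay order depends only on event contents, two mirrors holding the same events, received in any order and possibly duplicated, compute the same result.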

Claims

C1 · strongest claim

ReMem, an action-think-memory refine pipeline, tightly integrates reasoning, task actions, and memory updates to achieve continual improvement in LLM agents on streaming tasks.

C2 · weakest assumption

The chosen sequential task streams and the implemented memory modules faithfully capture the dynamics of real-world continuous interactions that require memory evolution, and no hidden implementation biases affect the comparisons.

C3 · one-line summary

Evo-Memory is a new benchmark for self-evolving memory in LLM agents operating over task streams. It pairs an ExpRAG baseline with the proposed ReMem method, which integrates reasoning, actions, and memory updates for continual improvement.
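To make "test-time learning over a task stream" concrete, here is a toy read, act, write loop: before each task the agent retrieves similar past experiences, and after it the outcome is stored for later tasks. This is a hypothetical sketch, not the authors' ExpRAG or ReMem implementation; the token-overlap (Jaccard) similarity, the `ExperienceMemory` class, and the `solve` stub standing in for an LLM call are all assumptions. ReMem additionally interleaves reasoning and memory refinement, which this toy does not model.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    # Token-overlap similarity; a stand-in for embedding-based retrieval.
    return len(a & b) / len(a | b) if a | b else 0.0


@dataclass
class ExperienceMemory:
    # Hypothetical store of (task, outcome) pairs accumulated during the stream.
    entries: list[tuple[str, str]] = field(default_factory=list)

    def retrieve(self, task: str, k: int = 1) -> list[tuple[str, str]]:
        q = tokens(task)
        ranked = sorted(self.entries, key=lambda e: jaccard(q, tokens(e[0])), reverse=True)
        return ranked[:k]

    def add(self, task: str, outcome: str) -> None:
        self.entries.append((task, outcome))


def run_stream(tasks: Iterable[str],
               solve: Callable[[str, list[tuple[str, str]]], str],
               memory: ExperienceMemory) -> list[str]:
    outputs = []
    for task in tasks:
        context = memory.retrieve(task)  # read: past experience relevant to this task
        outcome = solve(task, context)   # act: stand-in for the LLM agent
        memory.add(task, outcome)        # write: memory evolves at test time
        outputs.append(outcome)
    return outputs
```

The memory starts empty, so the first task gets no context, while later tasks can draw on what earlier ones produced.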

References

299 extracted · 299 resolved · 36 Pith anchors

Formal links

1 machine-checked theorem link

Cited by

37 papers in Pith

Receipt and verification
First computed 2026-05-17T23:39:19.896925Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

8e7fbf7e173ebb8a39587f36f4bcead394069ddcc3d7107926de73ed48948a0c
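The values on this page are consistent with the Pith Number being the first 26 characters of the RFC 4648 base32 encoding of the raw bytes of this SHA-256 digest (26 characters × 5 bits = 130 bits). That relation is inferred from the two published values, not taken from Pith documentation; the sketch below checks it.

```python
import base64

CANONICAL_HASH = "8e7fbf7e173ebb8a39587f36f4bcead394069ddcc3d7107926de73ed48948a0c"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    # Inferred relation, not a documented spec: base32 (A-Z, 2-7) of the
    # digest bytes, truncated to `length` characters.
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


print(pith_number_from_hash(CANONICAL_HASH))  # RZ7367QXH25YUOKYP43PJPHK2O
```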

Aliases

arxiv: 2511.20857 · arxiv_version: 2511.20857v1 · doi: 10.48550/arxiv.2511.20857 · pith_short_12: RZ7367QXH25Y · pith_short_16: RZ7367QXH25YUOKY · pith_short_8: RZ7367QX
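Each `pith_short_N` alias listed above is simply the first N characters of the full Pith Number, and the short display form at the top of the page (`pith:RZ7367QX`) uses the 8-character prefix. The sketch below reproduces those aliases; the helper names are hypothetical, and treating the `2025` in the long form as a year argument is an assumption based on the submission date.

```python
PITH_NUMBER = "RZ7367QXH25YUOKYP43PJPHK2O"


def short_aliases(number: str, lengths: tuple[int, ...] = (8, 12, 16)) -> dict[str, str]:
    # pith_short_N == first N characters of the full number.
    # If the alphabet is base32, each character carries 5 bits (8 chars = 40 bits).
    return {f"pith_short_{n}": number[:n] for n in lengths}


def display_forms(number: str, year: int) -> tuple[str, str]:
    # Hypothetical helper reproducing the two forms shown at the top of the page.
    return f"pith:{number[:8]}", f"pith:{year}:{number}"


print(short_aliases(PITH_NUMBER))
print(display_forms(PITH_NUMBER, 2025))
```

Shorter aliases are more convenient but carry fewer bits, so they are more likely to collide with another record's prefix.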
Agent API
Verify this Pith Number yourself
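# Fetch the JSON-LD record, keep only canonical_record, re-serialize it canonically
# (sorted keys, compact separators, no ASCII escaping, UTF-8) and print its SHA-256 digest.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/RZ7367QXH25YUOKYP43PJPHK2O \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"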
# expect: 8e7fbf7e173ebb8a39587f36f4bcead394069ddcc3d7107926de73ed48948a0c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "91510619ee77a1e8bb1dbe58ace5f7ed30ee2190b6f2fa018506d5ca6f3c0544",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-11-25T21:08:07Z",
    "title_canon_sha256": "0d818bd916402e6653c573575779d522ab3b6ccd03e7edcf7f20c510c04d1e7e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2511.20857",
    "kind": "arxiv",
    "version": 1
  }
}
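The record above can also be hashed offline with the same canonicalization the verification pipeline uses (sorted keys, `,` and `:` separators with no whitespace, `ensure_ascii=False`, UTF-8). Compare the printed digest with the canonical hash listed earlier; a match would indicate the JSON shown here is the complete canonical record. The function name is a hypothetical choice for illustration.

```python
import hashlib
import json


def canonical_sha256(record) -> str:
    # Same serialization as the python3 one-liner in the verification pipeline.
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


RECORD = {
    "metadata": {
        "abstract_canon_sha256": "91510619ee77a1e8bb1dbe58ace5f7ed30ee2190b6f2fa018506d5ca6f3c0544",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-11-25T21:08:07Z",
        "title_canon_sha256": "0d818bd916402e6653c573575779d522ab3b6ccd03e7edcf7f20c510c04d1e7e",
    },
    "schema_version": "1.0",
    "source": {"id": "2511.20857", "kind": "arxiv", "version": 1},
}

print(canonical_sha256(RECORD))
```

Because keys are sorted before hashing, the digest does not depend on the key order in which the record was written or transmitted.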