Pith Number

pith:PQELSMEF

pith:2025:PQELSMEFFT32PBED4PJAX3QAS4
Status: not attested · not anchored · not stored · refs resolved

MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents

Alok Prakash, Ao Qu, Bryan Kian Hsiang Low, Daniela Rus, Jinhua Zhao, Paul Pu Liang, Sunghwan Kim, Zhaoxuan Wu, Zijian Zhou

MEM1 trains agents to keep constant memory in long multi-turn tasks by updating one shared state that merges memory and reasoning via reinforcement learning.

arxiv:2506.15841 v2 · 2025-06-18 · cs.CL · cs.AI · cs.IR

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{PQELSMEFFT32PBED4PJAX3QAS4}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
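
One way a mirror could recompute the same state from a bundle, whatever order the events arrive in, is to sort the events by a fixed key before applying them. The sketch below illustrates that idea only. The event fields (`event_id`, `timestamp`, `field`, `value`), the sample events, and the last-writer-wins rule are all hypothetical, signature checking is omitted, and nothing here is taken from the actual Pith merge algorithm.

```python
from __future__ import annotations

import random
from typing import Any


def merge_state(record: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a state in an order independent of arrival order.

    All event fields used here are hypothetical, chosen for illustration.
    """
    state: dict[str, Any] = {"record": record, "fields": {}}
    seen: set[str] = set()
    # Sorting by (timestamp, event_id) gives every mirror the same total order.
    for event in sorted(events, key=lambda e: (e["timestamp"], e["event_id"])):
        if event["event_id"] in seen:  # ignore duplicate deliveries
            continue
        seen.add(event["event_id"])
        # Last writer wins: a later event overwrites an earlier value.
        state["fields"][event["field"]] = event["value"]
    return state


record = {"source": {"id": "2506.15841", "kind": "arxiv", "version": 2}}
# Made-up events, not real Pith data.
events = [
    {"event_id": "e1", "timestamp": "2026-05-17T23:39:19Z", "field": "citations", "value": "open"},
    {"event_id": "e2", "timestamp": "2026-05-18T00:00:00Z", "field": "citations", "value": "closed"},
    {"event_id": "e3", "timestamp": "2026-05-18T00:00:00Z", "field": "replications", "value": "open"},
]

shuffled = events + [events[0]]  # same events, one delivered twice
random.shuffle(shuffled)
print(merge_state(record, events) == merge_state(record, shuffled))
```

Because the sort key is total and duplicates are dropped by `event_id`, two mirrors holding the same set of events reach the same state in this sketch.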

Claims

C1 · strongest claim

MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task, and generalizes beyond the training horizon.

C2 · weakest assumption

That reinforcement learning on composed multi-turn environments will produce a memory-update policy that reliably retains all information needed for future interdependent queries while discarding only truly irrelevant content.

C3 · one line summary

MEM1 uses end-to-end RL to train constant-memory agents that update a shared state for memory and reasoning on long-horizon QA and shopping tasks; on a 16-objective multi-hop QA task, MEM1-7B delivers 3.5x better performance with 3.7x lower memory use than Qwen2.5-14B-Instruct.
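
The constant-memory idea in these claims can be illustrated with a toy loop in which the agent's context at each turn holds only the current shared state plus the latest observation. This is a minimal sketch, not the paper's method: `update_state` is a hypothetical stand-in that simply truncates to a character budget, whereas MEM1 learns what to retain through reinforcement learning.

```python
def update_state(state: str, observation: str, budget: int = 120) -> str:
    """Hypothetical stand-in for the learned update policy.

    Folds the new observation into the state and keeps only the most
    recent `budget` characters. MEM1 learns this step; truncation is
    used here purely to keep the example runnable.
    """
    merged = f"{state} | {observation}" if state else observation
    return merged[-budget:]


def run_constant_memory(observations: list[str], budget: int = 120) -> tuple[str, int]:
    """Return the final state and the largest context size seen."""
    state, peak = "", 0
    for obs in observations:
        # The context holds only the shared state and the latest observation.
        peak = max(peak, len(state) + len(obs))
        state = update_state(state, obs, budget)
    return state, peak


def run_full_history(observations: list[str]) -> int:
    """Baseline: append every observation, so the context grows each turn."""
    history, peak = "", 0
    for obs in observations:
        history += obs
        peak = max(peak, len(history))
    return peak


observations = [f"obs {i}: " + "x" * 40 for i in range(50)]
_, bounded_peak = run_constant_memory(observations)
print("constant-memory peak:", bounded_peak)
print("full-history peak:", run_full_history(observations))
```

In this toy setup the constant-memory peak is bounded by the budget plus one observation, while the full-history peak grows with the number of turns.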

References

75 extracted · 75 resolved · 19 Pith anchors

Formal links

3 machine-checked theorem links

Cited by

30 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:39:19.719794Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

7c08b930852cf7a78483e3d20bee00970e581d8d267dd0a8f92ed248170f3299

Aliases

arxiv: 2506.15841 · arxiv_version: 2506.15841v2 · doi: 10.48550/arxiv.2506.15841 · pith_short_12: PQELSMEFFT32 · pith_short_16: PQELSMEFFT32PBED · pith_short_8: PQELSMEF
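
The identifiers on this page are consistent with a simple relationship: the full Pith Number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and each `pith_short_N` alias is its first N characters. The sketch below checks that relationship for this record. It is an observation from the values shown here, not a documented rule of the Pith scheme, and the function names are hypothetical.

```python
import base64

CANONICAL_HASH = "7c08b930852cf7a78483e3d20bee00970e581d8d267dd0a8f92ed248170f3299"


def pith_id_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of the digest and drop '=' padding.

    Assumption: inferred from this one record, not from a specification.
    """
    raw = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_aliases(pith_id: str, lengths: tuple[int, ...] = (8, 12, 16)) -> dict[str, str]:
    """Build the pith_short_N aliases as prefixes of the full identifier."""
    return {f"pith_short_{n}": pith_id[:n] for n in lengths}


pith_id = pith_id_from_hash(CANONICAL_HASH)
print(pith_id)  # PQELSMEFFT32PBED4PJAX3QAS4
print(short_aliases(pith_id))
```
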
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/PQELSMEFFT32PBED4PJAX3QAS4 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7c08b930852cf7a78483e3d20bee00970e581d8d267dd0a8f92ed248170f3299
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c09ec5a31e75575ba5f80d2b9b9f5dfe7951c0d0433c0a73b84970c066add2e3",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.IR"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-06-18T19:44:46Z",
    "title_canon_sha256": "9ac90f2b5b798f1c5e39839f89246cd79b4c1d6500ad89179d71c3d4225e63fd"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2506.15841",
    "kind": "arxiv",
    "version": 2
  }
}
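
The same check can be run offline against the record shown above. This minimal sketch reuses the canonicalization from the shell pipeline (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and compares the digest with the canonical hash listed on this page. The function name `canonical_sha256` is hypothetical, and whether the digest matches depends on the record being copied exactly.

```python
import hashlib
import json

EXPECTED = "7c08b930852cf7a78483e3d20bee00970e581d8d267dd0a8f92ed248170f3299"

record = {
    "metadata": {
        "abstract_canon_sha256": "c09ec5a31e75575ba5f80d2b9b9f5dfe7951c0d0433c0a73b84970c066add2e3",
        "cross_cats_sorted": ["cs.AI", "cs.IR"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-06-18T19:44:46Z",
        "title_canon_sha256": "9ac90f2b5b798f1c5e39839f89246cd79b4c1d6500ad89179d71c3d4225e63fd",
    },
    "schema_version": "1.0",
    "source": {"id": "2506.15841", "kind": "arxiv", "version": 2},
}


def canonical_sha256(obj: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Sorting keys before hashing means the digest does not depend on the order in which the JSON object's keys were written.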