
arXiv: 2606.19172 · v1 · pith:UP35RQTG · submitted 2026-06-17 · 💻 cs.AI

User as Engram: Internalizing Per-User Memory as Local Parametric Edits

Pith reviewed 2026-06-26 20:34 UTC · model grok-4.3

classification 💻 cs.AI
keywords: personalization, memory, parametric edits, language models, LoRA, Engram model, user-specific knowledge

The pith

Storing each user's facts as precise local edits inside an Engram model's hash-keyed table, while keeping reasoning skill in one shared adapter, matches per-user LoRA on direct recall, raises indirect-reasoning accuracy 5.6 times on average, and cuts per-user storage roughly 161-fold.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that language-model personalization splits into two distinct problems: holding a user's private facts and carrying the shared skill to use those facts in new contexts. Current approaches either leave facts outside the weights or fold them into a per-user LoRA that alters every token the model sees. The proposed method writes facts instead as surgical edits to the hash-keyed memory table of an Engram model and leaves the reasoning adapter unchanged across users. This separation produces edits that activate only at their exact trigger, leave every other weight bit-for-bit identical, and compose additively when many users occupy the same table. A reader would care because the design promises scalable, non-interfering per-user memory without the contamination or storage growth of existing recipes.

Core claim

User facts can be internalized by writing them as local parametric edits to the hash-keyed memory table of an Engram model while the reasoning skill remains in a single shared adapter. The edits match the direct-recall performance of per-user LoRAs, deliver 5.6 times higher accuracy on indirect reasoning tasks on average, never degrade any user below the untouched base model, and disturb unrelated text roughly 33,000 times less than a per-user LoRA, while taking about 161 times less storage at 100 facts per user. Because different users land in disjoint hash slots, their edits stack losslessly. And because a per-user table does not grow with the population a retriever must search, it overtakes retrieval pipelines once the fact count exceeds roughly 100.

What carries the argument

Surgical edits to the hash-keyed memory table of an Engram model, which switch on the lookup only at the exact trigger position, add the value the answer needs, and leave every other token position unchanged to the last bit.
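To make the locality claim concrete, here is a minimal Python sketch of a hash-keyed lookup with a per-user row override. Everything in it is an assumption for illustration: the table size, N-gram order, number of hash heads, the BLAKE2 hash, and the names `addresses` and `lookup` are hypothetical, and the key/value projections and gate α described for Figure 5 are omitted. It shows the property the claim depends on: an override changes the lookup only at positions whose suffix N-gram hashes to an overridden row, and a position's addresses depend only on tokens up to that position.

```python
import hashlib

# Toy sizes for illustration only (hypothetical; not the paper's configuration).
TABLE_ROWS = 1024   # rows in the hash-keyed memory table
DIM = 4             # width of each row
NGRAM = 2           # suffix N-gram order used as the lookup key
HEADS = 2           # number of hash heads, i.e. addresses per position


def addresses(tokens, pos):
    """Hash the suffix N-gram ending at `pos` into one row address per head.

    Depends only on tokens[: pos + 1], so addresses are known before the forward pass.
    """
    suffix = tuple(tokens[max(0, pos - NGRAM + 1): pos + 1])
    addrs = []
    for head in range(HEADS):
        key = repr((head, suffix)).encode("utf-8")
        digest = hashlib.blake2b(key, digest_size=8).digest()
        addrs.append(int.from_bytes(digest, "little") % TABLE_ROWS)
    return addrs


def lookup(table, overrides, tokens, pos):
    """Sum the addressed rows, reading a per-user override where one exists.

    Projections and the gate are omitted; this only models which rows are read.
    """
    out = [0.0] * DIM
    for addr in addresses(tokens, pos):
        row = overrides.get(addr, table[addr])
        out = [a + b for a, b in zip(out, row)]
    return out


# Deterministic stand-in for pretrained rows (every entry is below 1.0).
base_table = [[((r * 31 + c) % 7) / 7.0 for c in range(DIM)] for r in range(TABLE_ROWS)]

tokens = ["my", "doctor", "is", "Dr", "Patel"]
trigger = 2  # the fact fires where the suffix ("doctor", "is") ends
user_overrides = {addr: [1.0] * DIM for addr in addresses(tokens, trigger)}

for pos in range(len(tokens)):
    before = lookup(base_table, {}, tokens, pos)
    after = lookup(base_table, user_overrides, tokens, pos)
    rows_hit = bool(set(addresses(tokens, pos)) & set(user_overrides))
    print(pos, tokens[pos], "changed" if before != after else "unchanged", rows_hit)
```

In this toy, a non-trigger position can only move if its N-gram's addresses collide with the trigger's rows; how often that happens in a real table depends on its size and hash design, which the sketch does not model.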

If this is right

  • Direct recall of stored facts equals that of a dedicated per-user LoRA.
  • Indirect reasoning accuracy rises 5.6 times on average while no user falls below base-model performance.
  • Multiple users' facts occupy disjoint hash slots and compose additively without interference.
  • Contamination of unrelated text is roughly 33,000 times smaller than with per-user adapters, and per-user storage is about 161 times smaller at 100 facts.
  • Past roughly 100 facts, the per-user table overtakes a retrieval pipeline running on a 2.5 times larger model.
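The composition bullet can be illustrated with plain dictionaries. This is a sketch under the assumption, taken from the Figure 9 caption, that each user's edit is a map from row index to row vector; the helper names `compose_overrides` and `apply_overrides` are hypothetical. Merging is lossless exactly when no two users claim the same row, so the helper reports every row claimed more than once instead of silently overwriting it.

```python
def compose_overrides(*user_maps):
    """Merge per-user {row_index: row_vector} maps into one shared override map.

    Returns the merged map and the set of rows claimed by more than one user;
    on overlapping rows the later map wins, so the merge is lossless only when
    that set is empty.
    """
    merged = {}
    overlaps = set()
    for user_map in user_maps:
        overlaps |= set(merged) & set(user_map)
        merged.update(user_map)
    return merged, overlaps


def apply_overrides(table, overrides):
    """Return a copy of `table` with the override rows written in."""
    patched = [list(row) for row in table]
    for addr, row in overrides.items():
        patched[addr] = list(row)
    return patched


base = [[0.0, 0.0] for _ in range(64)]
alice = {3: [1.0, 0.5], 17: [0.2, 0.9]}   # hypothetical rows
bob = {42: [0.7, 0.1]}

merged, overlaps = compose_overrides(alice, bob)
shared = apply_overrides(base, merged)
assert not overlaps
# Each user's rows read the same in the shared table as in a private copy.
for user in (alice, bob):
    private = apply_overrides(base, user)
    assert all(shared[r] == private[r] for r in user)
```

Users whose triggers share a template would show up here as a non-empty overlap set, the situation the Figure 25 caption associates with heavy degradation.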

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Dynamic insertion and deletion of individual facts becomes possible without retraining or global weight changes.
  • The same table could support verifiable fact provenance because each edit is isolated and reversible.
  • Retrieval pipelines could be replaced by the table for populations larger than a few hundred users without proportional search cost growth.

Load-bearing premise

An Engram model already exists or can be built whose hash-keyed memory table accepts arbitrary user facts as edits that affect only the intended trigger positions and leave all other weights untouched.

What would settle it

A controlled test that writes a new fact into the table and then measures whether any token unrelated to that fact changes its output probability or whether the edit fails to activate exactly at the trigger position.
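A minimal harness for the controlled test described above might look like the following. The stand-in model, its bigram key, and its fixed base logits are hypothetical placeholders for a real Engram forward pass; only the comparison logic is the point: run once without and once with the edit, compare output distributions exactly at every position, and report positions that moved without being triggers (leaks) and triggers that did not move (misses).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(tokens, overrides):
    """Hypothetical stand-in for a forward pass: one distribution per position.

    Each position is keyed by its suffix bigram; an override keyed by that bigram
    adds a logit delta, mimicking an Engram row that fires only at its trigger.
    """
    outputs = []
    for pos in range(len(tokens)):
        key = tuple(tokens[max(0, pos - 1): pos + 1])
        logits = [float(len(key[-1]) % 3), 0.5, -0.5]  # arbitrary fixed base logits
        delta = overrides.get(key)
        if delta is not None:
            logits = [a + b for a, b in zip(logits, delta)]
        outputs.append(softmax(logits))
    return outputs


def locality_report(model, tokens, overrides, trigger_positions):
    """Return (leaked, missed): non-trigger positions that changed, triggers that did not."""
    before = model(tokens, {})
    after = model(tokens, overrides)
    # Exact (bit-for-bit) comparison of the output distributions.
    changed = {pos for pos in range(len(tokens)) if before[pos] != after[pos]}
    leaked = sorted(changed - set(trigger_positions))
    missed = sorted(set(trigger_positions) - changed)
    return leaked, missed


tokens = ["my", "doctor", "is", "Dr", "Patel"]
edit = {("doctor", "is"): [0.0, 0.0, 5.0]}            # hypothetical fact edit
print(locality_report(toy_model, tokens, edit, [2]))  # ([], [])
```

Declaring the wrong trigger, for example position 3, makes the same harness report a leak at 2 and a miss at 3.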

Figures

Figures reproduced from arXiv: 2606.19172 by Bojie Li.

Figure 1: User as Engram splits personal memory into two layers, mirroring the brain’s complementary learning systems. Top (content). Each user’s facts are written as a few local row overrides (colored) in the Engram model’s one hash-keyed memory table; the table’s other rows hold the general knowledge learned at pretraining (gray). A write touches only that user’s rows (∆bpb = +0.0001 on all other text), and differ…
Figure 2: Where User as Engram sits among personal-memory methods. Context-based methods (in-context learning, retrieval, memory systems) leave the model untouched but pay context tokens at query time (top-left); weight-based methods (per-user LoRA, knowledge editing) pay no context but edit the model globally (bottom-right). User as Engram occupies the remaining corner: a local edit at zero context cost. Axes are q…
Figure 3: Architectural contamination on held-out text, Mini-Engram-d20 (canonical seed S0, n=20 users). A per-user LoRA more than triples val bpb on text unrelated to the user’s facts (+1.784; per-user range [+0.44, +3.74]; 17/20 users worse), while a per-user Engram row leaves it unchanged to four decimals (+0.00005; 0/20 worse), ∼34,000× less, by design. The 3-seed mean ratio is ∼33,000× (Appendix N).
Figure 4: Cross-base LoRA scaling: mean change in indirect recall (adapter − base) with the fraction of users it hurts. On the base LM (Mini-Engram-d20) a per-user LoRA disrupts fragile completion behavior (85% worse, ∆=−0.133); on every instruction-tuned base the reasoning skill absorbs the perturbation (0–20% worse, ∆ from +0.088 to +0.217).
Figure 5: Where the Engram lives in the transformer. The backbone is a standard transformer; at a few designated Engram layers (⋆) a content-addressable lookup runs alongside attention and the MLP. At a position, the recent tokens’ suffix N-gram is hashed by K heads into K table addresses (deterministic, hence known before the forward pass); the retrieved rows are projected by W_K, W_V, gated by α, and added to the re…
Figure 6: Inserting a user fact into the Engram. A fact is reduced to where and what: the trigger’s suffix N-gram hashes to a sparse set of row addresses R_f (where), and one of three strategies writes a value e⋆ into those rows (what). UNEMBED_P solves for e⋆ in closed form; OPT takes a few gradient steps per fact; Joint OPT (our default beyond ∼30 facts/user) optimizes all of the user’s rows together so they do no…
Figure 7: Addressed write vs. global function-bend. Per-layer, per-position residual-stream change ‖x_after^(ℓ) − x_before^(ℓ)‖ on the same trigger sentence and base (Mini-Engram-d12@1280, log color scale). (a) An Engram row insertion is exactly 0.000 at every position before the Engram layer (causality) and at every non-trigger position after it; only the trigger column moves. (b) A per-user LoRA fit to the same sin…
Figure 8: The mechanism on the trained model (Mini-Engram-d20, 16 facts). (a) The write opens its own gate: the trigger-position gate α rises from 0.02 to 0.99 (OPT) while non-trigger positions stay at 0.04. (b) The deployed row’s residual change is cosine 0.999 to its value-path projection W_V e (left bar); what the row encodes relative to the gold token’s unembedding is exact for the closed-form solution (0.59) and …
Figure 9: EngramServer architecture. Per user, an override map of (row index, row vector) pairs lives in DRAM. On each request, the server saves the originals at the affected addresses (∼2 ms), writes the user’s overrides, runs the forward pass, then restores. The Engram lookup at the configured layer transparently sees the override values; the gate fires only at the trigger N-gram. There is no router and no graph r…
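The save, write, run, restore cycle in the Figure 9 caption maps naturally onto a Python context manager. The sketch below assumes a Python list as the shared table and a dict per user as the override store; `user_overrides`, `serve`, and the `forward` callable are hypothetical names. It is single-threaded: concurrent requests against one shared table would need locking or per-request copies, which it does not show.

```python
from contextlib import contextmanager


@contextmanager
def user_overrides(table, overrides):
    """Temporarily write one user's rows into the shared table, then restore them."""
    saved = {addr: list(table[addr]) for addr in overrides}  # originals at affected rows
    try:
        for addr, row in overrides.items():
            table[addr] = list(row)
        yield table
    finally:
        # Restore the originals even if the forward pass raises.
        for addr, row in saved.items():
            table[addr] = row


def serve(table, override_store, user_id, forward, request):
    """Run `forward` with the requesting user's overrides applied."""
    with user_overrides(table, override_store.get(user_id, {})) as patched:
        return forward(patched, request)
```

A user with no stored overrides simply runs against the untouched table, and because restoration sits in `finally`, a failed forward pass cannot leave one user's rows visible to the next request.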
Figure 10: Within-user fact-density: top-1 (left) and top-5 (right) recall vs. number of facts inserted simultaneously into one user’s override map. Joint OPT (blue) closes most of the gap to LoRA rank-64 (red dashed) at 161× less storage (88 KB vs. 14.2 MB at 100 facts).
Figure 11: Cost-quality trade-off across personal-memory methods at the same answering LM (Mini-Engram-d20@1536). User-as-Engram Joint OPT matches a per-user LoRA’s LOCOMO F1 at 161× smaller per-user storage (88 KB vs. 14.2 MB at 100 facts). Retrieval baselines store each fact at a higher per-fact cost than an Engram row and still cap out below Engram on quality.
Figure 12: Direct recall vs. memory systems, all sharing the same Mini-Engram-d12 answering LM (100 USER facts, XXL corpus). (a) When the query is the fact’s stored prefix, nearest-neighbor retrieval is near-perfect and beats Engram’s 68% top-1, but at a context-token cost Engram does not pay. (b) When the query is a paraphrase, retrieval drops to 60–75% while multi-trigger Engram insertion reaches 96.9% top-1. The story…
Figure 13: LOCOMO single-hop, full 10 conversations, scaling with Mini-Engram dense size. (a) Token-F1: User-as-Engram Joint OPT (blue) beats every retrieval baseline at every scale. (b) LLM-judge accuracy (Qwen2.5-14B-Instruct): the ranking flips; retrieval baselines (MEM0_LIKE, MEMMACHINE) beat Joint OPT by 0.05–0.10 because token-F1 over-credits Engram for the correct first answer token despite a noisy continuatio…
Figure 14: LOCOMO category breakdown (token-F1), Engram Joint OPT (blue) vs. the best retrieval baseline (red; max of MEM0_LIKE and MEMMACHINE_LIKE) across three dense scales. Engram wins single-hop, multi-hop, and reasoning at every scale, since the per-fact (question, answer) loss encodes the question→answer map that retrieval must chain across evidence sentences, but loses open-domain, where the answer is a verbatim span…
Figure 15: Multi-hop reasoning over Engram-inserted facts (n=63 chained pairs on Mini-Engram-d20, two OPT-15 insertions per item). Per-fact direct recall is 99.2% (both facts are insertable), so the gap is chaining, not insertion. When the query’s suffix N-gram coincides with Fact-2’s trigger, the gate fires and recall is 90.6%; when true chaining is required, it collapses to 12.9% (Wilson 95% CIs shown). The gate i…
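Figure 15 reports Wilson 95% confidence intervals. For reference, the Wilson score interval for k successes in n trials can be computed as below. This is the standard textbook formula, not code from the paper, and the per-condition counts behind the 90.6% and 12.9% figures are not given in the caption, so the example uses illustrative inputs.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default z gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return centre - half, centre + half


print(wilson_interval(5, 10))  # approximately (0.237, 0.763)
```

Unlike the normal-approximation interval, the Wilson interval stays inside [0, 1] and remains sensible near 0% or 100%, which matters for small subsets such as the chaining condition.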
Figure 16: The six parametric conditions on n=20 test users (Mini-Engram-d20, seed S0; bars left-to-right as listed in the text). Left: the layered design (Engram + shared LoRA) matches per-user LoRA’s direct recall (100% vs. 99%) while delivering 7.4× its indirect_any on this seed (44% vs. 6%); neither the per-user Engram (no skill) nor the shared LoRA alone (never saw the user’s facts) suffices. Right: the layered…
Figure 17: Indirect-reasoning accuracy and retrieval recall vs. KB size (n=20 users, 20 indirect probes each; KB = test user’s 34 facts + distractors sampled from the 30-user schema-family pool, log x-axis, N ∈ {34, 100, 200, 300, 500, 1000}). (a) The layered design (horizontal solid line at 44%) is invariant to KB size; the retrieval-based conditions degrade monotonically. Naive RAG and Qwen-3B + RAG fall below the…
Figure 18: EngramServer throughput across four deployment scales on one idle Blackwell GPU. On the ablation-optimal d12@1280 the server reaches 232 req/s (30 u/50 f) and holds 226 req/s at 100 u/100 f (within 3% as tenants triple) at 4.4 ms p50 latency, with a sub-millisecond override apply (0.03 ms p50; the larger per-component figures in Appendix G are from an earlier, slower benchmark on the smaller d12 v2 checkpoi…
Figure 19: LOCOMO token-F1 vs. LLM-judge accuracy across all system × dense-size cells. Retrieval baselines (red/orange) sit near the y=x line: their token-F1 and LLM-judge scores roughly agree. User-as-Engram OPT and Joint OPT (blue/cyan) sit systematically above the line, indicating that token-F1 over-credits Engram. The asymmetry is the metric mismatch documented in Section 5.5.
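The over-crediting described in Figures 13 and 19 is easy to reproduce with a SQuAD-style token-F1. The normalization below (lowercase, strip punctuation, drop English articles) is the common SQuAD convention and is an assumption here; the paper's LOCOMO scoring script may differ. A prediction whose first token is right earns partial credit however noisy the rest is, whereas an LLM judge can mark the same answer wrong.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize(text):
    """SQuAD-style normalization: lowercase, strip punctuation, drop articles."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall between prediction and gold."""
    pred, ref = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(token_f1("saffron", "saffron"))                 # 1.0
print(token_f1("saffron is best of all", "saffron"))  # about 0.333: right first token, noisy tail
print(token_f1("cumin", "saffron"))                   # 0.0
```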
Figure 20: LOCOMO single-hop under an LLM judge (Qwen2.5-14B), matched multi-token pipeline. First-token OPT lets token-F1 over-credit Engram for a correct first token over a noisy continuation…
Figure 21: LogitLens KL by layer (Mini-Engram-d8 vs. base-d8). Engram converges faster at layer 3 (−3.66 KL gap), confirming Cheng et al.’s effective-deepening claim at our scale (stage 4 of the mechanism above).
Figure 22: Mini-Engram pretraining curves. Engram d8 reaches lower validation bpb than base d8 at iso-FLOPs (matching the paper’s finding at much smaller scale); Engram d12 is substantially better thanks to the larger backbone and longer training.
Figure 23: Top-1 and top-5 recall at 100 facts/user, Mini-Engram-d12. RANDOM (control) confirms our signal isn’t noise; UNEMBED_P alone is too weak; OPT-15 helps; Joint OPT closes most of the gap to full LoRA fine-tuning at 161× less storage.
Figure 24: Paraphrase generalization across 4 facts × 5 paraphrases each. Single-trigger insertion gets 50% top-1 for free (suffix N-gram overlap); multi-trigger insertion (insert at all 5 paraphrases) gets 100% at 5× the per-fact OPT cost.
Figure 25 (Appendix F: Multi-domain composition): Multi-domain additive composition heatmap. Each cell shows per-domain top-1 recall when D domains are stacked. Heavy degradation when domains share trigger templates (high address overlap). Disjoint-template domains (corp + user, demonstrated in main text) compose without loss.
Figure 26 (Appendix G: Serving latency CDF): Per-request latency CDF on Mini-Engram-d12 (30 users × 50 facts × 600 requests). Override apply (blue) and restore (orange) are both sub-3 ms at p99; the forward pass (grey) dominates total latency (black).
Figure 27: Per-request latency breakdown, Mini-Engram-d12 (30 users × 50 facts × 600 requests). (a) Override-apply latency: median 2.2 ms. (b) Median per-request latency by component: apply (2.3 ms) + forward (16.5 ms) + restore (2.3 ms); the frozen forward pass dominates, and the override apply+restore overhead (∼4.6 ms) is under 20% of the 23.2 ms end-to-end median.
Figure 28: Indirect-reasoning accuracy (indirect_any) vs. avg context tokens per query on the 20-user split. The layered design and the shared LoRA alone anchor the 0-context end; RAG top-3 + shared LoRA lifts accuracy to 54% at 44 context tokens; Qwen-3B + RAG reaches 55–57% at 82–390 tokens on a ∼2.5× larger backbone. Naive RAG on the same Mini-Engram-d20 base sits below the layered design: putting facts in contex…
Figure 29 (Appendix L: Joint OPT convergence): Joint OPT convergence on Mini-Engram-d12. At 100 facts/user, loss converges to 0.8 within 1500 steps; at 1000 facts, loss saturates around 2.0 due to higher within-user fact-density interference (Section 5.2).
Figure 30: Multi-fact-in-the-loss finetune vs. baseline d12@1280 on the Joint-OPT density curve (matched OPT-step schedule). At a fixed inference budget MF improves recall at high density (+14% relative at n=1000, 8k steps). With the budget unpinned (12k steps) both reach ∼34.5% at n=1000: MF accelerates convergence rather than raising the asymptotic ceiling. val_bpb is preserved (0.774 vs. 0.770). Reading: the arch…
Figure 31: Engram capacity × token-budget response surface, at Mini-Engram-d8 v2 (left) and d12 v2 (right). Top row: E1 USER OPT top-1 (100-fact recall). Bottom row: LOCOMO Joint OPT F1. The same large (50 K × 256) Engram size wins at the highest token budget for both dense scales; tiny is under-capacity, xlarge is over-provisioned. The optimum scales in tokens, not capacity.
Figure 32: Per-fact independent OPT recall vs. fact count n. Left: USER OPT top-1. Right: ORG OPT top-1. The d12@1280 and d20@1536 optimal cells (green, purple) hold near-1.00 recall from n = 100 to n = 1000. No ceiling is visible in this per-fact independent-OPT setting.
Figure 33: Dense-size scaling at the ablation-optimal recipe. Left: LOCOMO Joint OPT first-token token-F1 climbs from 0.134 (d8 v2) to 0.233 (d20@1536); the MEMMACHINE_LIKE retrieval baseline climbs more slowly because its quality is bounded by sentence-encoder retrieval, not the LM. Right: val_bpb (grey) drops monotonically with dense+tokens, while E1 USER OPT single-fact recall (blue) saturates at 1.00 from d12@1280 o…
Figure 34: Shared-LoRA rank ablation (the layered design, n=20). The layered design preserves 100% direct recall at every rank; indirect_any peaks at r=16 (44%) while contamination grows with rank. The r=64 surprise: 4× the capacity gives worse reasoning (35%) at 2.7× the contamination, plausibly because capacity exceeds what the 510-sample shared corpus can constrain.
Figure 35: Cost-quality trade-off for the layered architecture (Section 6). Same Mini-Engram-d20 base, n=20 test users, six method combinations. The layered design gives the best trade-off: it matches the highest indirect-reasoning accuracy (44% indirect_any) at low storage (88 KB/user) and low contamination (∆bpb +0.39). It beats per-user LoRA on every measure. The naive LoRA+Engram stack (the “add Engram on top of…
Figure 36: The layered claim survives a personal → medical schema shift (shared LoRA trained on personal-schema users, tested on medical-schema users). The layered design’s indirect_any decays from 44% to 31% (a realistic 30% relative drop) while per-user LoRA goes 6% → 4%, so the layered design’s lead is preserved (7.4× → 7.6×); locality is unchanged (the layered design’s ∆bpb equals the shared LoRA’s, +0.386).
Figure 37: Per-user storage scales linearly with user count. The Engram override is ∼161× smaller at 100 facts/user (88 KB vs. 14.2 MB), and the gap widens to ∼1700× at 10 facts/user (LoRA’s parameter count is fact-count-independent while Engram grows linearly). At 1 M users × 100 facts/user, Engram storage is 100 GB vs. per-user LoRA’s 14.2 TB.
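The storage figures quoted for Figures 10, 11 and 37 reduce to simple arithmetic. This sketch assumes decimal units (1 KB = 1,000 bytes) and takes 88 KB per user for Engram overrides and 14.2 MB per user for the rank-64 LoRA, both at 100 facts/user. It reproduces the ∼161× ratio and the 14.2 TB LoRA total for 1 M users; under the same inputs the Engram total comes to 88 GB, the same order of magnitude as the roughly 100 GB the Figure 37 caption quotes.

```python
KB = 1_000          # decimal units (assumption)
MB = 1_000 * KB
GB = 1_000 * MB
TB = 1_000 * GB

ENGRAM_PER_USER = 88 * KB        # override map at 100 facts/user (Figures 10, 37)
LORA_PER_USER = 14_200 * KB      # 14.2 MB rank-64 LoRA (Figure 10)


def fleet_bytes(per_user_bytes, users):
    """Total storage when every user keeps a private artifact of the same size."""
    return per_user_bytes * users


ratio = LORA_PER_USER / ENGRAM_PER_USER
users = 1_000_000
print(f"per-user ratio: {ratio:.0f}x")                                  # 161x
print(f"LoRA fleet:   {fleet_bytes(LORA_PER_USER, users) / TB:.1f} TB")    # 14.2 TB
print(f"Engram fleet: {fleet_bytes(ENGRAM_PER_USER, users) / GB:.0f} GB")  # 88 GB
```

Both totals grow linearly in the number of users; the difference is the per-user constant, which for the Engram side also grows with the number of facts stored.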
Original abstract

Personal memory in a language model is two problems: content and reasoning skill. The brain keeps the two apart (a sparse, local engram in the hippocampus for each episode, a slow neocortex for the shared skills that interpret it), so a new fact need not overwrite everything else. Most personalization today keeps a user's facts outside the weights, in a natural-language memory file or a retrieval index. When facts are written into the model instead, the standard recipe is the per-user LoRA adapter, which does the opposite of the brain, folding content and skill into one global weight delta. Writing a user's facts as a LoRA contaminates text unrelated to them; writing the same facts as local Engram rows leaves it mathematically untouched, resulting in a roughly 33,000x smaller memory footprint. We therefore propose User as Engram: store a user's content as surgical edits to the hash-keyed memory table of an Engram model, and carry the reasoning skill in one shared adapter. This layered design matches per-user LoRA's direct recall while delivering 5.6x higher indirect-reasoning accuracy on average, and never makes a single user worse at reasoning than the untouched base. The edit is a glass box: writing a fact switches on its lookup at exactly the trigger, adds the value the answer needs, leaves every other position unchanged to the last bit, and fails if written into the wrong layer. Because different users' facts land in disjoint hash slots, their edits compose: many users live in one shared table at once, stacking additively and losslessly, where a per-user LoRA, a single global weight delta, admits only one. Upon retrieval, a per-user Engram table does not grow with the population the retriever must search, so past ~100 facts it overtakes a retrieval pipeline on a 2.5x larger model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes 'User as Engram,' a layered personalization method that stores per-user facts as surgical edits to the hash-keyed memory table of an Engram model while carrying reasoning skill in one shared adapter. It claims this matches per-user LoRA on direct recall, delivers 5.6x higher indirect-reasoning accuracy on average, produces a roughly 33,000x smaller memory footprint, enables lossless additive composition across users, and never degrades any user's reasoning relative to the base model. The edits are described as glass-box operations that affect only exact trigger positions.

Significance. If the isolation property and empirical results hold, the approach could meaningfully advance scalable multi-user personalization by separating content storage from reasoning skill, avoiding the interference and memory scaling issues of per-user LoRA while outperforming retrieval on larger models past ~100 facts. The lossless composition and 'never worse' guarantee would be particularly valuable if rigorously shown.

major comments (2)
  1. [Abstract / Engram model description] Abstract and Engram model section: the central claims of bit-exact isolation ('leaves every other position unchanged to the last bit'), 33,000x footprint reduction, and 'never makes a single user worse' all rest on the existence of a hash-keyed memory table enabling surgical edits with no side effects on unrelated positions or layers; no construction, hashing scheme, layer placement, or edit operator is provided to demonstrate how arbitrary facts achieve this isolation.
  2. [Abstract] Abstract: the quantitative performance claims (5.6x indirect-reasoning accuracy, matching direct recall, no degradation) are stated without any reference to datasets, baselines, number of users/facts, evaluation protocol, or error bars, making it impossible to assess whether the layered design actually delivers the reported advantages over per-user LoRA.
minor comments (1)
  1. [Notation / Model description] The terms 'Engram rows' and 'hash-keyed memory table' are introduced without a formal definition, pseudocode, or equation showing the mapping from fact to edit and the exact lookup mechanism.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report and the clear identification of areas where additional detail is required. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract / Engram model description] Abstract and Engram model section: the central claims of bit-exact isolation ('leaves every other position unchanged to the last bit'), 33,000x footprint reduction, and 'never makes a single user worse' all rest on the existence of a hash-keyed memory table enabling surgical edits with no side effects on unrelated positions or layers; no construction, hashing scheme, layer placement, or edit operator is provided to demonstrate how arbitrary facts achieve this isolation.

    Authors: The referee is correct that the abstract and Engram model section currently provide only a high-level description of the hash-keyed table and do not specify the concrete construction, hashing scheme, layer placement, or edit operator. We will revise the Engram model section to include these details: a deterministic hash function that maps fact keys to disjoint slots, placement within designated feed-forward sublayers, and an additive edit operator applied only to the value vector at the hashed index. This addition will make the bit-exact isolation property explicit and verifiable. revision: yes

  2. Referee: [Abstract] Abstract: the quantitative performance claims (5.6x indirect-reasoning accuracy, matching direct recall, no degradation) are stated without any reference to datasets, baselines, number of users/facts, evaluation protocol, or error bars, making it impossible to assess whether the layered design actually delivers the reported advantages over per-user LoRA.

    Authors: We agree that the abstract presents the numerical claims without the necessary experimental context. We will revise the abstract to reference the evaluation setup, including the datasets and tasks used, the number of users and facts, the per-user LoRA and retrieval baselines, the evaluation protocol, and the fact that error bars appear in the main results section. revision: yes

Circularity Check

0 steps flagged

No circularity; proposal rests on external model assumption and empirical claims

full rationale

The manuscript proposes a layered architecture that stores user facts as edits to a hash-keyed Engram memory table while sharing one adapter for reasoning skill. All performance claims (5.6x indirect-reasoning gain, 33,000x footprint reduction, lossless composition, bit-exact isolation) are presented as direct consequences of the stated properties of that table rather than derived from any equation, fitted parameter, or self-citation chain inside the paper. No self-definitional loops, renamed predictions, or load-bearing self-citations appear; the central premise is an engineering assumption about the existence and behavior of the Engram substrate, not a reduction of the target result to itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the existence of an Engram model whose memory table supports hash-keyed surgical edits that are guaranteed to leave all other positions unchanged; this property is asserted but not derived from prior literature in the abstract.

axioms (1)
  • Domain assumption: the brain separates episodic content (hippocampus) from shared reasoning skill (neocortex).
    Used to motivate why content and skill should be stored separately in the model.
invented entities (1)
  • Engram model with hash-keyed memory table (no independent evidence)
    purpose: To enable local parametric edits for user facts that are mathematically isolated from other positions.
    Introduced as the base architecture that makes the surgical-edit property possible; no independent evidence supplied in the abstract.

pith-pipeline@v0.9.1-grok · 5872 in / 1413 out tokens · 23363 ms · 2026-06-26T20:34:33.813886+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Memory Depth, Not Memory Access: Selective Parametric Consolidation for Long-Running Language Agents

    cs.AI 2026-06 unverdicted novelty 6.0

    EVAF, a surprise- and valence-gated LoRA mechanism, provides memory depth for goal persistence in language agents via the loop-drift protocol, complementary to retrieval.

Reference graph

Works this paper leans on

96 extracted references · 20 linked inside Pith · cited by 1 Pith paper

  1. [1]

    B. Li. User as code: Executable memory for personalized agents. arXiv:2606.16707, 2026

  2. [2]

    R. Semon. The Mneme. George Allen & Unwin, London, 1921. (English translation of Die Mneme, 1904; origin of the term "engram")

  3. [3]

    J. L. McClelland, B. L. McNaughton, and R. C. O'Reilly. Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3):419–457, 1995

  4. [4]

    S. Tonegawa, X. Liu, S. Ramirez, and R. Redondo. Memory engram cells have come of age. Neuron, 87(5):918–931, 2015

  5. [5]

    D. Kumaran, D. Hassabis, and J. L. McClelland. What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends in Cognitive Sciences, 20(7):512–534, 2016

  6. [6]

    X. Cheng, W. Zeng, D. Dai, Q. Chen, B. Wang, Z. Xie, K. Huang, X. Yu, Z. Hao, Y. Li, H. Zhang, H. Zhang, D. Zhao, and W. Liang. Conditional memory via scalable lookup: A new axis of sparsity for large language models. arXiv:2601.07372, January 2026

  7. [7]

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. NeurIPS 2017

  8. [8]

    T. Brown et al. Language models are few-shot learners. NeurIPS 2020

  9. [9]

    A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. Language models are unsupervised multitask learners (GPT-2). OpenAI technical report, 2019

  10. [10]

    S. Min, X. Lyu, A. Holtzman, M. Artetxe, M. Lewis, H. Hajishirzi, and L. Zettlemoyer. Rethinking the role of demonstrations: What makes in-context learning work? EMNLP 2022

  11. [11]

    P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. Yih, T. Rocktäschel, S. Riedel, and D. Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. NeurIPS 2020

  12. [12]

    Y. Gao et al. Retrieval-augmented generation for large language models: A survey. arXiv:2312.10997, 2023

  13. [13]

    N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. ICLR 2017

  14. [14]

    D. Dai et al. DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models. arXiv:2401.06066, 2024

  15. [15]

    S. Mangrulkar et al. PEFT: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft, 2022

  16. [16]

    K. Jordan et al. Muon: An optimiser for hidden layers in neural networks. GitHub / blog post, 2024

  17. [17]

    C. Huang et al. LoRAHub: Efficient cross-task generalization via dynamic LoRA composition. arXiv:2307.13269, 2023

  18. [18]

    D. Wu et al. LongMemEval: Benchmarking chat assistants on long-term memory. arXiv 2024

  19. [19]

    A. Maharana et al. LOCOMO: Evaluating very long-term conversational memory of LLM agents. ACL 2024

  20. [20]

    M. Tavakoli, A. Salemi, C. Ye, M. Abdalla, H. Zamani, and J. R. Mitchell. Beyond a million tokens: Benchmarking and enhancing long-term memory in LLMs (BEAM). arXiv:2510.27246, 2025

  21. [21]

    B. Jiang, Y. Yuan, M. Shen, Z. Hao, Z. Xu, Z. Chen, Z. Liu, A. R. Vijjini, J. He, H. Yu, R. Poovendran, G. Wornell, L. Ungar, D. Roth, S. Chen, and C. J. Taylor. PersonaMem-v2: Towards personalized intelligence via learning implicit user personas and agentic memory. arXiv:2512.06688, 2025

  22. [22]

    A. Karpathy. nanochat: an experimental training harness for LLMs. https://github.com/karpathy/nanochat, 2026

  23. [23]

    E. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. LoRA: Low-rank adaptation of large language models. ICLR 2022

  24. [24]

    N. Houlsby et al. Parameter-efficient transfer learning for NLP. ICML 2019

  25. [25]

    W. Su et al. Parametric retrieval-augmented generation (PRAG). arXiv:2501.15915, 2025

  26. [26]

    Z. Tan et al. DyPRAG: Dynamic parametric RAG. arXiv:2505.19386, 2025

  27. [27]

    J. Chen, H. Zhang, L. Pang, Y. Tong, H. Zhou, Y. Zhan, W. Lin, and Z. Zheng. Privacy-preserving reasoning with knowledge-distilled parametric retrieval-augmented generation (DistilledPRAG). arXiv:2509.01088, 2025

  28. [28]

    Z. Tan, Q. Liu, and M. Jiang. Democratizing large language models via personalized parameter-efficient fine-tuning (OPPU). EMNLP 2024 (arXiv:2402.04401)

  29. [29]

    Y. Zhuang et al. HYDRA: Per-user adapters for personalised LLMs. arXiv 2024

  30. [30]

    M. Bini, O. Bohdal, U. Michieli, Z. Akata, M. Ozay, and T. Ceritli. MemLoRA: Distilling expert adapters for on-device memory systems. arXiv:2512.04763, 2025

  31. [31]

    R. Charakorn, E. Cetin, Y. Tang, and R. T. Lange. Text-to-LoRA: Instant transformer adaption. arXiv:2506.06105, 2025

  32. [32]

    Z. Tan et al. PER-PCS: Per-user post-hoc LoRA composition. arXiv 2024

  33. [33]

    Y. Sheng, S. Cao, D. Li, et al. S-LoRA: Serving thousands of concurrent LoRA adapters. arXiv:2311.03285, 2024

  34. [34]

    L. Chen, Z. Ye, Y. Wu, et al. Punica: Multi-tenant LoRA serving. MLSys 2024

  35. [35]

    G. Lample, A. Sablayrolles, M. Ranzato, L. Denoyer, and H. Jégou. Large memory layers with product keys. NeurIPS 2019

  36. [36]

    P. He. PEER: Mixture of one million experts. arXiv 2024

  37. [37]

    V. Berges, B. Oğuz, D. Haziza, W. Yih, L. Zettlemoyer, and G. Ghosh. Memory layers at scale. ICML 2025

  38. [38]

    Z. Huang, Q. Min, H. Huang, D. Zhu, Y. Zeng, R. Guo, and X. Zhou. Ultra-sparse memory network (Ultra-Mem). arXiv:2411.12364, 2024 (ICLR 2025)

  39. [39]

    J. Huang et al. OverEncoding: hashed N-gram embeddings via averaging. 2025

  40. [40]

    L. Yu et al. SCONE: scalable contextual N-gram embeddings. 2025

  41. [41]

    A. Pagnoni, R. Pasunuru, P. Rodriguez, et al. BLT: byte latent transformer with hashed N-gram embeddings. arXiv:2412.09871, 2025

  42. [42]

    A. Liu et al. SuperBPE: word-level BPE for compositional patterns. 2025

  43. [43]

    CivitAI Community. LoRA stacking patterns for Stable Diffusion. https://civitai.com/, 2024

  44. [44]

    K. Meng, D. Bau, A. Andonian, and Y. Belinkov. Locating and editing factual associations in GPT (ROME). NeurIPS 2022

  45. [45]

    K. Meng et al. MEMIT: Mass-editing memory in a transformer. ICLR 2023

  46. [46]

    R. Cohen et al. Evaluating the ripple effects of knowledge editing in language models (MQuAKE). 2023

  47. [47]

    R. Cohen et al. RippleEdits: A benchmark for ripple effects of model editing. 2024

  48. [48]

    K. Meng et al. CounterFact: a counterfactual editing benchmark. 2022

  49. [49]

    D. Wu, J.-C. Gu, K.-W. Chang, and N. Peng. Self-routing RAG: Binding selective retrieval with knowledge verbalization. arXiv:2504.01018, 2025

  50. [50]

    Z. Sun et al. Recitation-augmented language models. ICLR 2023

  51. [51]

    A. Yang, B. Yang, B. Zhang, et al. Qwen2.5 technical report. arXiv:2412.15115, 2025

  52. [52]

    A. Yang, A. Li, B. Yang, et al. Qwen3 technical report. arXiv:2505.09388, 2025

  53. [53]

    A. Grattafiori, A. Dubey, A. Jauhri, et al. The Llama 3 herd of models. arXiv:2407.21783, 2024

  54. [54]

    A. Q. Jiang, A. Sablayrolles, A. Mensch, et al. Mistral 7B. arXiv:2310.06825, 2023

  55. [55]

    DeepSeek-AI. DeepSeek-V3 technical report. arXiv:2412.19437, 2024

  56. [56]

    N. Reimers and I. Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. EMNLP-IJCNLP 2019

  57. [57]

    W. Wang, F. Wei, L. Dong, H. Bao, N. Yang, and M. Zhou. MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. NeurIPS 2020

  58. [58]

    L. Zheng, W.-L. Chiang, Y. Sheng, et al. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. NeurIPS 2023 Datasets and Benchmarks. arXiv:2306.05685

  59. [59]

    C. Packer, S. Wooders, K. Lin, et al. MemGPT: Towards LLMs as operating systems. arXiv:2310.08560, 2023

  60. [60]

    P. Chhikara, D. Khant, S. Aryan, T. Singh, and D. Yadav. Mem0: Building production-ready AI agents with scalable long-term memory. arXiv:2504.19413, 2025

  61. [61]

    W. Xu, Z. Liang, K. Mei, H. Gao, J. Tan, and Y. Zhang. A-MEM: Agentic memory for LLM agents. arXiv:2502.12110, 2025

  62. [62]

    P. Rasmussen, P. Paliychuk, T. Beauvais, J. Ryan, and D. Chalef. Zep: A temporal knowledge graph architecture for agent memory. arXiv:2501.13956, 2025

  63. [63]

    Z. Li, S. Song, H. Wang, et al. MemOS: An operating system for memory-augmented generation in large language models. arXiv:2505.22101, 2025

  64. [64]

    S. Wang, E. Yu, O. Love, T. Zhang, T. Wong, S. Scargall, and C. Fan. MemMachine: A ground-truth-preserving memory system for personalized AI agents. arXiv:2604.04853, 2026

  65. [65]

    C. Hu, X. Gao, Z. Zhou, et al. EverMemOS: A self-organizing memory operating system for structured long-horizon reasoning. arXiv:2601.02163, 2026

  66. [66]

    D. Patel and S. Patel. ENGRAM: Effective, lightweight memory orchestration for conversational agents. arXiv:2511.12960, 2025

  67. [67]

    S. Yan, X. Yang, Z. Huang, et al. Memory-R1: Enhancing large language model agents to manage and utilize memories via reinforcement learning. arXiv:2508.19828, 2025

  68. [68]

    Y. Yu, L. Yao, Y. Xie, et al. Agentic memory: Learning unified long-term and short-term memory management for LLM agents. arXiv:2601.01885, 2026

  69. [69]

    Y. Wang, R. Takanobu, Z. Liang, et al. Mem-α: Learning memory construction via reinforcement learning. arXiv:2509.25911, 2025

  70. [70]

    Z. Zhang, X. Bo, C. Ma, et al. A survey on the memory mechanism of large language model based agents. arXiv:2404.13501, 2024

  71. [71]

    Y. Wu, S. Liang, C. Zhang, et al. From human memory to AI memory: A survey on memory mechanisms in the era of LLMs. arXiv:2504.15965, 2025

  72. [72]

    P. Du. Memory for autonomous LLM agents: Mechanisms, evaluation, and emerging frontiers. arXiv:2603.07670, 2026

  73. [73]

    N. Pollertlam and W. Kornsuwannawit. Beyond the context window: A cost-performance analysis of fact-based memory vs. long-context LLMs for persistent agents. arXiv:2603.04814, 2026

  74. [74]

    A. Salemi, S. Mysore, M. Bendersky, and H. Zamani. LaMP: When large language models meet personalization. ACL 2024. arXiv:2304.11406

  75. [75]

    Z. Zhang, R. A. Rossi, B. Kveton, et al. Personalization of large language models: A survey. arXiv:2411.00027, 2024

  76. [76]

    J. Liu, Z. Qiu, Z. Li, et al. A survey of personalized large language models: Progress and future directions. arXiv:2502.11528, 2025

  77. [77]

    Y. Xu, Q. Chen, Z. Ma, et al. Toward personalized LLM-powered agents: Foundations, evaluation, and future directions. arXiv:2602.22680, 2026

  78. [78]

    E. Mitchell, C. Lin, A. Bosselut, C. Finn, and C. D. Manning. Fast model editing at scale. ICLR 2022

  79. [79]

    E. Mitchell, C. Lin, A. Bosselut, C. D. Manning, and C. Finn. Memory-based model editing at scale. ICML 2022

  80. [80]

    D. Dai, L. Dong, Y. Hao, Z. Sui, B. Chang, and F. Wei. Knowledge neurons in pretrained transformers. ACL 2022
