User as Engram: Internalizing Per-User Memory as Local Parametric Edits

Bojie Li

arxiv: 2606.19172 · v1 · pith:UP35RQTGnew · submitted 2026-06-17 · 💻 cs.AI

Pith reviewed 2026-06-26 20:34 UTC · model grok-4.3

classification 💻 cs.AI

keywords personalization · memory · parametric edits · language models · LoRA · Engram model · user-specific knowledge

The pith

Storing each user's facts as precise local edits inside an Engram model's hash-keyed table, while keeping reasoning skill in one shared adapter, matches LoRA direct recall yet raises indirect-reasoning accuracy 5.6 times and shrinks memory

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that language-model personalization splits into two distinct problems: holding a user's private facts and carrying the shared skill to use those facts in new contexts. Current approaches either leave facts outside the weights or fold them into a per-user LoRA that alters every token the model sees. The proposed method writes facts instead as surgical edits to the hash-keyed memory table of an Engram model and leaves the reasoning adapter unchanged across users. This separation produces edits that activate only at their exact trigger, leave every other weight bit-for-bit identical, and compose additively when many users occupy the same table. A reader would care because the design promises scalable, non-interfering per-user memory without the contamination or storage growth of existing recipes.

Core claim

User facts can be internalized by writing them as local parametric edits to the hash-keyed memory table of an Engram model while the reasoning skill remains in a single shared adapter. The edits match the direct-recall performance of per-user LoRAs, deliver 5.6 times higher accuracy on indirect reasoning tasks on average, never degrade any user below the untouched base model, and perturb unrelated text roughly 33,000 times less than a per-user LoRA (Figure 3); per-user storage is about 161 times smaller at 100 facts (Figure 10). Because different users land in disjoint hash slots, their edits stack losslessly. Unlike a retrieval index, the per-user table does not grow with population size, so it overtakes a retrieval pipeline on a 2.5 times larger model once the fact count exceeds roughly 100.

What carries the argument

Surgical edits to the hash-keyed memory table of an Engram model, which activate lookup only at the exact trigger position, add the required value, and leave every other weight position mathematically unchanged to the last bit.
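
The addressing idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: the SHA-256 hash, the table size, the N-gram length, and the names `ngram_addresses`, `base_row`, and `lookup` are all hypothetical, and the projection and gating steps the figure captions mention are omitted.

```python
import hashlib

K_HEADS = 2           # assumed number of hash heads
TABLE_ROWS = 2 ** 20  # assumed table size
NGRAM = 3             # assumed suffix N-gram length
DIM = 4               # assumed row width


def ngram_addresses(tokens, pos):
    """Hash the suffix N-gram ending at `pos` into K_HEADS row addresses."""
    suffix = tuple(tokens[max(0, pos - NGRAM + 1): pos + 1])
    addrs = []
    for head in range(K_HEADS):
        digest = hashlib.sha256(repr((head, suffix)).encode()).digest()
        addrs.append(int.from_bytes(digest[:8], "big") % TABLE_ROWS)
    return addrs


def base_row(addr):
    """Stand-in for a pretrained row (deterministic toy values)."""
    return [float(addr % 7)] * DIM


def lookup(tokens, overrides):
    """Per position, sum the rows at its addresses; overrides win over base rows."""
    out = []
    for pos in range(len(tokens)):
        vec = [0.0] * DIM
        for addr in ngram_addresses(tokens, pos):
            row = overrides.get(addr, base_row(addr))
            vec = [v + r for v, r in zip(vec, row)]
        out.append(vec)
    return out


tokens = ["my", "doctor", "is", "called", "dr", "patel"]
trigger_pos = 3  # the fact is keyed on the N-gram ending at "called"
written = {addr: [9.5] * DIM for addr in ngram_addresses(tokens, trigger_pos)}

before = lookup(tokens, overrides={})
after = lookup(tokens, overrides=written)

# Positions whose addresses miss every written row must be exactly unchanged.
untouched = [p for p in range(len(tokens))
             if not set(ngram_addresses(tokens, p)) & set(written)]
changed = [p for p in range(len(tokens)) if before[p] != after[p]]
```

In this toy version, locality follows from the lookup reading only the rows a position hashes to. Whether a real model preserves that property through projection, gating, and later layers is what the paper's measurements would have to show.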

If this is right

Direct recall of stored facts equals that of a dedicated per-user LoRA.
Indirect reasoning accuracy rises 5.6 times on average while no user falls below base-model performance.
Multiple users' facts occupy disjoint hash slots and compose additively without interference.
Contamination of unrelated text is roughly 33,000 times lower than with a per-user LoRA, and per-user storage is about 161 times smaller at 100 facts.
Past approximately 100 facts the per-user table overtakes retrieval on a 2.5 times larger model.
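
The additive, non-interfering composition listed above can be illustrated with a small sketch, assuming each user's edits are stored as a map from row address to row vector. The function `compose` and the example addresses are hypothetical. Whether real users' facts always hash to disjoint rows is the paper's claim, not something this sketch establishes.

```python
def compose(*override_maps):
    """Merge per-user override maps, refusing to merge if any row address collides."""
    merged = {}
    for user_map in override_maps:
        clash = merged.keys() & user_map.keys()
        if clash:
            raise ValueError(f"row addresses shared between users: {sorted(clash)}")
        merged.update(user_map)
    return merged


alice = {101: [0.5, -1.0], 2048: [2.0, 0.0]}
bob = {77: [1.5, 1.5]}

shared = compose(alice, bob)

# Lossless: every user's rows survive unchanged in the shared table.
alice_intact = all(shared[a] == row for a, row in alice.items())
bob_intact = all(shared[a] == row for a, row in bob.items())

# A collision is detected rather than silently overwritten.
try:
    compose(alice, {101: [9.0, 9.0]})
    collision_detected = False
except ValueError:
    collision_detected = True
```

The explicit collision check reflects the caveat in the Figure 25 caption, which reports heavy degradation when stacked domains share trigger templates and therefore overlap in address space.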

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Dynamic insertion and deletion of individual facts becomes possible without retraining or global weight changes.
The same table could support verifiable fact provenance because each edit is isolated and reversible.
Retrieval pipelines could be replaced by the table for populations larger than a few hundred users without proportional search cost growth.

Load-bearing premise

An Engram model already exists or can be built whose hash-keyed memory table accepts arbitrary user facts as edits that affect only the intended trigger positions and leave all other weights untouched.

What would settle it

A controlled test that writes a new fact into the table and then measures whether any token unrelated to that fact changes its output probability or whether the edit fails to activate exactly at the trigger position.
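
One way to phrase the "unchanged to the last bit" part of such a test is to compare raw float bit patterns instead of approximate equality. The sketch below assumes the model's outputs before and after the write are available as lists of floats per position. The helpers `bit_pattern` and `positions_changed` are hypothetical, and the numbers are made up for illustration.

```python
import struct


def bit_pattern(x):
    """Return the IEEE-754 double bit pattern of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def positions_changed(before, after):
    """Indices of positions where any value differs at the bit level."""
    changed = []
    for pos, (row_b, row_a) in enumerate(zip(before, after)):
        if any(bit_pattern(b) != bit_pattern(a) for b, a in zip(row_b, row_a)):
            changed.append(pos)
    return changed


# Made-up per-position outputs: only position 2 (the trigger) is edited.
before = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.0, 0.8]]
after = [[0.1, 0.2], [0.3, 0.4], [0.9, 0.6], [0.0, 0.8]]

trigger_only = positions_changed(before, after)

# A sign flip on zero passes `==` but fails the bit-level check.
signed_zero = positions_changed([[0.0]], [[-0.0]])
```

A bit-level comparison is stricter than a tolerance check: it flags any change, including ones far below what a metric such as bpb reported to four decimals would reveal.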

Figures

Figures reproduced from arXiv: 2606.19172 by Bojie Li.

**Figure 1.** User as Engram splits personal memory into two layers, mirroring the brain’s complementary learning systems. Top (content). Each user’s facts are written as a few local row overrides (colored) in the Engram model’s one hash-keyed memory table; the table’s other rows hold the general knowledge learned at pretraining (gray). A write touches only that user’s rows (∆bpb = +0.0001 on all other text), and differ…

**Figure 2.** Where User as Engram sits among personal-memory methods. Context-based methods (in-context learning, retrieval, memory systems) leave the model untouched but pay context tokens at query time (top-left); weight-based methods (per-user LoRA, knowledge editing) pay no context but edit the model globally (bottom-right). User as Engram occupies the remaining corner: a local edit at zero context cost. Axes are q…

**Figure 3.** Architectural contamination on held-out text, Mini-Engram-d20 (canonical seed S0, n=20 users). A per-user LoRA more than triples val bpb on text unrelated to the user’s facts (+1.784; per-user range [+0.44, +3.74]; 17/20 users worse), while a per-user Engram row leaves it unchanged to four decimals (+0.00005; 0/20 worse), which is ∼34,000× less, by design. The 3-seed mean ratio is ∼33,000× (Appendix N).

**Figure 4.** Cross-base LoRA scaling: mean change in indirect recall (adapter − base) with the fraction of users it hurts. On the base LM (Mini-Engram-d20) a per-user LoRA disrupts fragile completion behavior (85% worse, ∆=−0.133); on every instruction-tuned base the reasoning skill absorbs the perturbation (0–20% worse, ∆ from +0.088 to +0.217).

**Figure 5.** Where the Engram lives in the transformer. The backbone is a standard transformer; at a few designated Engram layers (⋆) a content-addressable lookup runs alongside attention and the MLP. At a position, the recent tokens’ suffix N-gram is hashed by K heads into K table addresses (deterministic, hence known before the forward pass); the retrieved rows are projected by $W_K$, $W_V$, gated by α, and added to the re…

**Figure 6.** Inserting a user fact into the Engram. A fact is reduced to where and what: the trigger’s suffix N-gram hashes to a sparse set of row addresses $R_f$ (where), and one of three strategies writes a value $e^\star$ into those rows (what). UNEMBED_P solves for $e^\star$ in closed form; OPT takes a few gradient steps per fact; Joint OPT (our default beyond ∼30 facts/user) optimizes all of the user’s rows together so they do no…

**Figure 7.** Addressed write vs. global function-bend. Per-layer, per-position residual-stream change $\lVert x^{(\ell)}_{\text{after}} - x^{(\ell)}_{\text{before}} \rVert$ on the same trigger sentence and base (Mini-Engram-d12@1280, log color scale). (a) An Engram row insertion is exactly 0.000 at every position before the Engram layer (causality) and at every non-trigger position after it; only the trigger column moves. (b) A per-user LoRA fit to the same sin…

**Figure 8.** The mechanism on the trained model (Mini-Engram-d20, 16 facts). (a) The write opens its own gate: the trigger-position gate α rises from 0.02 to 0.99 (OPT) while non-trigger positions stay at 0.04. (b) The deployed row’s residual change is cosine 0.999 to its value-path projection $W_V e$ (left bar); what the row encodes relative to the gold token’s unembedding is exact for the closed-form solution (0.59) and …

**Figure 9.** EngramServer architecture. Per user, an override map of (row index, row vector) pairs lives in DRAM. On each request, the server saves the originals at the affected addresses (∼2 ms), writes the user’s overrides, runs the forward pass, then restores. The Engram lookup at the configured layer transparently sees the override values; the gate fires only at the trigger N-gram. There is no router and no graph r…
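
The save, override, forward, restore cycle in the Figure 9 caption can be sketched as a context manager. This is a minimal single-threaded illustration, not EngramServer's code: a Python list stands in for the shared table, and `user_overrides` and `forward` are hypothetical names.

```python
from contextlib import contextmanager


@contextmanager
def user_overrides(table, overrides):
    """Temporarily write a user's rows into the shared table, then restore."""
    saved = {idx: table[idx] for idx in overrides}   # save originals
    try:
        for idx, row in overrides.items():           # write overrides
            table[idx] = row
        yield table                                  # caller runs the forward pass
    finally:
        for idx, row in saved.items():               # restore originals
            table[idx] = row


def forward(table, addresses):
    """Toy stand-in for the forward pass: read the rows a request touches."""
    return [table[a] for a in addresses]


table = [[float(i)] for i in range(8)]   # shared table, 8 one-dimensional rows
snapshot = [row[:] for row in table]
alice = {3: [42.0], 5: [-1.0]}           # (row index, row vector) pairs

with user_overrides(table, alice) as live:
    seen = forward(live, [2, 3, 5])

restored = table == snapshot
```

The `finally` clause restores the original rows even if the forward pass raises, so one failed request does not leak a user's rows into the next. Concurrent requests would need serialization or per-request copies of the affected rows, which this sketch does not attempt.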

**Figure 10.** Within-user fact-density: top-1 (left) and top-5 (right) recall vs. number of facts inserted simultaneously into one user’s override map. Joint OPT (blue) closes most of the gap to LoRA rank-64 (red dashed) at 161× less storage (88 KB vs. 14.2 MB at 100 facts).

**Figure 11.** Cost-quality trade-off across personal-memory methods at the same answering LM (Mini-Engram-d20@1536). User-as-Engram Joint OPT matches a per-user LoRA’s LOCOMO F1 at 161× smaller per-user storage (88 KB vs. 14.2 MB at 100 facts). Retrieval baselines store each fact at a higher per-fact cost than an Engram row and still cap out below Engram on quality.

**Figure 12.** Direct recall vs. memory systems, all sharing the same Mini-Engram-d12 answering LM (100 USER facts, XXL corpus). (a) When the query is the fact’s stored prefix, nearest-neighbor retrieval is near-perfect and beats Engram’s 68% top-1, but at a context-token cost Engram does not pay. (b) When the query is a paraphrase, retrieval drops to 60–75% while multi-trigger Engram insertion reaches 96.9% top-1. The story…

**Figure 13.** LOCOMO single-hop, full 10 conversations, scaling with Mini-Engram dense size. (a) Token-F1: User-as-Engram Joint OPT (blue) beats every retrieval baseline at every scale. (b) LLM-judge accuracy (Qwen2.5-14B-Instruct): the ranking flips, with retrieval baselines (MEM0_LIKE, MEMMACHINE) beating Joint OPT by 0.05–0.10 because token-F1 over-credits Engram for the correct first answer token despite a noisy continuatio…

**Figure 14.** LOCOMO category breakdown (token-F1), Engram Joint OPT (blue) vs. the best retrieval baseline (red; max of MEM0_LIKE and MEMMACHINE_LIKE) across three dense scales. Engram wins single-hop, multi-hop, and reasoning at every scale (the per-fact (question, answer) loss encodes the question→answer map that retrieval must chain across evidence sentences) but loses open-domain, where the answer is a verbatim span…

**Figure 15.** Multi-hop reasoning over Engram-inserted facts (n=63 chained pairs on Mini-Engram-d20, two OPT-15 insertions per item). Per-fact direct recall is 99.2% (both facts are insertable), so the gap is chaining, not insertion. When the query’s suffix N-gram coincides with Fact-2’s trigger, the gate fires and recall is 90.6%; when true chaining is required, it collapses to 12.9% (Wilson 95% CIs shown). The gate i…

**Figure 16.** The six parametric conditions on n=20 test users (Mini-Engram-d20, seed S0; bars left-to-right as listed in the text). Left: the layered design (Engram + shared LoRA) matches per-user LoRA’s direct recall (100% vs. 99%) while delivering 7.4× its indirect_any on this seed (44% vs. 6%); neither the per-user Engram (no skill) nor the shared LoRA alone (never saw the user’s facts) suffices. Right: the layered…

**Figure 17.** Indirect-reasoning accuracy and retrieval recall vs. KB size (n=20 users, 20 indirect probes each; KB = test user’s 34 facts + distractors sampled from the 30-user schema-family pool, log x-axis, N ∈ {34, 100, 200, 300, 500, 1000}). (a) The layered design (horizontal solid line at 44%) is invariant to KB size; the retrieval-based conditions degrade monotonically. Naive RAG and Qwen-3B + RAG fall below the…

**Figure 18.** EngramServer throughput across four deployment scales on one idle Blackwell GPU. On the ablation-optimal d12@1280 the server reaches 232 req/s (30 u/50 f) and holds 226 req/s at 100 u/100 f (within 3% as tenants triple) at 4.4 ms p50 latency, and a sub-millisecond override apply (0.03 ms p50; the larger per-component figures in Appendix G are from an earlier, slower benchmark on the smaller d12 v2 checkpoi…

**Figure 19.** LOCOMO token-F1 vs. LLM-judge accuracy across all system × dense-size cells. Retrieval baselines (red/orange) sit near the y=x line: their token-F1 and LLM-judge scores roughly agree. User-as-Engram OPT and Joint OPT (blue/cyan) sit systematically above the line, indicating that token-F1 over-credits Engram. The asymmetry is the metric mismatch documented in Section 5.5.

**Figure 20.** LOCOMO single-hop under an LLM judge (Qwen2.5-14B), matched multi-token pipeline. First-token OPT lets token-F1 over-credit Engram for a correct first token over a noisy continuation (…

**Figure 21.** LogitLens KL by layer (Mini-Engram-d8 vs. base-d8). Engram converges faster at layer 3 (−3.66 KL gap), confirming Cheng et al.’s effective-deepening claim at our scale (stage 4 of the mechanism above).

**Figure 22.** Mini-Engram pretraining curves. Engram d8 reaches lower validation bpb than base d8 at iso-FLOPs (matching the paper’s finding at much smaller scale); Engram d12 is substantially better (0.849 bpb) thanks to the larger backbone and longer training.

**Figure 23.** Top-1 and top-5 recall at 100 facts/user, Mini-Engram-d12. RANDOM (control) confirms our signal isn’t noise; UNEMBED_P alone is too weak; OPT-15 helps; Joint OPT closes most of the gap to full LoRA fine-tuning at 161× less storage.

**Figure 24.** Paraphrase generalization across 4 facts × 5 paraphrases each. Single-trigger insertion gets 50% top-1 for free (suffix N-gram overlap); multi-trigger insertion (insert at all 5 paraphrases) gets 100% at 5× the per-fact OPT cost.

**Figure 25.** Multi-domain additive composition heatmap. Each cell shows per-domain top-1 recall when D domains are stacked. Heavy degradation when domains share trigger templates (high address overlap). Disjoint-template domains (corp + user, demonstrated in main text) compose without loss.

**Figure 26.** Per-request latency CDF on Mini-Engram-d12 (30 users × 50 facts × 600 requests). Override apply (blue) and restore (orange) are both sub-3 ms at p99; the forward pass (grey) dominates total latency (black).

**Figure 27.** Per-request latency breakdown, Mini-Engram-d12 (30 users × 50 facts × 600 requests). (a) Override-apply latency: median 2.2 ms. (b) Median per-request latency by component: apply (2.3 ms) + forward (16.5 ms) + restore (2.3 ms); the frozen forward pass dominates, and the override apply+restore overhead (∼4.6 ms) is under 20% of the 23.2 ms end-to-end median.

**Figure 28.** Indirect-reasoning accuracy (indirect_any) vs. avg context tokens per query on the 20-user split. The layered design and the shared LoRA alone anchor the 0-context end; RAG top-3 + shared LoRA lifts accuracy to 54% at 44 context tokens; Qwen-3B + RAG reaches 55–57% at 82–390 tokens on a ∼2.5× larger backbone. Naive RAG on the same Mini-Engram-d20 base sits below the layered design: putting facts in contex…

**Figure 29.** Joint OPT convergence on Mini-Engram-d12. At 100 facts/user, loss converges to 0.8 within 1500 steps; at 1000 facts, loss saturates around 2.0 due to higher within-user fact-density interference (Section 5.2).

**Figure 30.** Multi-fact-in-the-loss finetune vs. baseline d12@1280 on the Joint-OPT density curve (matched OPT-step schedule). At a fixed inference budget MF improves recall at high density (+14% relative at n=1000, 8k steps). With the budget unpinned (12k steps) both reach ∼34.5% at n=1000: MF accelerates convergence rather than raising the asymptotic ceiling. val_bpb is preserved (0.774 vs. 0.770). Reading: the arch…

**Figure 31.** Engram capacity × token-budget response surface, at Mini-Engram-d8 v2 (left) and d12 v2 (right). Top row: E1 USER OPT top-1 (100-fact recall). Bottom row: LOCOMO Joint OPT F1. The same large (50 K × 256) Engram size wins at the highest token budget for both dense scales; tiny is under-capacity, xlarge is over-provisioned. The optimum scales in tokens, not capacity.

**Figure 32.** Per-fact independent OPT recall vs. fact count n. Left: USER OPT top-1. Right: ORG OPT top-1. The d12@1280 and d20@1536 optimal cells (green, purple) hold near-1.00 recall from n = 100 to n = 1000. No ceiling visible in this per-fact independent-OPT setting.

**Figure 33.** Dense-size scaling at the ablation-optimal recipe. Left: LOCOMO Joint OPT first-token token-F1 climbs from 0.134 (d8 v2) to 0.233 (d20@1536); MEMMACHINE_LIKE retrieval baseline climbs more slowly because its quality is bounded by sentence-encoder retrieval, not the LM. Right: val_bpb (grey) drops monotonically with dense+tokens, while E1 USER OPT single-fact recall (blue) saturates at 1.00 from d12@1280 o…

**Figure 34.** Shared-LoRA rank ablation (the layered design, n=20). The layered design preserves 100% direct recall at every rank; indirect_any peaks at r=16 (44%) while contamination grows with rank. The r=64 surprise: 4× the capacity gives worse reasoning (35%) at 2.7× the contamination, plausibly because capacity exceeds what the 510-sample shared corpus can constrain.

**Figure 35.** Cost-quality trade-off for the layered architecture (Section 6). Same Mini-Engram-d20 base, n=20 test users, six method combinations. The layered design gives the best trade-off: it matches the highest indirect-reasoning accuracy (44% indirect_any) at low storage (88 KB/user) and low contamination (∆bpb +0.39). It beats per-user LoRA on every measure. The naive LoRA+Engram stack (the “add Engram on top of…

**Figure 36.** The layered claim survives a personal → medical schema shift (shared LoRA trained on personal-schema users, tested on medical-schema users). The layered design’s indirect_any decays from 44% to 31% (a realistic 30% relative drop) while per-user LoRA goes 6% → 4%, so the layered design’s lead is preserved (7.4× → 7.6×); locality is unchanged (the layered design’s ∆bpb equals the shared LoRA’s, +0.386).

**Figure 37.** Per-user storage scales linearly with user count. Engram override is ∼161× smaller at 100 facts/user (88 KB vs. 14.2 MB) and the gap widens to ∼1700× at 10 facts/user (LoRA’s parameter count is fact-count-independent while Engram grows linearly). At 1 M users × 100 facts/user, Engram storage is 100 GB vs. per-user LoRA’s 14.2 TB.
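
The storage figures quoted in the captions can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 KB = 1,000 bytes) and takes the 88 KB and 14.2 MB per-user sizes at 100 facts directly from the captions; the variable names are illustrative.

```python
KB, MB, GB, TB = 1e3, 1e6, 1e9, 1e12   # decimal units (an assumption)

engram_per_user = 88 * KB     # at 100 facts/user, from the captions
lora_per_user = 14.2 * MB     # rank-64 LoRA, from the captions
users = 1_000_000

ratio = lora_per_user / engram_per_user
engram_total_gb = engram_per_user * users / GB
lora_total_tb = lora_per_user * users / TB

print(f"storage ratio at 100 facts/user: {ratio:.0f}x")
print(f"Engram at 1M users: {engram_total_gb:.0f} GB")
print(f"LoRA at 1M users:   {lora_total_tb:.1f} TB")
```

Under these assumptions the ratio comes out at about 161×, matching the captions, and the LoRA total is 14.2 TB. The Engram total is 88 GB, the same order of magnitude as the 100 GB quoted in the Figure 37 caption.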

Original abstract

Personal memory in a language model is two problems: content and reasoning skill. The brain keeps the two apart (a sparse, local engram in the hippocampus for each episode, a slow neocortex for the shared skills that interpret it), so a new fact need not overwrite everything else. Most personalization today keeps a user's facts outside the weights, in a natural-language memory file or a retrieval index. When facts are written into the model instead, the standard recipe is the per-user LoRA adapter, which does the opposite of the brain, folding content and skill into one global weight delta. Writing a user's facts as a LoRA contaminates text unrelated to them; writing the same facts as local Engram rows leaves it mathematically untouched, roughly 33,000x less contamination. We therefore propose User as Engram: store a user's content as surgical edits to the hash-keyed memory table of an Engram model, and carry the reasoning skill in one shared adapter. This layered design matches per-user LoRA's direct recall while delivering 5.6x higher indirect-reasoning accuracy on average, and never makes a single user worse at reasoning than the untouched base. The edit is a glass box: writing a fact switches on its lookup at exactly the trigger, adds the value the answer needs, leaves every other position unchanged to the last bit, and fails if written into the wrong layer. Because different users' facts land in disjoint hash slots, their edits compose: many users live in one shared table at once, stacking additively and losslessly, where a per-user LoRA, a single global weight delta, admits only one. Unlike retrieval, a per-user Engram table does not grow with the population the retriever must search, so past ~100 facts it overtakes a retrieval pipeline on a 2.5x larger model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames per-user facts as hash-keyed local edits separate from a shared reasoning adapter, but the isolation property and performance numbers rest on unshown details.

The paper's main claim is that user memory can be handled by storing facts as local parametric edits to a hash-keyed table in an Engram model, with reasoning skill kept in one shared adapter. This is supposed to avoid contaminating unrelated text, allow many users to share the table losslessly, and give better indirect reasoning than per-user LoRAs while using much less memory.

It does a solid job identifying the mismatch between how the brain separates content and skill and how current personalization methods mix them in one adapter. The hash-keyed approach is presented as enabling disjoint edits that compose additively across users, which is a real difference from a single global delta.

The soft spot is the lack of any concrete demonstration. The abstract states that edits leave every other position unchanged to the last bit and claims specific gains such as 5.6x accuracy and a 33,000x smaller footprint, but it gives no methods, no datasets, and no explanation of how the hash-keyed table or the edit operator achieves the isolation. The concern in the stress-test note holds: if the surgical edits are not truly isolated, the advantages over LoRA and the multi-user scaling do not follow. The paper seems to treat the Engram model as given; without showing the construction, the central argument is not yet supported.

This is for researchers focused on scalable personalization in LLMs. Someone looking for parametric memory options beyond retrieval or standard adapters would get the conceptual value, but the work needs the experiments and implementation details to be convincing. It deserves a serious referee because the idea is distinct and the problem it targets is important, even though the current evidence is limited to the abstract claims.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes 'User as Engram,' a layered personalization method that stores per-user facts as surgical edits to the hash-keyed memory table of an Engram model while carrying reasoning skill in one shared adapter. It claims this matches per-user LoRA on direct recall, delivers 5.6x higher indirect-reasoning accuracy on average, perturbs unrelated text roughly 33,000x less than a per-user LoRA, enables lossless additive composition across users, and never degrades any user's reasoning relative to the base model. The edits are described as glass-box operations that affect only exact trigger positions.

Significance. If the isolation property and empirical results hold, the approach could meaningfully advance scalable multi-user personalization by separating content storage from reasoning skill, avoiding the interference and memory scaling issues of per-user LoRA while outperforming retrieval on larger models past ~100 facts. The lossless composition and 'never worse' guarantee would be particularly valuable if rigorously shown.

major comments (2)

[Abstract / Engram model description] Abstract and Engram model section: the central claims of bit-exact isolation ('leaves every other position unchanged to the last bit'), 33,000x footprint reduction, and 'never makes a single user worse' all rest on the existence of a hash-keyed memory table enabling surgical edits with no side effects on unrelated positions or layers; no construction, hashing scheme, layer placement, or edit operator is provided to demonstrate how arbitrary facts achieve this isolation.
[Abstract] Abstract: the quantitative performance claims (5.6x indirect-reasoning accuracy, matching direct recall, no degradation) are stated without any reference to datasets, baselines, number of users/facts, evaluation protocol, or error bars, making it impossible to assess whether the layered design actually delivers the reported advantages over per-user LoRA.

minor comments (1)

[Notation / Model description] The term 'Engram rows' and 'hash-keyed memory table' are introduced without a formal definition, pseudocode, or equation showing the mapping from fact to edit and the exact lookup mechanism.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report and the clear identification of areas where additional detail is required. We address each major comment below.

Referee: [Abstract / Engram model description] Abstract and Engram model section: the central claims of bit-exact isolation ('leaves every other position unchanged to the last bit'), 33,000x footprint reduction, and 'never makes a single user worse' all rest on the existence of a hash-keyed memory table enabling surgical edits with no side effects on unrelated positions or layers; no construction, hashing scheme, layer placement, or edit operator is provided to demonstrate how arbitrary facts achieve this isolation.

Authors: The referee is correct that the abstract and Engram model section currently provide only a high-level description of the hash-keyed table and do not specify the concrete construction, hashing scheme, layer placement, or edit operator. We will revise the Engram model section to include these details: a deterministic hash function that maps fact keys to disjoint slots, placement within designated feed-forward sublayers, and an additive edit operator applied only to the value vector at the hashed index. This addition will make the bit-exact isolation property explicit and verifiable.

revision: yes

Referee: [Abstract] Abstract: the quantitative performance claims (5.6x indirect-reasoning accuracy, matching direct recall, no degradation) are stated without any reference to datasets, baselines, number of users/facts, evaluation protocol, or error bars, making it impossible to assess whether the layered design actually delivers the reported advantages over per-user LoRA.

Authors: We agree that the abstract presents the numerical claims without the necessary experimental context. We will revise the abstract to reference the evaluation setup, including the datasets and tasks used, the number of users and facts, the per-user LoRA and retrieval baselines, the evaluation protocol, and the fact that error bars appear in the main results section.

revision: yes

Circularity Check

0 steps flagged

No circularity; proposal rests on external model assumption and empirical claims

The manuscript proposes a layered architecture that stores user facts as edits to a hash-keyed Engram memory table while sharing one adapter for reasoning skill. All performance claims (5.6x indirect-reasoning gain, 33,000x footprint reduction, lossless composition, bit-exact isolation) are presented as direct consequences of the stated properties of that table rather than derived from any equation, fitted parameter, or self-citation chain inside the paper. No self-definitional loops, renamed predictions, or load-bearing self-citations appear; the central premise is an engineering assumption about the existence and behavior of the Engram substrate, not a reduction of the target result to itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the existence of an Engram model whose memory table supports hash-keyed surgical edits that are guaranteed to leave all other positions unchanged; this property is asserted but not derived from prior literature in the abstract.

axioms (1)

domain assumption: The brain separates episodic content (hippocampus) from shared reasoning skill (neocortex).
Used to motivate why content and skill should be stored separately in the model.

invented entities (1)

Engram model with hash-keyed memory table (no independent evidence)
purpose: To enable local parametric edits for user facts that are mathematically isolated from other positions.
Introduced as the base architecture that makes the surgical-edit property possible; no independent evidence supplied in the abstract.

pith-pipeline@v0.9.1-grok · 5872 in / 1413 out tokens · 23363 ms · 2026-06-26T20:34:33.813886+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Memory Depth, Not Memory Access: Selective Parametric Consolidation for Long-Running Language Agents
cs.AI 2026-06 unverdicted novelty 6.0

EVAF, a surprise- and valence-gated LoRA mechanism, provides memory depth for goal persistence in language agents via the loop-drift protocol, complementary to retrieval.

Reference graph

Works this paper leans on

96 extracted references · 20 linked inside Pith · cited by 1 Pith paper

[1] B. Li. User as code: Executable memory for personalized agents. arXiv:2606.16707, 2026.

[2] R. Semon. The Mneme. George Allen & Unwin, London, 1921. (English translation of Die Mneme, 1904; origin of the term "engram".)

[3] J. L. McClelland, B. L. McNaughton, and R. C. O'Reilly. Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3):419–457, 1995.

[4] S. Tonegawa, X. Liu, S. Ramirez, and R. Redondo. Memory engram cells have come of age. Neuron, 87(5):918–931, 2015.

[5] D. Kumaran, D. Hassabis, and J. L. McClelland. What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends in Cognitive Sciences, 20(7):512–534, 2016.

[6] X. Cheng, W. Zeng, D. Dai, Q. Chen, B. Wang, Z. Xie, K. Huang, X. Yu, Z. Hao, Y. Li, H. Zhang, H. Zhang, D. Zhao, and W. Liang. Conditional memory via scalable lookup: A new axis of sparsity for large language models. arXiv:2601.07372, January 2026.

[7] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. NeurIPS 2017.

[8] T. Brown et al. Language models are few-shot learners. NeurIPS 2020.

[9] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. Language models are unsupervised multitask learners (GPT-2). OpenAI technical report, 2019.

[10] S. Min, X. Lyu, A. Holtzman, M. Artetxe, M. Lewis, H. Hajishirzi, and L. Zettlemoyer. Rethinking the role of demonstrations: What makes in-context learning work? EMNLP 2022.

[11] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. Yih, T. Rocktäschel, S. Riedel, and D. Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. NeurIPS 2020.

[12] Y. Gao et al. Retrieval-augmented generation for large language models: A survey. arXiv:2312.10997, 2023.

[13] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. ICLR 2017.

[14] D. Dai et al. DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models. arXiv:2401.06066, 2024.

[15] S. Mangrulkar et al. PEFT: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft, 2022.

[16] K. Jordan et al. Muon: An optimiser for hidden layers in neural networks. GitHub / blog post, 2024.

[17]

Huang et al

C. Huang et al. LoRAHub : Efficient cross-task generalization via dynamic LoRA composition. arXiv:2307.13269, 2023

arXiv 2023
[18]

Wu et al

D. Wu et al. LongMemEval : Benchmarking chat assistants on long-term memory. arXiv 2024

2024
[19]

Maharana et al

A. Maharana et al. LOCOMO : Evaluating very long-term conversational memory of LLM agents. ACL 2024

2024
[20]

Tavakoli, A

M. Tavakoli, A. Salemi, C. Ye, M. Abdalla, H. Zamani, and J. R. Mitchell. Beyond a million tokens: Benchmarking and enhancing long-term memory in LLM s ( BEAM ). arXiv:2510.27246, 2025

arXiv 2025
[21] B. Jiang, Y. Yuan, M. Shen, Z. Hao, Z. Xu, Z. Chen, Z. Liu, A. R. Vijjini, J. He, H. Yu, R. Poovendran, G. Wornell, L. Ungar, D. Roth, S. Chen, and C. J. Taylor. PersonaMem-v2: Towards personalized intelligence via learning implicit user personas and agentic memory. arXiv:2512.06688, 2025.

[22] A. Karpathy. nanochat: an experimental training harness for LLMs. https://github.com/karpathy/nanochat, 2026.

[23] E. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. LoRA: Low-rank adaptation of large language models. ICLR 2022.

[24] N. Houlsby et al. Parameter-efficient transfer learning for NLP. ICML 2019.

[25] W. Su et al. Parametric retrieval-augmented generation (PRAG). arXiv:2501.15915, 2025.

[26] Z. Tan et al. DyPRAG: Dynamic parametric RAG. arXiv:2505.19386, 2025.

[27] J. Chen, H. Zhang, L. Pang, Y. Tong, H. Zhou, Y. Zhan, W. Lin, and Z. Zheng. Privacy-preserving reasoning with knowledge-distilled parametric retrieval-augmented generation (DistilledPRAG). arXiv:2509.01088, 2025.

[28] Z. Tan, Q. Liu, and M. Jiang. Democratizing large language models via personalized parameter-efficient fine-tuning (OPPU). EMNLP 2024 (arXiv:2402.04401).

[29] Y. Zhuang et al. HYDRA: Per-user adapters for personalised LLMs. arXiv 2024.

[30] M. Bini, O. Bohdal, U. Michieli, Z. Akata, M. Ozay, and T. Ceritli. MemLoRA: Distilling expert adapters for on-device memory systems. arXiv:2512.04763, 2025.

[31] R. Charakorn, E. Cetin, Y. Tang, and R. T. Lange. Text-to-LoRA: Instant transformer adaption. arXiv:2506.06105, 2025.

[32] Z. Tan et al. PER-PCS: Per-user post-hoc LoRA composition. arXiv 2024.

[33] Y. Sheng, S. Cao, D. Li, et al. S-LoRA: Serving thousands of concurrent LoRA adapters. arXiv:2311.03285, 2024.

[34] L. Chen, Z. Ye, Y. Wu, et al. Punica: Multi-tenant LoRA serving. MLSys 2024.

[35] G. Lample, A. Sablayrolles, M. Ranzato, L. Denoyer, and H. Jégou. Large memory layers with product keys. NeurIPS 2019.

[36] P. He. PEER: Mixture of one million experts. arXiv 2024.

[37] V. Berges, B. Oğuz, D. Haziza, W. Yih, L. Zettlemoyer, and G. Ghosh. Memory layers at scale. ICML 2025.

[38] Z. Huang, Q. Min, H. Huang, D. Zhu, Y. Zeng, R. Guo, and X. Zhou. Ultra-sparse memory network (Ultra-Mem). arXiv:2411.12364, 2024 (ICLR 2025).

[39] J. Huang et al. OverEncoding: hashed N-gram embeddings via averaging. 2025.

[40] L. Yu et al. SCONE: scalable contextual N-gram embeddings. 2025.
[41] A. Pagnoni, R. Pasunuru, P. Rodriguez, et al. BLT: byte latent transformer with hashed N-gram embeddings. arXiv:2412.09871, 2025.

[42] A. Liu et al. SuperBPE: word-level BPE for compositional patterns. 2025.

[43] CivitAI Community. LoRA stacking patterns for Stable Diffusion. https://civitai.com/, 2024.

[44] K. Meng, D. Bau, A. Andonian, and Y. Belinkov. Locating and editing factual associations in GPT (ROME). NeurIPS 2022.

[45] K. Meng et al. MEMIT: Mass-editing memory in a transformer. ICLR 2023.

[46] R. Cohen et al. Evaluating the ripple effects of knowledge editing in language models (MQuAKE). 2023.

[47] R. Cohen et al. RippleEdits: A benchmark for ripple effects of model editing. 2024.

[48] K. Meng et al. CounterFact: a counterfactual editing benchmark. 2022.

[49] D. Wu, J.-C. Gu, K.-W. Chang, and N. Peng. Self-routing RAG: Binding selective retrieval with knowledge verbalization. arXiv:2504.01018, 2025.

[50] Z. Sun et al. Recitation-augmented language models. ICLR 2023.

[51] A. Yang, B. Yang, B. Zhang, et al. Qwen2.5 technical report. arXiv:2412.15115, 2025. (Pith/arXiv)

[52] A. Yang, A. Li, B. Yang, et al. Qwen3 technical report. arXiv:2505.09388, 2025. (Pith/arXiv)

[53] A. Grattafiori, A. Dubey, A. Jauhri, et al. The Llama 3 herd of models. arXiv:2407.21783, 2024. (Pith/arXiv)

[54] A. Q. Jiang, A. Sablayrolles, A. Mensch, et al. Mistral 7B. arXiv:2310.06825, 2023. (Pith/arXiv)

[55] DeepSeek-AI. DeepSeek-V3 technical report. arXiv:2412.19437, 2024. (Pith/arXiv)

[56] N. Reimers and I. Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. EMNLP-IJCNLP 2019.

[57] W. Wang, F. Wei, L. Dong, H. Bao, N. Yang, and M. Zhou. MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. NeurIPS 2020.

[58] L. Zheng, W.-L. Chiang, Y. Sheng, et al. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. NeurIPS 2023 Datasets and Benchmarks. arXiv:2306.05685. (Pith/arXiv)

[59] C. Packer, S. Wooders, K. Lin, et al. MemGPT: Towards LLMs as operating systems. arXiv:2310.08560, 2023. (Pith/arXiv)

[60] P. Chhikara, D. Khant, S. Aryan, T. Singh, and D. Yadav. Mem0: Building production-ready AI agents with scalable long-term memory. arXiv:2504.19413, 2025. (Pith/arXiv)
[61] W. Xu, Z. Liang, K. Mei, H. Gao, J. Tan, and Y. Zhang. A-MEM: Agentic memory for LLM agents. arXiv:2502.12110, 2025. (Pith/arXiv)

[62] P. Rasmussen, P. Paliychuk, T. Beauvais, J. Ryan, and D. Chalef. Zep: A temporal knowledge graph architecture for agent memory. arXiv:2501.13956, 2025. (Pith/arXiv)

[63] Z. Li, S. Song, H. Wang, et al. MemOS: An operating system for memory-augmented generation in large language models. arXiv:2505.22101, 2025.

[64] S. Wang, E. Yu, O. Love, T. Zhang, T. Wong, S. Scargall, and C. Fan. MemMachine: A ground-truth-preserving memory system for personalized AI agents. arXiv:2604.04853, 2026. (Pith/arXiv)

[65] C. Hu, X. Gao, Z. Zhou, et al. EverMemOS: A self-organizing memory operating system for structured long-horizon reasoning. arXiv:2601.02163, 2026.

[66] D. Patel and S. Patel. ENGRAM: Effective, lightweight memory orchestration for conversational agents. arXiv:2511.12960, 2025.

[67] S. Yan, X. Yang, Z. Huang, et al. Memory-R1: Enhancing large language model agents to manage and utilize memories via reinforcement learning. arXiv:2508.19828, 2025. (Pith/arXiv)

[68] Y. Yu, L. Yao, Y. Xie, et al. Agentic memory: Learning unified long-term and short-term memory management for LLM agents. arXiv:2601.01885, 2026. (Pith/arXiv)

[69] Y. Wang, R. Takanobu, Z. Liang, et al. Mem-α: Learning memory construction via reinforcement learning. arXiv:2509.25911, 2025. (Pith/arXiv)

[70] Z. Zhang, X. Bo, C. Ma, et al. A survey on the memory mechanism of large language model based agents. arXiv:2404.13501, 2024. (Pith/arXiv)

[71] Y. Wu, S. Liang, C. Zhang, et al. From human memory to AI memory: A survey on memory mechanisms in the era of LLMs. arXiv:2504.15965, 2025. (Pith/arXiv)

[72] P. Du. Memory for autonomous LLM agents: Mechanisms, evaluation, and emerging frontiers. arXiv:2603.07670, 2026.

[73] N. Pollertlam and W. Kornsuwannawit. Beyond the context window: A cost-performance analysis of fact-based memory vs. long-context LLMs for persistent agents. arXiv:2603.04814, 2026.

[74] A. Salemi, S. Mysore, M. Bendersky, and H. Zamani. LaMP: When large language models meet personalization. ACL 2024. arXiv:2304.11406.

[75] Z. Zhang, R. A. Rossi, B. Kveton, et al. Personalization of large language models: A survey. arXiv:2411.00027, 2024.

[76] J. Liu, Z. Qiu, Z. Li, et al. A survey of personalized large language models: Progress and future directions. arXiv:2502.11528, 2025.

[77] Y. Xu, Q. Chen, Z. Ma, et al. Toward personalized LLM-powered agents: Foundations, evaluation, and future directions. arXiv:2602.22680, 2026.

[78] E. Mitchell, C. Lin, A. Bosselut, C. Finn, and C. D. Manning. Fast model editing at scale. ICLR 2022.

[79] E. Mitchell, C. Lin, A. Bosselut, C. D. Manning, and C. Finn. Memory-based model editing at scale. ICML 2022.

[80] D. Dai, L. Dong, Y. Hao, Z. Sui, B. Chang, and F. Wei. Knowledge neurons in pretrained transformers. ACL 2022.

Showing first 80 references.
