
arxiv: 2605.03804 · v2 · pith:LVEOBLVNnew · submitted 2026-05-05 · 💻 cs.AI

ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting

Pith reviewed 2026-05-07 16:18 UTC · model grok-4.3

classification 💻 cs.AI
keywords: on-device memory, LLM agents, memory compression, optical forgetting, episodic memory graph, multimodal memory, personalized agents, edge AI

The pith

ScrapMem lets LLM agents keep long-term multimodal memories on edge devices by progressively lowering the resolution of old entries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to solve the storage and complexity problems that prevent LLM agents from maintaining useful personalized memory over long periods when running on phones or other limited hardware. It does this by turning incoming multimodal data into scrapbook-style pages, then applying optical forgetting to shrink the detail level of older pages while keeping the most recent ones intact. An Episodic Memory Graph links the remaining entries into a causal timeline so the agent can still retrieve relevant past events efficiently. Experiments on the ATM-Bench dataset show the method reaches a new best Joint@10 score of 51.0 percent, cuts memory use by as much as 93 percent, and lifts Recall@10 to 70.3 percent. If the approach works as described, on-device agents could retain weeks or months of personal context without needing constant cloud uploads or oversized local storage.

Core claim

ScrapMem integrates multimodal inputs into Scrapbook Pages, applies optical forgetting that progressively reduces resolution of older memories to cut storage cost while suppressing low-value details, and builds an Episodic Memory Graph to preserve causal-temporal relationships among key events; on the multimodal ATM-Bench this yields 51.0 percent Joint@10, up to 93 percent lower memory usage, and 70.3 percent Recall@10.

What carries the argument

Optical Forgetting, a progressive resolution-reduction step applied to older memories, supported by an Episodic Memory Graph that links events in causal-temporal order to keep retrieval accurate after compression.
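
The progressive resolution reduction can be pictured as a tiered schedule. The sketch below is illustrative only: the Figure 3 caption mentions quality (Q), a resolution scaling factor (S), and temporal stage boundaries (T) for Recent, Mid-term, and Old memories, but this page does not give their values, so every number here, along with the names `Tier`, `SCHEDULE`, `tier_for_age`, and `target_size`, is a hypothetical placeholder rather than the authors' configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_age_days: float  # upper stage boundary (T); hypothetical value
    scale: float         # resolution scaling factor (S); hypothetical value
    quality: int         # encoder quality setting (Q); hypothetical value


# Assumed schedule, ordered from newest to oldest stage.
SCHEDULE = (
    Tier("recent", 7.0, 1.0, 90),
    Tier("mid-term", 30.0, 0.5, 60),
    Tier("old", float("inf"), 0.25, 30),
)


def tier_for_age(age_days: float) -> Tier:
    """Return the first tier whose boundary covers the given age."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    for tier in SCHEDULE:
        if age_days <= tier.max_age_days:
            return tier
    return SCHEDULE[-1]


def target_size(width: int, height: int, age_days: float) -> tuple[int, int]:
    """Pixel dimensions a stored page would be downscaled to at this age."""
    tier = tier_for_age(age_days)
    return max(1, round(width * tier.scale)), max(1, round(height * tier.scale))
```

Under these assumed values, a 1920x1080 page stays at full size for its first week, drops to 960x540 until day 30, and to 480x270 after that. An actual re-encode step would then resize and recompress the image with the tier's quality setting.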

If this is right

  • Agents running locally can sustain much longer interaction histories without exhausting device storage.
  • Structured graph aggregation raises the chance that relevant past episodes are retrieved even after compression.
  • Multimodal on-device agents become practical for personalized tasks without constant data transfer.
  • Memory management can shift from keeping everything to selectively discarding detail in a controlled way.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same forgetting pattern could be tested on non-LLM memory systems such as robotic state trackers to see whether resolution reduction still preserves task-critical information.
  • Real-device measurements of power draw and latency after applying optical forgetting would show whether the storage savings translate into usable runtime gains.
  • Extending the Episodic Memory Graph with explicit decay rates might allow further tuning of how quickly older events lose detail.

Load-bearing premise

Lowering the resolution of older memories keeps their semantic content usable and does not erase or distort important multimodal details that the agent will later need.

What would settle it

A controlled test in which memories compressed by optical forgetting cause the agent to give incorrect answers on questions about past events that were still present before compression, dropping performance below the reported baseline.
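
Such a test could be scored with a per-question regression rate. The sketch below assumes each question has been graded twice, once against the uncompressed store and once after optical forgetting. The function name `regression_rate` and the boolean grading format are hypothetical and do not come from the paper.

```python
def regression_rate(correct_before, correct_after):
    """Share of questions answered correctly before compression but not after.

    Both arguments are equal-length sequences of booleans, one entry per question.
    Questions that were already wrong before compression are ignored.
    """
    if len(correct_before) != len(correct_after):
        raise ValueError("both runs must cover the same questions")
    after_for_answerable = [a for b, a in zip(correct_before, correct_after) if b]
    if not after_for_answerable:
        raise ValueError("no question was answered correctly before compression")
    lost = sum(1 for a in after_for_answerable if not a)
    return lost / len(after_for_answerable)
```

A rate near zero would support the load-bearing premise, while a high rate concentrated on older memories would point to detail lost through compression.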

Figures

Figures reproduced from arXiv: 2605.03804 by Jiale Chang, Yuxiang Ren.

Figure 1
Figure 1: Comparison between human memory (CLS theory) and Scrapbook Memory. Top: The hippocampus rapidly encodes multimodal episodic experiences, while the neocortex gradually consolidates them into stable long-term knowledge. Bottom: ScrapMem similarly binds heterogeneous user data into scrapbook pages and progressively compresses old memories via optical forgetting, preserving core semantics for efficient retrie…
Figure 2
Figure 2: Overview of the ScrapMem. (1) Consolidation and Perception: Unifies heterogeneous records (images, videos, text) into hybrid representations via OCR and vision-to-text extraction. (2) EM-Graph Construction: Organizes nodes into an Episodic Memory Graph with event-centric paths (EM-Paths) for structured retrieval and multi-hop reasoning. (3) Optical Forgetting: Compresses outdated memories through temporal …
Figure 3
Figure 3: Retrieval performance (Recall@K) under varying optical forgetting intensities. The clustering of different forgetting curves demonstrates that ScrapMem is highly robust to specific hyperparameter configurations.
… quality (Q), resolution scaling factor (S), and temporal stage boundaries (T) for Recent, Mid-term, and Old memories, respectively
Figure 4
Figure 4: Storage–performance trade-off on ATM-Bench (Joint@10). The x-axis uses a logarithmic scale. ScrapMem (Timed-Gentle, orange star) reduces storage by 93.0% relative to the raw-data baseline while retaining over 90% of SOTA performance (46.3% vs. 51.0%). The Pareto frontier indicates strong efficiency and graceful performance degradation, supporting on-device deployment.
… strengthen long-range reasoning. Exte…
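
The figures quoted in the Figure 4 caption can be checked with a few lines of arithmetic. This sketch assumes that "reduces storage by 93.0%" means the compressed store occupies 7% of the raw-data baseline. The variable names are illustrative.

```python
# Values taken from the Figure 4 caption.
full_joint = 51.0         # Joint@10 (%) of the best configuration
compressed_joint = 46.3   # Joint@10 (%) of the Timed-Gentle configuration
storage_reduction = 0.93  # fraction of raw storage saved

retained = compressed_joint / full_joint       # share of the top score kept
remaining_storage = 1.0 - storage_reduction    # share of raw storage still used
shrink_factor = 1.0 / remaining_storage        # how many times smaller the store is

print(f"retained {retained:.1%} of the top score")
print(f"store is about {shrink_factor:.1f}x smaller")
```

The ratio 46.3 / 51.0 comes to roughly 90.8%, consistent with the caption's "over 90%", and a 93% reduction corresponds to a store about 14 times smaller than the raw data.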
Abstract

Long-term personalized memory for LLM agents is challenging on resource-limited edge devices due to high storage costs and multimodal complexity. To address this, we propose ScrapMem, a framework that integrates multimodal data into "Scrapbook Page." ScrapMem introduces Optical Forgetting, an optical compression mechanism that progressively reduces the resolution of older memories, lowering storage cost while suppressing low-value details. To maintain semantic consistency, we construct an Episodic Memory Graph (EM-Graph) that organizes key events into a causal-temporal structure. Extensive experiments on the multimodal ATM-Bench showcase that ScrapMem provides three main benefits: (1) strong performance, achieving a new state-of-the-art with a 51.0% Joint@10 score; (2) high storage efficiency, reducing memory usage by up to 93% via optical forgetting; and (3) improved recall, increasing Recall@10 to 70.3% through structured aggregation. ScrapMem offers an effective and storage-efficient solution for on-device long-term memory in multimodal LLM agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript proposes ScrapMem, a bio-inspired framework for on-device long-term personalized memory in multimodal LLM agents. It integrates multimodal data into 'Scrapbook Page' structures, introduces Optical Forgetting as a progressive resolution-reduction mechanism for older memories to cut storage costs, and builds an Episodic Memory Graph (EM-Graph) to enforce causal-temporal organization of key events. Experiments on the multimodal ATM-Bench are reported to deliver a new SOTA of 51.0% Joint@10, up to 93% memory reduction, and 70.3% Recall@10 via structured aggregation.

Significance. If the empirical results hold after proper validation, ScrapMem would represent a meaningful advance for resource-constrained edge agents by addressing the tension between long-term multimodal memory and storage limits. The combination of bio-inspired compression with graph-structured retention is conceptually appealing and could influence subsequent work on efficient agent memory. No machine-checked proofs, reproducible code artifacts, or parameter-free derivations are present to credit.

major comments (3)
  1. Abstract: The central performance claims (51.0% Joint@10 SOTA, 93% storage reduction, 70.3% Recall@10) are asserted without any description of baselines, experimental setup, error bars, statistical significance, or implementation details of Optical Forgetting, making it impossible to verify support for the claims from the available text.
  2. Method section on Optical Forgetting: The mechanism that progressively lowers resolution of older memories is described only at a high level; no concrete algorithm, information-loss metrics, or ablations isolating its effect on semantic consistency and multimodal fidelity are supplied, which is load-bearing for both the efficiency and recall claims.
  3. Experiments / ATM-Bench results: No quantitative evidence (e.g., retention metrics, consistency scores, or ablation tables) is given to substantiate that the Episodic Memory Graph preserves causal-temporal structure and critical multimodal details under Optical Forgetting; without these, the 93% reduction could mask unmeasured recall degradation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important areas for improving clarity and substantiation of our claims. We address each major comment point by point below and commit to revisions that will strengthen the manuscript without altering its core contributions.

Point-by-point responses
  1. Referee: Abstract: The central performance claims (51.0% Joint@10 SOTA, 93% storage reduction, 70.3% Recall@10) are asserted without any description of baselines, experimental setup, error bars, statistical significance, or implementation details of Optical Forgetting, making it impossible to verify support for the claims from the available text.

    Authors: We agree that the abstract, as a concise summary, omits these supporting details. The full manuscript provides baselines and setup in Section 4.1, error bars and significance testing in the results tables of Section 4, and Optical Forgetting implementation in Section 3.2. To address the concern directly, we will revise the abstract to include a brief reference to the primary baselines (e.g., standard retrieval and memory-augmented agents), the ATM-Bench evaluation protocol, and a note that detailed metrics and ablations appear in the experiments section. This change will make the performance claims more self-contained while preserving the abstract's brevity.
    revision: yes

  2. Referee: Method section on Optical Forgetting: The mechanism that progressively lowers resolution of older memories is described only at a high level; no concrete algorithm, information-loss metrics, or ablations isolating its effect on semantic consistency and multimodal fidelity are supplied, which is load-bearing for both the efficiency and recall claims.

    Authors: The current description emphasizes the bio-inspired motivation and high-level progressive reduction process. We acknowledge that a more concrete specification is needed to support the efficiency and fidelity claims. In the revised manuscript, we will expand Section 3.2 to include the explicit algorithm (step-wise resolution scaling with modality-specific parameters), quantitative information-loss metrics (e.g., embedding similarity and perceptual quality scores), and dedicated ablation tables isolating Optical Forgetting's contribution to storage reduction versus semantic consistency. These additions will directly substantiate the 93% reduction claim.
    revision: yes

  3. Referee: Experiments / ATM-Bench results: No quantitative evidence (e.g., retention metrics, consistency scores, or ablation tables) is given to substantiate that the Episodic Memory Graph preserves causal-temporal structure and critical multimodal details under Optical Forgetting; without these, the 93% reduction could mask unmeasured recall degradation.

    Authors: The reported results focus on end-to-end Joint@10 and Recall@10 metrics on ATM-Bench. We recognize that explicit evidence linking the EM-Graph to structure preservation under forgetting is required to rule out hidden degradation. We will add, in the revised experiments section, quantitative retention metrics (causal edge preservation rates and multimodal detail fidelity scores), consistency scores across forgetting levels, and ablation tables comparing performance with and without the EM-Graph. These will demonstrate that the observed recall improvements and storage savings are not achieved at the expense of unmeasured structural loss.
    revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical results independent of inputs

The paper proposes ScrapMem with Optical Forgetting for progressive resolution reduction and an Episodic Memory Graph for causal-temporal organization, then reports experimental outcomes on ATM-Bench including 51.0% Joint@10, 70.3% Recall@10, and up to 93% storage reduction. No equations, parameter fits, or derivations are present that reduce any claimed prediction or result to the inputs by construction. Claims rest on external benchmark evaluation rather than self-definitional loops, fitted-input renamings, or load-bearing self-citations, rendering the chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The framework introduces new mechanisms without citing prior independent evidence for their effectiveness, and it relies on the assumption that multimodal data can be progressively compressed while retaining utility.

axioms (1)
  • domain assumption: Multimodal memories can be progressively reduced in resolution without losing semantic value for agent tasks
    Invoked to justify optical forgetting as a viable compression strategy.
invented entities (2)
  • Optical Forgetting (no independent evidence)
    purpose: Progressively reduce resolution of older memories to lower storage cost
    New compression mechanism central to the efficiency claim
  • Episodic Memory Graph (EM-Graph) (no independent evidence)
    purpose: Organize key events into causal-temporal structure for consistency
    New structure to maintain semantic consistency during compression

pith-pipeline@v0.9.0 · 5477 in / 1394 out tokens · 58629 ms · 2026-05-07T16:18:31.815932+00:00 · methodology
