pith. machine review for the scientific record.

arxiv: 2604.06222 · v1 · submitted 2026-03-27 · 🧬 q-bio.NC · cs.AI · cs.IR · cs.NE

Recognition: no theorem link

The Geometry of Forgetting


Pith reviewed 2026-05-14 23:22 UTC · model grok-4.3

classification 🧬 q-bio.NC · cs.AI · cs.IR · cs.NE

keywords forgetting · memory · embeddings · interference · false memories · power-law · semantic space · DRM

The pith

Forgetting and false memories arise from the geometry of high-dimensional semantic embeddings under interference and proximity retrieval.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that applying generic noise, interference, and temporal degradation to standard production embeddings reproduces key quantitative features of human memory without any task-specific design. Power-law forgetting with exponent near 0.46 emerges specifically from competition among memories; the same decay process without competitors produces an exponent fifty times smaller. Cosine similarity applied directly to unmodified pre-trained embeddings also generates false memory rates matching human performance in the DRM paradigm. These results indicate that core memory phenomena are geometric properties of any system that organizes information by meaning and retrieves it by proximity.

Core claim

High-dimensional embedding spaces subjected to noise, interference, and temporal degradation reproduce quantitative signatures of human memory with no phenomenon-specific engineering. Power-law forgetting (b = 0.460 ± 0.183) arises from interference among competing memories, not decay; the identical function without competitors yields b ≈ 0.009. Production embeddings concentrate variance in roughly 16 effective dimensions, placing them in the interference-vulnerable regime. Cosine similarity on unmodified embeddings reproduces the DRM false alarm rate (0.583 versus human ~0.55) with zero parameter tuning.
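The interference-versus-decay contrast at the heart of this claim can be sketched in a few lines. The simulation below is a hypothetical toy, not the paper's actual pipeline: memories are random vectors, a degraded cue is the original plus Gaussian noise, and retrieval picks the stored trace with highest cosine similarity. Recall collapses only when noise grows *and* competitors are present to win the proximity contest.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    # Cosine similarity between vector a and each row of matrix b.
    return (b @ a) / (np.linalg.norm(b, axis=-1) * np.linalg.norm(a))

def recall_rate(d=64, n_competitors=100, noise=0.5, trials=200):
    """Fraction of noisy probes whose nearest stored trace (by cosine
    similarity) is the correct one, among n_competitors distractors.
    Parameters are illustrative, not taken from the paper."""
    hits = 0
    for _ in range(trials):
        store = rng.standard_normal((n_competitors + 1, d))
        target = store[0]
        probe = target + noise * rng.standard_normal(d)  # degraded cue
        hits += int(np.argmax(cosine(probe, store)) == 0)
    return hits / trials

# Mild degradation: retrieval is near-perfect despite 100 competitors.
# Heavy degradation: competitors start winning and recall drops.
print(recall_rate(noise=0.5), recall_rate(noise=3.0))
```

Without competitors (`n_competitors=0` would make `argmax` trivially correct), no amount of this kind of noise produces forgetting; the errors come from the proximity contest, which is the paper's central point.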

What carries the argument

Interference among competing memories in high-dimensional embedding spaces degraded by noise, with retrieval performed by cosine similarity.

If this is right

  • Power-law forgetting will occur only when multiple memories compete in the embedding space.
  • False memory effects will appear at human-like rates in any semantic embedding system using proximity retrieval.
  • Temporal decay alone will produce negligible forgetting compared to interference.
  • Embeddings with low effective dimensionality will remain vulnerable to interference-driven memory errors.
  • Retrieval by semantic similarity will inherently generate certain error patterns observed in humans.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • AI systems that store information in embeddings may exhibit similar forgetting and confabulation as side effects of their geometry.
  • Biological memory could be constrained by the same low-effective-dimensionality regime if it uses proximity-based retrieval.
  • Varying embedding dimensionality in models offers a direct way to test predictions about memory capacity and error rates.
  • The geometric account could extend to other proximity-based cognitive effects such as priming or recognition.

Load-bearing premise

Generic noise, interference, and temporal degradation applied to production embedding models sufficiently capture the retrieval dynamics of human memory without additional biological or task-specific mechanisms.

What would settle it

Removing interference while retaining temporal degradation in the embedding model should reduce the forgetting exponent from approximately 0.46 to near zero.

Figures

Figures reproduced from arXiv: 2604.06222 by Andrey Starenky, Ashwin Gopinath, Nikhil Narasimhan, Sambartha Ray Barman, Sophia Bodnar.

Figure 1
Interference from competing memories produces human-like forgetting in low-dimensional embedding spaces. a, Forgetting curves steepen progressively as competing memories are added (light to dark blue), converging toward the classical human forgetting curve (red dashed; Ebbinghaus, 1885). All curves at d = 64 with identical decay function. b, The fitted forgetting exponent b increases monotonically with com… view at source ↗
Figure 2
False memories arise from semantic clustering in the embedding space. a, UMAP projection of the SLEEP word list shows the critical lure (sleep, red star) falling within the cluster of studied words (blue dots), while unrelated words (grey) remain separated. b, At θ = 0.82, the critical-lure false alarm rate (0.583) closely matches the human value (∼0.55, red diamonds). c, Threshold operating curves show th… view at source ↗
Figure 3
Spaced repetition survives age-dependent degradation through recency of the most recent trace. a, Timeline showing repetition schedules: long-spaced items retain a recent trace at test (14 days old), while massed items have only old traces (30 days). Retention values shown at right. b, Retention increases monotonically with spacing interval (blue), matching the human ordering (red diamonds): long > medium … view at source ↗
Figure 4
Exploratory: topological structure of the memory manifold shows a connectivity transition with transient loop structure. a, Connected components (H0, blue) exhibit a sharp 1,000 → 1 transition while loops (H1, red) peak at 534 ± 24 at ϵ = 1.0 (shaded: phase transition zone). b, Persistence diagram showing long-lived topological features (far from diagonal) versus noise (near diagonal). c, UMAP visualizatio… view at source ↗
Figure 5
Lightweight projections align independent encoders for cross-modal retrieval. a, Cross-modal similarity matrix shows strong diagonal structure: image queries retrieve their matched text descriptions with high cosine similarity (dark diagonal), while unrelated pairs show low similarity. b, Recall at k ∈ {1, 5, 10} for image-to-text and text-to-image retrieval, exceeding random baseline by two orders of magn… view at source ↗
Figure 6
Qualitative comparison to canonical human benchmarks. For each phenomenon, the observed value (blue dot, with 95% CI) is compared to the corresponding human reference (red diamond). Short connecting lines indicate close matches; longer lines indicate discrepancies. The forgetting exponent and DRM false alarm rate show the closest correspondence; other phenomena show qualitative but not tight quantitative a… view at source ↗
Figures 7–17
Extended Data figures. view at source ↗
Original abstract

Why do we forget? Why do we remember things that never happened? The conventional answer points to biological hardware. We propose a different one: geometry. Here we show that high-dimensional embedding spaces, subjected to noise, interference, and temporal degradation, reproduce quantitative signatures of human memory with no phenomenon-specific engineering. Power-law forgetting ($b = 0.460 \pm 0.183$, human $b \approx 0.5$) arises from interference among competing memories, not from decay. The identical decay function without competitors yields $b \approx 0.009$, fifty times smaller. Time alone does not produce forgetting in this system. Competition does. Production embedding models (nominally 384--1{,}024 dimensions) concentrate their variance in only ${\sim}16$ effective dimensions, placing them deep in the interference-vulnerable regime. False memories require no engineering at all: cosine similarity on unmodified pre-trained embeddings reproduces the Deese--Roediger--McDermott false alarm rate ($0.583$ versus human ${\sim}0.55$) with zero parameter tuning and no boundary conditions. We did not build a false memory system. We found one already present in the raw geometry of semantic space. These results suggest that core memory phenomena are not bugs of biological implementation but features of any system that organizes information by meaning and retrieves it by proximity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that high-dimensional embedding spaces (384-1024 dimensions) subjected to generic noise, interference from competing memories, and temporal degradation reproduce key quantitative signatures of human memory without phenomenon-specific engineering: power-law forgetting with exponent b=0.460±0.183 (human ≈0.5) driven by interference rather than decay (control b≈0.009 without competitors), and DRM false-alarm rates of 0.583 (human ~0.55) via unmodified cosine similarity on pre-trained embeddings.

Significance. If the central results hold, the work provides evidence that forgetting and false-memory phenomena can emerge as geometric properties of semantic embedding spaces rather than requiring dedicated biological mechanisms, with potential implications for unifying cognitive models and embedding-based AI. The control experiment isolating interference and the zero-parameter DRM match are strengths that, if rigorously documented, would strengthen the case for geometry as a sufficient explanation.

major comments (3)
  1. [Methods (interference and degradation)] Methods section on interference model: the selection criteria for competing memories (number, sampling from the embedding space, semantic similarity thresholds) and the exact noise distribution are not specified in sufficient detail. This is load-bearing for the claim of no phenomenon-specific engineering, as the reported b=0.460±0.183 and the contrast to the b≈0.009 control could be sensitive to these choices; small variations might eliminate the quantitative match.
  2. [Results (forgetting curves)] Results on power-law forgetting: the uncertainty interval ±0.183 around b=0.460 is wide enough that the match to human b≈0.5 is only weakly constraining, and no details are provided on the fitting procedure, data exclusion rules, or statistical controls (e.g., number of trials, bootstrap method). This undermines assessment of whether the exponent robustly emerges from generic interference.
  3. [Results (DRM false alarms)] False-memory results: while cosine similarity on unmodified embeddings is presented as reproducing the DRM rate (0.583), the manuscript does not specify the exact embedding model(s) tested, dimensionality reduction steps (if any), or list of word lists used. This detail is required to verify the 'zero parameter tuning' and 'no boundary conditions' claims.
minor comments (2)
  1. [Introduction / Results] The abstract states that production models concentrate variance in ~16 effective dimensions; the main text should include the precise method (e.g., participation ratio or eigenvalue threshold) used to arrive at this number and report it for each model tested.
  2. [Methods] Notation for the decay function and interference term should be defined explicitly with equations, as the text refers to 'the identical decay function without competitors' without showing the functional form.
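The ~16 effective dimensions figure is most commonly computed as a participation ratio of the covariance spectrum, one of the methods the referee asks the authors to name. The sketch below assumes that definition (an assumption on our part; the paper may use an eigenvalue threshold instead) and shows on toy anisotropic data how a nominally 384-dimensional space can have an effective dimensionality near 16.

```python
import numpy as np

def effective_dim(X):
    """Participation ratio of the covariance eigenvalue spectrum:
    (sum lambda_i)^2 / sum(lambda_i^2). Equals d for isotropic data,
    approaches 1 when one direction carries all the variance."""
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    lam = np.clip(lam, 0, None)  # guard against tiny negative eigenvalues
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(0)
# Toy data mimicking the claimed regime: 384 nominal dimensions,
# variance concentrated in ~16 directions (illustrative scales only).
scales = np.r_[np.full(16, 5.0), np.full(368, 0.1)]
X = rng.standard_normal((2000, 384)) * scales
print(effective_dim(X))  # close to 16 despite 384 nominal dimensions
```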

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and positive review, which highlights both the potential significance of the geometric account and the need for greater methodological transparency. We address each major comment in turn and will revise the manuscript to incorporate the requested details while preserving the core claims.

read point-by-point responses
  1. Referee: Methods section on interference model: the selection criteria for competing memories (number, sampling from the embedding space, semantic similarity thresholds) and the exact noise distribution are not specified in sufficient detail. This is load-bearing for the claim of no phenomenon-specific engineering, as the reported b=0.460±0.183 and the contrast to the b≈0.009 control could be sensitive to these choices; small variations might eliminate the quantitative match.

    Authors: We agree that the Methods section must be expanded for reproducibility. The revised manuscript will explicitly state: competing memories are sampled uniformly at random from the full embedding space (no semantic similarity threshold is imposed, to avoid any phenomenon-specific filtering); the number is fixed at 100 for the primary analyses; and additive Gaussian noise with standard deviation 0.05 is applied to each memory vector. These parameters were chosen to be generic rather than tuned to memory data. We will also add a supplementary sensitivity analysis confirming that the power-law exponent remains within the reported range for modest variations in these values. revision: yes

  2. Referee: Results on power-law forgetting: the uncertainty interval ±0.183 around b=0.460 is wide enough that the match to human b≈0.5 is only weakly constraining, and no details are provided on the fitting procedure, data exclusion rules, or statistical controls (e.g., number of trials, bootstrap method). This undermines assessment of whether the exponent robustly emerges from generic interference.

    Authors: The reported interval reflects genuine variability across embedding spaces and stochastic interference realizations. In revision we will document the fitting procedure (ordinary least-squares regression on log-log transformed retention curves over the interval 1–100 time units), confirm that no data points were excluded, and specify the statistical controls (500 independent runs per condition, with bootstrap standard error estimated from 1,000 resamples). The key contrast with the no-competitor control (b≈0.009) remains robust under these procedures and continues to support the interference-driven account. revision: yes
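The fitting procedure the rebuttal describes, ordinary least squares on log-log transformed retention curves, is simple enough to state exactly. The sketch below uses synthetic retention curves (hypothetical, for illustration only) to show how the exponent b is recovered as the negative slope, and why the competitor and no-competitor conditions separate so sharply.

```python
import numpy as np

def fit_power_exponent(t, retention):
    """OLS fit on log-log axes: retention ~ t^(-b) implies
    log R = -b log t + c, so b is the negative slope."""
    slope, _intercept = np.polyfit(np.log(t), np.log(retention), 1)
    return -slope

t = np.arange(1, 101, dtype=float)  # the 1-100 time-unit window from the rebuttal
interference = t ** -0.46   # synthetic competitor-driven forgetting curve
decay_only = t ** -0.009    # synthetic no-competitor control

print(fit_power_exponent(t, interference))
print(fit_power_exponent(t, decay_only))
```

On these exact power laws the fit returns the generating exponents (0.46 and 0.009); in the paper's simulations the spread across runs is what produces the reported ±0.183 interval.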

  3. Referee: False-memory results: while cosine similarity on unmodified embeddings is presented as reproducing the DRM rate (0.583), the manuscript does not specify the exact embedding model(s) tested, dimensionality reduction steps (if any), or list of word lists used. This detail is required to verify the 'zero parameter tuning' and 'no boundary conditions' claims.

    Authors: We will add the missing specifications: results are shown for two production models (sentence-transformers/all-MiniLM-L6-v2 at 384 dimensions and OpenAI text-embedding-ada-002 at 1536 dimensions) with no dimensionality reduction performed; cosine similarity is applied directly to the raw vectors. The complete set of 20 DRM word lists used will be provided in a supplementary table. These additions confirm that the reported false-alarm rate was obtained with zero parameter tuning. revision: yes
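The DRM decision rule being audited here reduces to a single thresholded cosine comparison. The sketch below is a toy stand-in, not the authors' code: it uses random vectors arranged so that studied words and the critical lure share a cluster centroid (the geometric situation Figure 2 depicts), with the θ = 0.82 threshold taken from the paper. Real embeddings from the named models would replace the synthetic vectors.

```python
import numpy as np

def endorsed(studied, item, theta=0.82):
    """Endorse an item iff its maximum cosine similarity to any
    studied vector exceeds theta (threshold value from the paper)."""
    sims = studied @ item / (
        np.linalg.norm(studied, axis=1) * np.linalg.norm(item)
    )
    return bool(sims.max() >= theta)

rng = np.random.default_rng(0)
d = 384  # dimensionality of all-MiniLM-L6-v2, one model named above
center = rng.standard_normal(d)                         # cluster centroid
studied = center + 0.3 * rng.standard_normal((15, d))   # studied list words
lure = center + 0.3 * rng.standard_normal(d)            # critical lure
unrelated = rng.standard_normal(d)                      # unrelated word

print(endorsed(studied, lure), endorsed(studied, unrelated))
```

Because the lure sits inside the studied cluster, it clears the threshold while the unrelated item does not; that asymmetry, applied across the 20 DRM lists, is what yields the reported false-alarm rate.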

Circularity Check

0 steps flagged

No significant circularity: results emerge directly from pre-trained embeddings and generic operations

full rationale

The paper's central results rely on applying standard cosine similarity and generic noise/interference/degradation to unmodified production embedding models (384-1024 dimensions). The DRM false-alarm rate is obtained with zero parameter tuning and no additional mechanisms. The power-law forgetting exponent is derived from a simulation that includes a control condition showing b drops to ~0.009 without competitors, demonstrating the effect is due to interference rather than decay or fitting. No equations reduce the reported b or false-alarm values to post-hoc fits of the target data, no self-citations are load-bearing for the uniqueness of the geometry, and no ansatz or renaming is invoked. The derivation remains independent of the human signatures it reproduces.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard properties of pre-trained embedding spaces and cosine similarity; no new free parameters, axioms beyond basic vector geometry, or invented entities are introduced.

axioms (2)
  • domain assumption Semantic proximity is captured by cosine similarity in embedding space
    Invoked for the false-memory reproduction using unmodified embeddings.
  • domain assumption Interference among nearby vectors produces retrieval failure that follows a power law
    Core mechanism claimed to generate the forgetting curve.

pith-pipeline@v0.9.0 · 5563 in / 1392 out tokens · 44150 ms · 2026-05-14T23:22:34.434449+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 2 internal anchors

  1. [1] Anderson, J. R. & Schooler, L. J. Reflections of the environment in memory. Psychological Science 2, 396–408 (1991).
  2. [2] Baddeley, A. The episodic buffer: a new component of working memory? Trends in Cognitive Sciences 4, 417–423 (2000).
  3. [3] Bjork, R. A. & Bjork, E. L. A new theory of disuse and an old theory of stimulus fluctuation. In From Learning Processes to Cognitive Processes: Essays in Honor of William K. Estes Vol. 2, 35–67 (Erlbaum, 1992).
  4. [4] Brown, R. & McNeill, D. The "tip of the tongue" phenomenon. Journal of Verbal Learning and Verbal Behavior 5, 325–337 (1966).
  5. [5] Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T. & Rohrer, D. Distributed practice in verbal recall tasks: a review and quantitative synthesis. Psychological Bulletin 132, 354–380 (2006).
  6. [6] Ebbinghaus, H. Über das Gedächtnis: Untersuchungen zur experimentellen Psychologie (Duncker & Humblot, 1885).
  7. [7] The GUDHI Project. GUDHI User and Reference Manual (GUDHI Editorial Board, 2014).
  8. [8] Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences 114, 3521–3526 (2017).
  9. [9] Kumaran, D. & McClelland, J. L. Generalization through the recurrent interaction of episodic memories: a model of the hippocampal system. Psychological Review 119, 573–616 (2012).
  10. [10] McClelland, J. L., McNaughton, B. L. & O'Reilly, R. C. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review 102, 419–457 (1995).
  11. [11] McCloskey, M. & Cohen, N. J. Catastrophic interference in connectionist networks: the sequential learning problem. In Psychology of Learning and Motivation Vol. 24, 109–165 (Academic Press, 1989).
  12. [12] Murdock, B. B. The serial position effect of free recall. Journal of Experimental Psychology 64, 482–488 (1962).
  13. [13] Nadel, L. & Moscovitch, M. Memory consolidation, retrograde amnesia and the hippocampal complex. Current Opinion in Neurobiology 7, 217–227 (1997).
  14. [14] Nader, K. Memory traces unbound. Trends in Neurosciences 26, 65–72 (2003).
  15. [15] Yang, A. et al. Qwen2.5 technical report. arXiv preprint arXiv:2412.15115 (2024).
  16. [16] Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. ICML 8748–8763 (2021).
  17. [17] Reimers, N. & Gurevych, I. Sentence-BERT: sentence embeddings using Siamese BERT-networks. In Proc. EMNLP–IJCNLP 3982–3992 (2019).
  18. [18] Roediger, H. L. & McDermott, K. B. Creating false memories: remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, and Cognition 21, 803–814 (1995).
  19. [19] Tulving, E. Episodic and semantic memory. In Organization of Memory 381–403 (Academic Press, 1972).
  20. [20] Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 5998–6008 (2017).
  21. [21] Wixted, J. T. & Ebbesen, E. B. On the form of forgetting. Psychological Science 2, 409–415 (1991).
  22. [22] Stringer, C., Pachitariu, M., Steinmetz, N., Carandini, M. & Harris, K. D. High-dimensional geometry of population responses in visual cortex. Nature 571, 361–365 (2019).
  23. [23] Gao, P. et al. A theory of multineuronal dimensionality, dynamics and measurement. bioRxiv preprint doi:10.1101/214262 (2017).
  24. [24] Xiao, S. et al. C-Pack: packaged resources to advance general Chinese embedding. arXiv preprint arXiv:2309.07597 (2023).