arxiv: 2606.31495 · v1 · pith:KKKSEY7Pnew · submitted 2026-06-30 · 💻 cs.AI · cs.LG

Surprise as a Signal for Plasticity and Metacognition

Pith reviewed 2026-07-01 05:39 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords continual learning · metacognition · prediction error · episodic memory · vision-language models · surprise signal · plasticity · few-shot learning

The pith

A prediction-error signal from a small predictor on frozen latents can gate when to store new concepts and let models assess their own knowledge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests the idea that a simple surprise measure, computed as prediction error by a small auxiliary model on the fixed latent outputs of a frozen encoder, can control both plasticity and self-monitoring. In the first demonstration an episodic memory adds new visual concepts only when surprise is high and later consolidates them through replay, yielding large retention gains on the oldest classes in a 1000-class ImageNet stream. In the second demonstration the identical signal makes a vision-language model answer confidently on known items, hedge on partial familiarity, and request an explanation plus learn the concept when the item is novel. If the signal works as described, the same low-cost mechanism could support both long-term adaptation without encoder retraining and more reliable self-assessment than the model's own verbal confidence.

Core claim

The paper claims that a prediction-error signal computed by a small predictor over the latent space of a frozen encoder can serve both as a gate on plasticity and as a substrate for metacognition. It supports the claim with two systems: an episodic-memory continual-learning system that selectively writes and replays traces, and a vision-language system whose responses are modulated by the same signal to produce assertive, hedging, or clarification-seeking behavior.

What carries the argument

The surprise signal, defined as the prediction error produced by a small predictor trained on the latent space of a frozen encoder.
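A minimal sketch of how such a signal might be computed, assuming a linear predictor fit by ridge-regularised least squares that maps one frozen latent to a paired target latent. The pairing, the linear form, the synthetic latents, and every name below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear_predictor(context, target, ridge=1e-3):
    """Fit W so that context @ W approximates target (ridge least squares)."""
    d = context.shape[1]
    gram = context.T @ context + ridge * np.eye(d)
    return np.linalg.solve(gram, context.T @ target)

def surprise(W, context, target):
    """Per-sample prediction error: squared L2 distance in latent space."""
    residual = context @ W - target
    return np.sum(residual ** 2, axis=1)

# Synthetic stand-in for frozen-encoder latents (assumption):
# familiar pairs follow a fixed linear relation plus small noise.
d = 16
true_map = rng.normal(size=(d, d)) / np.sqrt(d)
familiar_ctx = rng.normal(size=(500, d))
familiar_tgt = familiar_ctx @ true_map + 0.05 * rng.normal(size=(500, d))

# The encoder stays frozen; only the small predictor W is trained.
W = fit_linear_predictor(familiar_ctx, familiar_tgt)

# Held-out familiar pairs versus novel pairs with unrelated targets.
test_ctx = rng.normal(size=(100, d))
test_tgt = test_ctx @ true_map + 0.05 * rng.normal(size=(100, d))
novel_ctx = rng.normal(size=(100, d))
novel_tgt = rng.normal(size=(100, d))

familiar_surprise = surprise(W, test_ctx, test_tgt)
novel_surprise = surprise(W, novel_ctx, novel_tgt)
print(familiar_surprise.mean(), novel_surprise.mean())
```

In this toy setting the predictor's error stays low on pairs that follow the relation it was trained on and rises on pairs that do not, which is the property the argument relies on.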

If this is right

  • Selective writing triggered by high surprise plus periodic replay recovers 17.7 points of retention on the oldest classes for a DINOv2 backbone and 51.3 points for an I-JEPA backbone in a 1000-class continual stream (single-seed runs).
  • The same memory reaches 91.6 percent accuracy in 5-way 1-shot evaluation on mini-ImageNet while exposing greater difficulty in a 500-way regime.
  • In the vision-language setting the surprise signal produces assertive answers on known concepts, hedging on partial familiarity, and refusal-plus-explanation request on novel concepts, with an external detector achieving 0.966 AUROC for known-versus-novel separation.
  • After a consolidation sleep phase the system recalls 99.2 percent of fifty taught facts from the consolidated store while an unmodified base model recalls none.
  • The model's own verbalised confidence and token-level confidence under greedy decoding both perform far below the external surprise detector.
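The write-then-consolidate loop behind the first two points can be sketched roughly as follows. This is a toy under stated assumptions: the surprise value is a stand-in (distance to the nearest stored trace) rather than a predictor's error, the slow readout is a ridge regression over statistics accumulated across sleeps, and the class name, threshold, and synthetic latents are all hypothetical.

```python
import numpy as np

class SurpriseGatedMemory:
    """Fast episodic store gated by surprise, plus a slow linear readout."""

    def __init__(self, dim, num_classes, threshold, ridge=1e-2):
        self.threshold = threshold
        self.num_classes = num_classes
        self.ridge = ridge
        self.latents, self.labels = [], []          # fast store
        self.gram = np.zeros((dim, dim))            # accumulated X^T X
        self.cross = np.zeros((dim, num_classes))   # accumulated X^T Y
        self.readout = np.zeros((dim, num_classes))

    def novelty(self, latent):
        """Stand-in surprise (assumption): distance to the nearest stored trace."""
        if not self.latents:
            return float("inf")
        stored = np.stack(self.latents)
        return float(np.min(np.linalg.norm(stored - latent, axis=1)))

    def observe(self, latent, label, surprise):
        """Write a trace only when surprise exceeds the threshold."""
        if surprise > self.threshold:
            self.latents.append(latent)
            self.labels.append(label)
            return True
        return False

    def sleep(self):
        """Replay stored traces into the readout, then empty the fast store."""
        if self.latents:
            X = np.stack(self.latents)
            Y = np.eye(self.num_classes)[np.array(self.labels)]
            self.gram += X.T @ X
            self.cross += X.T @ Y
            self.latents, self.labels = [], []
        eye = np.eye(self.gram.shape[0])
        self.readout = np.linalg.solve(self.gram + self.ridge * eye, self.cross)

    def predict(self, latent):
        return int(np.argmax(latent @ self.readout))

# Synthetic stream: three well-separated classes, 20 samples each.
rng = np.random.default_rng(0)
dim, num_classes = 8, 3
centers = 4.0 * np.eye(num_classes, dim)
memory = SurpriseGatedMemory(dim, num_classes, threshold=2.0)

writes = 0
for label in range(num_classes):
    for _ in range(20):
        latent = centers[label] + 0.1 * rng.normal(size=dim)
        writes += memory.observe(latent, label, memory.novelty(latent))

memory.sleep()
print(writes, [memory.predict(c) for c in centers])
```

Only the first sample of each class clears the threshold here, so the store holds three traces before the sleep phase folds them into the readout. Accumulating statistics across sleeps is one way to avoid replaying only a recent window; whether it matches the paper's consolidation procedure is not stated in the source.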

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the surprise signal proves stable across additional encoders and modalities, it could support long-running agents that adapt without repeated full-model updates.
  • Pairing the signal with user-provided explanations on refusal could create a lightweight loop for on-the-fly personalization of vision-language models.
  • The separation between the external detector and the model's internal confidence suggests that explicit surprise monitoring might be added as an independent module rather than relying on the base model to self-report uncertainty.

Load-bearing premise

The surprise signal derived from the small predictor on frozen latents reliably distinguishes novelty, and it is this signal, rather than dataset-specific tuning or the choice of replay window, that produces the reported retention and AUROC gains.

What would settle it

If an experiment on a fresh stream of image classes with a different frozen encoder shows that surprise-gated writing plus replay produces no retention improvement over a non-surprise baseline, or if the AUROC for known-versus-novel separation falls to chance levels, the central claim would be falsified.
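For the AUROC half of this test, a small sketch of the metric itself may help: AUROC equals the probability that a randomly chosen novel item receives a higher score than a randomly chosen known item, so 0.5 is chance. The function name and the example scores are hypothetical.

```python
import numpy as np

def auroc(scores_novel, scores_known):
    """Fraction of (novel, known) pairs ranked correctly; ties count half."""
    novel = np.asarray(scores_novel, dtype=float)[:, None]
    known = np.asarray(scores_known, dtype=float)[None, :]
    return float(np.mean((novel > known) + 0.5 * (novel == known)))

# Made-up scores: 8 of the 9 pairs are ranked correctly.
example = auroc([0.9, 0.8, 0.4], [0.1, 0.3, 0.5])
print(round(example, 3))  # 0.889

# Identical score sets give chance level.
chance = auroc([0.2, 0.7], [0.2, 0.7])
print(chance)  # 0.5
```

A detector whose value on a fresh stream sits near 0.5 by this measure would be doing no better than guessing.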

Figures

Figures reproduced from arXiv: 2606.31495 by Louis Mouchon.

Figure 1: System 1. A frozen backbone feeds a JEPA surprise detector that gates writes to a fast non-parametric …
Figure 2: Few-shot accuracy. In the 5-way regime the frozen-backbone prototype memory exceeds a task-specific baseline; …
Figure 3: Retention of the oldest classes after 50 sequential tasks at 1000 classes. Sleep recovers most of the gap to the i.i.d. …
Figure 4: Memory ablations at 1000 classes (DINOv2). A sliding-window replay (V2) is worse than no replay at all. Surprise …
Figure 5: System 2. SigLIP computes a novelty score and BGE-M3 retrieves facts; the score selects one of three behavioural …
Figure 6: Behavioural regimes as a function of the external novelty score. The thresholds at 0.35 and 0.65 separate assertive, …
Figure 7: Separating known from novel concepts. The external frozen detector (AUROC 0.966, 95% CI ±0.024) outperforms …
Figure 8: The three-tier memory. Working memory tracks the dialogue; high-surprise inputs are written to the hippocampus; …
Figure 9: Post-sleep retention at scale. After teaching fifty facts, sleeping, emptying the hippocampus and clearing the …
Figure 10: Correction task: number of three corrected facts answered with the corrected value rather than the model’s …
Original abstract

We study a single idea across two settings: that a prediction-error signal, computed by a small predictor over the latent space of a frozen encoder, can serve both as a gate on plasticity and as a substrate for metacognition. In the first system, a non-parametric episodic memory writes a new concept only when this surprise is high, and a periodic offline replay phase consolidates recent traces into a slow linear readout. On a continual stream of 1000 ImageNet classes with a frozen DINOv2 or I-JEPA backbone, the consolidation phase recovers 17.7 points of retention on the oldest classes for DINOv2 and 51.3 points for I-JEPA (single-seed runs), and an ablation shows that replaying only a recent window is worse than no replay at all. In few-shot evaluation the same memory reaches 91.6% on 5-way 1-shot mini-ImageNet, above a task-specific baseline, while a harder 500-way regime exposes the true difficulty. In the second system, the same surprise signal, computed in a shared text-image space, modulates the behaviour of a vision-language model: it answers assertively when a concept is known, hedges when it is partially familiar, and refuses to identify the object and asks for an explanation when it is novel, learning the concept from a single user utterance. The external detector separates known from novel concepts at an AUROC of 0.966 (95% CI +/-0.024), far above the model's own verbalised confidence (0.618), while its token-level confidence sits below chance under greedy decoding; after a sleep phase that empties the fast store, the system recalls 99.2% of fifty taught facts from the consolidated store while a base model recovers none. We report both systems as proof-of-concept, with explicit limitations, and position the second against recent episodic-memory and personalised-VLM work.
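The three-way behaviour described for the second system can be sketched as a threshold rule on the novelty score. The cut points 0.35 and 0.65 come from the Figure 6 caption; the assumption that low scores map to assertive answers, the handling of scores exactly at a threshold, and all names are illustrative guesses rather than the paper's implementation.

```python
from enum import Enum

class Regime(Enum):
    ASSERT = "answer assertively"
    HEDGE = "answer with a hedge"
    ASK = "decline to identify and ask for an explanation"

def select_regime(novelty_score, low=0.35, high=0.65):
    """Map an external novelty score to one of three response regimes.

    Assumption: scores below `low` mean the concept is known, scores at or
    above `high` mean it is novel, and the band in between is partial
    familiarity.
    """
    if novelty_score < low:
        return Regime.ASSERT
    if novelty_score < high:
        return Regime.HEDGE
    return Regime.ASK

for score in (0.10, 0.50, 0.90):
    print(score, select_regime(score).value)
```

Keeping the rule outside the language model means the regime is fixed by the external detector before any text is generated, which is the separation the abstract draws between the detector and the model's own verbalised confidence.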

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims that a prediction-error (surprise) signal computed by a small predictor over the latent space of a frozen encoder can gate plasticity in a non-parametric episodic memory system (with periodic offline replay consolidating traces into a slow linear readout) and serve as a metacognitive substrate in a vision-language model. On continual ImageNet streams the system reports retention gains of 17.7 points (DINOv2) and 51.3 points (I-JEPA) on oldest classes; on mini-ImageNet it reaches 91.6% in 5-way 1-shot; the VLM component uses the same signal to answer assertively, hedge, or refuse and query when novel, achieving 0.966 AUROC for known/novel separation and 99.2% recall of taught facts after consolidation.

Significance. If the surprise signal is shown to be causally responsible, the work offers a lightweight, interpretable mechanism that unifies plasticity control and self-monitoring without retraining the backbone. Strengths include the use of frozen encoders, explicit positioning as proof-of-concept with stated limitations, and concrete baselines on standard datasets. The approach could inform continual-learning and personalized-VLM research if the reported gains are robustly attributable to the prediction-error computation rather than auxiliary design choices.

major comments (3)
  1. [Abstract] Retention gains of 17.7/51.3 points and 99.2% recall are reported from single-seed runs with no error bars or multi-seed statistics; this weakens support for the central claim that the surprise signal produces the observed plasticity and metacognitive effects.
  2. [Abstract] The single ablation (recent-window replay worse than none) does not include a control that disables or randomizes the surprise predictor while holding the replay schedule fixed; without this, it remains unclear whether the prediction-error signal is causally necessary for the gating effect or whether gains arise from the replay window or predictor tuning.
  3. [Abstract] The VLM results claim the surprise signal modulates assertive/hedging/refusal behavior and yields 0.966 AUROC (vs. 0.618 verbalised confidence), yet no ablation is described that removes the surprise computation while preserving other components; this is load-bearing for the metacognition claim.
minor comments (1)
  1. [Abstract] The abstract states that explicit limitations are reported; these should be expanded with concrete scope boundaries in the main text to aid readers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and agree that additional statistical reporting and targeted ablations are needed to better support the causal role of the surprise signal.

  1. Referee: [Abstract] Retention gains of 17.7/51.3 points and 99.2% recall are reported from single-seed runs with no error bars or multi-seed statistics; this weakens support for the central claim that the surprise signal produces the observed plasticity and metacognitive effects.

    Authors: We agree that single-seed reporting without error bars weakens the evidential support. In revision we will rerun the core continual ImageNet and VLM experiments over at least three seeds and report means with standard deviations for the retention, few-shot, and AUROC metrics. Revision: yes.

  2. Referee: [Abstract] The single ablation (recent-window replay worse than none) does not include a control that disables or randomizes the surprise predictor while holding the replay schedule fixed; without this, it remains unclear whether the prediction-error signal is causally necessary for the gating effect or whether gains arise from the replay window or predictor tuning.

    Authors: We accept that the existing ablation does not isolate the surprise predictor. We will add a new condition in which the predictor is either disabled or its outputs are replaced by random values while the replay schedule, memory write threshold, and consolidation procedure remain unchanged. Revision: yes.

  3. Referee: [Abstract] The VLM results claim the surprise signal modulates assertive/hedging/refusal behavior and yields 0.966 AUROC (vs. 0.618 verbalised confidence), yet no ablation is described that removes the surprise computation while preserving other components; this is load-bearing for the metacognition claim.

    Authors: We agree that an ablation removing the surprise computation is required to substantiate the metacognitive claim. We will add a control in which the VLM always uses its internal verbalised confidence (or a fixed threshold) instead of the external surprise signal, and report the resulting AUROC and refusal behavior. Revision: yes.
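One way the randomized-predictor control from the second response might look is sketched below, assuming the random values are obtained by permuting the real surprise scores across the stream. Permutation keeps the score distribution, the threshold, and the number of writes identical, so any difference in outcome is attributable to which items get written. The stream, scores, and names are synthetic and hypothetical.

```python
import numpy as np

def gated_writes(surprise, threshold):
    """Boolean mask of stream items that would be written to memory."""
    return surprise > threshold

def shuffled_control(surprise, rng):
    """Permute scores across the stream: same distribution, no link to inputs."""
    return rng.permutation(surprise)

# Synthetic stream (assumption): 80 familiar items, 20 novel items.
rng = np.random.default_rng(0)
is_novel = np.array([False] * 80 + [True] * 20)
surprise = np.where(is_novel, 0.8, 0.2) + 0.05 * rng.normal(size=100)
threshold = 0.5

real = gated_writes(surprise, threshold)
control = gated_writes(shuffled_control(surprise, rng), threshold)

# Fraction of novel items that each condition writes to memory.
real_hit_rate = (real & is_novel).sum() / is_novel.sum()
control_hit_rate = (control & is_novel).sum() / is_novel.sum()
print(real.sum(), control.sum(), real_hit_rate, control_hit_rate)
```

Running the downstream replay and retention evaluation on both write masks would then show whether the gains depend on the signal or merely on writing a similar number of traces.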

Circularity Check

0 steps flagged

No circularity: empirical systems evaluated on standard benchmarks without self-referential derivations

The paper presents two proof-of-concept systems that compute a surprise signal from a small predictor on frozen encoder latents and use it for memory gating or VLM modulation. All reported outcomes (retention gains of 17.7/51.3 points, 91.6% 5-way 1-shot accuracy, AUROC 0.966) are direct empirical measurements against explicit baselines on ImageNet and mini-ImageNet, accompanied by ablations. No equations, parameter fits, or self-citations are described that would reduce these quantities to inputs by construction; the surprise computation is an independent module whose causal contribution is tested via replay variants rather than assumed tautologically. The derivation chain is therefore self-contained against external data.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The approach rests on standard assumptions about latent spaces of pretrained encoders and the utility of prediction error; no new physical entities are postulated and free parameters are limited to implementation choices.

free parameters (2)
  • surprise threshold for memory write
    Determines when a new concept is stored; value must be chosen or tuned to achieve the reported retention.
  • replay window length and consolidation schedule
    Controls which traces are replayed; the ablation implies this choice affects outcomes.
axioms (1)
  • domain assumption: Latent spaces of frozen encoders such as DINOv2 and I-JEPA contain structure sufficient for a small predictor to compute useful surprise signals.
    Invoked for both systems in the abstract.

pith-pipeline@v0.9.1-grok · 5880 in / 1469 out tokens · 30002 ms · 2026-07-01T05:39:29.051022+00:00 · methodology
