arxiv: 2510.21588 · v2 · submitted 2025-10-24 · 🧬 q-bio.NC · cs.LG

Contribution of task-irrelevant stimuli to drift of neural representations

Pith reviewed 2026-05-18 05:12 UTC · model grok-4.3

keywords representational drift · task-irrelevant stimuli · online learning · neural representations · Hebbian learning · gradient descent · lifelong learning

The pith

Task-irrelevant stimuli that learners ignore still produce gradual long-term drift in representations of relevant stimuli.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines representational drift, where neural codes for stimuli change over time even as task performance stays stable. It demonstrates that in continuous online learning, the noise from task-irrelevant inputs leaks into weight updates and shifts the representation of task-relevant inputs. Theory and simulations across Hebbian rules and gradient descent show the drift rate rises with the variance and dimension of the irrelevant data subspace. This source of drift yields distinct predictions for geometry and scaling compared with random synaptic noise. The work ties stimulus structure and task demands directly to observed drift in both biological and artificial systems.

Core claim

In an online learning setup with a mixed stream of inputs, the component of the data that the agent learns to treat as irrelevant still injects persistent noise into synaptic updates, producing cumulative drift in the representation of the relevant component. This occurs under Oja's rule, similarity matching, autoencoder gradient descent, and supervised two-layer networks, with the drift rate scaling positively with the variance and dimensionality of the irrelevant subspace.

What carries the argument

Online weight updates on a continuous mixed stream of task-relevant and task-irrelevant vectors, where the irrelevant subspace contributes additive noise to the representation of the relevant subspace despite being ignored for the task objective.
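
To make the phrase "additive noise" concrete, here is a short worked equation for one of the rules named above. It is a sketch under stated assumptions, not the paper's derivation: it assumes the common matrix form of Oja's subspace rule, an input that splits into a relevant part x_r and an orthogonal irrelevant part x_i, and a converged weight matrix W whose orthonormal rows span the relevant subspace. The symbols W, x_r, x_i, C_r, and C_i are introduced here for illustration.

```latex
% Assumed form of Oja's subspace rule, W \in \mathbb{R}^{m \times n}, learning rate \eta:
\Delta W = \eta\,\bigl(y x^{\top} - y y^{\top} W\bigr), \qquad y = W x .

% Assumed input decomposition: x = x_r + x_i, with x_i orthogonal to the relevant subspace.
% Assumed converged state: W W^{\top} = I_m and W x_i = 0, so W^{\top} W x = x_r and y = W x_r.
\Delta W = \eta\, y\,\bigl(x - W^{\top} W x\bigr)^{\top}
         = \eta\, y\, x_i^{\top}
         = \eta\, W x_r\, x_i^{\top} .

% With x_r and x_i independent and zero-mean (covariances C_r and C_i):
\mathbb{E}[\Delta W] = 0, \qquad
\mathbb{E}\,\lVert \Delta W \rVert_F^{2}
  = \eta^{2}\, \mathbb{E}\lVert x_r \rVert^{2}\, \mathbb{E}\lVert x_i \rVert^{2}
  = \eta^{2}\, \operatorname{tr}(C_r)\, \operatorname{tr}(C_i) .
```

Under these assumptions the update vanishes on average but not sample by sample, and its power scales with tr(C_i), which for isotropic irrelevant input of variance σ_i² in d_i dimensions equals d_i σ_i². This covers only the per-step noise; how that noise accumulates into long-term drift is the subject of the paper's theory and is not reproduced here.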

If this is right

  • Drift rate grows monotonically with the variance and dimension of the task-irrelevant data subspace.
  • The effect appears consistently across Hebbian learning rules and stochastic gradient descent applied to autoencoders and supervised networks.
  • Geometry and dimension scaling of drift differ qualitatively from those produced by additive Gaussian synaptic noise.
  • Drift measurements could be used to infer which aspects of the input an agent is treating as irrelevant in a given context.
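
The first prediction can be probed with a toy simulation. The sketch below is illustrative only and is not the paper's code or parameter choice: it assumes Oja's subspace rule in its common matrix form, Gaussian inputs whose relevant and irrelevant subspaces are axis-aligned, unit variance in the relevant coordinates, and a drift measure (Frobenius distance between two snapshots of the relevant weight block) chosen here for simplicity. The function name `simulate_oja_drift` and all default values are hypothetical.

```python
import numpy as np

def simulate_oja_drift(irr_var, n_rel=3, n_irr=20, eta=0.02,
                       burn_in=5000, steps=20000, seed=0):
    """Run Oja's subspace rule online and return (drift, alignment).

    drift:     Frobenius distance between the relevant weight block just
               after burn-in and at the end of the run.
    alignment: fraction of squared weight norm that sits in the relevant
               block at the end (1.0 means the irrelevant input is ignored).
    """
    rng = np.random.default_rng(seed)
    n = n_rel + n_irr
    # Assumption: relevant coordinates come first, with unit variance;
    # irrelevant coordinates have variance irr_var.
    scale = np.concatenate([np.ones(n_rel), np.full(n_irr, np.sqrt(irr_var))])
    # Arbitrary initial condition: random orthonormal rows.
    q, _ = np.linalg.qr(rng.standard_normal((n, n_rel)))
    W = q.T.copy()
    snapshot = None
    for t in range(burn_in + steps):
        if t == burn_in:
            snapshot = W[:, :n_rel].copy()
        x = scale * rng.standard_normal(n)      # one sample from the mixed stream
        y = W @ x                               # network output
        W += eta * np.outer(y, x - W.T @ y)     # y x^T - y y^T W
    block = W[:, :n_rel]
    drift = float(np.linalg.norm(block - snapshot))
    alignment = float(np.sum(block**2) / np.sum(W**2))
    return drift, alignment

# Compare a stream with no irrelevant input against one with irrelevant variance 0.25.
results = {v: simulate_oja_drift(v) for v in (0.0, 0.25)}
for v, (drift, alignment) in results.items():
    print(f"irrelevant variance {v}: drift {drift:.3g}, alignment {alignment:.3f}")
```

In this toy setting the weights stop changing once the network has converged on a stream with no irrelevant component, so any residual change in the relevant block is attributable to the irrelevant input. Sweeping `irr_var` and `n_irr` over a grid, and averaging over seeds, would be the natural way to look at the scaling the summary describes.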

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the mechanism holds, artificial lifelong learners may need explicit input segregation rather than relying solely on learning to ignore irrelevant features.
  • The same drift source could interact with other noise processes in real neural circuits, producing observable signatures that combine both.
  • Experimental designs that control the statistics of distractor stimuli during learning could test whether drift rates match the predicted dependence on irrelevant variance.

Load-bearing premise

Learning proceeds by continuous weight updates on an unsegregated stream of relevant and irrelevant stimuli, with no separate gating or blocking mechanism that would stop irrelevant inputs from influencing the updates.

What would settle it

Record drift magnitude in a network or brain region while parametrically varying the variance or dimension of added task-irrelevant stimuli; if the observed drift rate fails to increase with that variance or dimension, the proposed mechanism is falsified.

Figures

Figures reproduced from arXiv: 2510.21588 by Farhad Pashakhanloo.

Figure 1: Demonstration of the effect of task-irrelevant stimuli on representational drift. A one-layer [...]
Figure 2: Neural network models studied in this work. a) Multi-dimensional Oja network, b) [...]
Figure 3: Drift rate for different architectures as a function of: a) learning rate [...]
Figure 4: Example of drift in a non-linear network. a) Schematic of a two-layer network trained to [...]
Figure 5: Drift in MNIST data. a) Two snapshots of hidden layer representations for 10 sample digits [...]
Figure 6: Comparison of drift induced by learning noise from task-irrelevant data and by Gaussian [...]
Original abstract

Biological and artificial learners are inherently exposed to a stream of data and experience throughout their lifetimes and must constantly adapt to, learn from, or selectively ignore the ongoing input. Recent findings reveal that, even when the performance remains stable, the underlying neural representations can change gradually over time, a phenomenon known as representational drift. Studying the different sources of data and noise that may contribute to drift is essential for understanding lifelong learning in neural systems. However, a systematic study of drift across architectures and learning rules, and the connection to task, are missing. Here, in an online learning setup, we characterize drift as a function of data distribution, and specifically show that the learning noise induced by task-irrelevant stimuli, which the agent learns to ignore in a given context, can create long-term drift in the representation of task-relevant stimuli. Using theory and simulations, we demonstrate this phenomenon both in Hebbian-based learning -- Oja's rule and Similarity Matching -- and in stochastic gradient descent applied to autoencoders and a supervised two-layer network. We consistently observe that the drift rate increases with the variance and the dimension of the data in the task-irrelevant subspace. We further show that this yields different qualitative predictions for the geometry and dimension-dependency of drift than those arising from Gaussian synaptic noise. Overall, our study links the structure of stimuli, task, and learning rule to representational drift and could pave the way for using drift as a signal for uncovering underlying computation in the brain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript claims that in an online mixed-stream learning setup, task-irrelevant stimuli induce learning noise that produces long-term representational drift in task-relevant stimuli even after the system has learned to ignore the irrelevant inputs for task performance. This is demonstrated via theory and simulations across Hebbian rules (Oja's rule, similarity matching) and SGD on autoencoders plus a supervised two-layer network, with the drift rate increasing as a function of variance and dimension in the irrelevant subspace and yielding distinct geometric predictions from those of Gaussian synaptic noise.

Significance. If the central results hold, the work supplies a concrete, input-driven mechanism for representational drift under stable behavior, directly linking stimulus statistics, task structure, and learning rule. The cross-architecture simulations and explicit contrast with synaptic-noise predictions are strengths that could generate falsifiable experimental tests in neuroscience; the absence of free parameters in the core derivations further strengthens the contribution.

major comments (1)
  1. [Simulation results (supervised network and autoencoder)] Simulation results for the supervised two-layer network and autoencoder: the central claim requires that task performance remains high and flat after initial convergence while drift continues indefinitely in the relevant subspace. The manuscript provides no explicit numerical check (e.g., loss/accuracy trajectories or subspace projection metrics over long timescales) confirming that performance plateaus in the mixed-stream regime; without this, it is unclear whether the irrelevant inputs truly leave task performance unaffected while still driving ongoing weight updates.
minor comments (2)
  1. [Abstract] Abstract: the term 'learning noise' is used without an immediate operational definition; a brief parenthetical clarifying that it refers to the residual Hebbian/SGD updates from the irrelevant component would improve immediate readability.
  2. [Methods] Methods: data-generation and update-rule parameters (learning rates, subspace variances, input dimensions, number of trials) should be tabulated for each architecture to support reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and for identifying an important point regarding the clarity of our simulation results. We address the major comment below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Simulation results (supervised network and autoencoder)] Simulation results for the supervised two-layer network and autoencoder: the central claim requires that task performance remains high and flat after initial convergence while drift continues indefinitely in the relevant subspace. The manuscript provides no explicit numerical check (e.g., loss/accuracy trajectories or subspace projection metrics over long timescales) confirming that performance plateaus in the mixed-stream regime; without this, it is unclear whether the irrelevant inputs truly leave task performance unaffected while still driving ongoing weight updates.

    Authors: We agree that explicit verification of stable task performance alongside ongoing drift is essential for supporting the central claim. In the simulations presented, task performance (measured via loss or accuracy) converges rapidly and remains high and flat in the mixed-stream regime, while drift in the task-relevant subspace persists due to the ongoing updates from irrelevant inputs. However, we acknowledge that the manuscript does not include dedicated long-timescale trajectories or subspace projection metrics to make this explicit. In the revised version, we will add figures showing performance metrics (e.g., loss/accuracy) and relevant subspace projections over extended training periods for both the supervised network and autoencoder. These will confirm that performance plateaus while drift continues, directly addressing the concern and clarifying that irrelevant inputs drive weight updates without degrading task performance.

    Revision: yes
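
The checks discussed in this exchange can be sketched as three small diagnostics. This is a hedged illustration of what such a numerical check might look like, not the authors' planned analysis: the function names `late_slope`, `subspace_alignment`, and `representation_change` are hypothetical, and the sketch assumes a weight matrix with full row rank, an orthonormal basis for the relevant subspace, and a fixed set of probe stimuli recorded at two time points.

```python
import numpy as np

def late_slope(trace, frac=0.5):
    """Least-squares slope (per step) over the last `frac` of a 1-D trace.

    A value near zero for a loss or accuracy trace suggests a plateau.
    """
    trace = np.asarray(trace, dtype=float)
    tail = trace[int(len(trace) * (1.0 - frac)):]
    slope, _ = np.polyfit(np.arange(len(tail)), tail, 1)
    return float(slope)

def subspace_alignment(W, U_rel):
    """Mean squared cosine of the principal angles between the row space
    of W (shape m x n) and the span of the orthonormal columns of U_rel
    (shape n x k). Returns 1.0 when the row space lies inside the span.
    """
    Q, _ = np.linalg.qr(W.T)  # n x m orthonormal basis of the row space
    cosines = np.linalg.svd(U_rel.T @ Q, compute_uv=False)
    return float(np.mean(cosines**2))

def representation_change(Y_ref, Y_now):
    """Mean angle in radians between matched rows of two response matrices,
    one row per probe stimulus, recorded at two time points.
    """
    num = np.sum(Y_ref * Y_now, axis=1)
    den = np.linalg.norm(Y_ref, axis=1) * np.linalg.norm(Y_now, axis=1)
    return float(np.mean(np.arccos(np.clip(num / den, -1.0, 1.0))))
```

Read together, a flat `late_slope` on the loss, a `subspace_alignment` that stays near its converged value, and a `representation_change` that keeps growing between snapshots would be the pattern the central claim requires: stable performance with ongoing drift.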

Circularity Check

0 steps flagged

No significant circularity; drift characterization follows directly from standard update rules on mixed data streams

Full rationale

The paper derives representational drift by applying standard learning rules (Oja's rule, similarity matching, SGD on autoencoders and supervised networks) to an online stream containing both task-relevant and task-irrelevant stimuli. Drift rate is shown to increase with variance and dimension of the irrelevant subspace through explicit theory and simulations. No load-bearing step reduces by construction to a fitted parameter, self-defined quantity, or self-citation chain; the central claim is obtained by direct substitution of the mixed input distribution into the update equations without re-labeling inputs as predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions of online learning and the existence of separable task-relevant and task-irrelevant subspaces; no new entities or fitted parameters are introduced in the abstract.

axioms (1)
  • Domain assumption: continuous online exposure to a data stream containing both task-relevant and task-irrelevant stimuli.
    Invoked in the description of the learning setup and the source of learning noise.



