Contribution of task-irrelevant stimuli to drift of neural representations

Farhad Pashakhanloo

arxiv: 2510.21588 · v2 · submitted 2025-10-24 · 🧬 q-bio.NC · cs.LG


Pith reviewed 2026-05-18 05:12 UTC · model grok-4.3

classification 🧬 q-bio.NC cs.LG

keywords: representational drift, task-irrelevant stimuli, online learning, neural representations, Hebbian learning, gradient descent, lifelong learning


The pith

Task-irrelevant stimuli that learners ignore still produce gradual long-term drift in representations of relevant stimuli.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines representational drift, where neural codes for stimuli change over time even as task performance stays stable. It demonstrates that in continuous online learning, the noise from task-irrelevant inputs leaks into weight updates and shifts the representation of task-relevant inputs. Theory and simulations across Hebbian rules and gradient descent show the drift rate rises with the variance and dimension of the irrelevant data subspace. This source of drift yields distinct predictions for geometry and scaling compared with random synaptic noise. The work ties stimulus structure and task demands directly to observed drift in both biological and artificial systems.

Core claim

In an online learning setup with a mixed stream of inputs, the component of the data that the agent learns to treat as irrelevant still injects persistent noise into synaptic updates, producing cumulative drift in the representation of the relevant component. This occurs under Oja's rule, similarity matching, autoencoder gradient descent, and supervised two-layer networks, with the drift rate scaling positively with the variance and dimensionality of the irrelevant subspace.
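The mechanism in the core claim can be illustrated with a toy simulation. The sketch below assumes that the multi-dimensional Oja network uses the standard Oja subspace rule, ΔW = η(y xᵀ − y yᵀ W) with y = W x, and that inputs are axis-aligned Gaussians whose first m axes are task-relevant. The function name, the drift metric (squared change of the block of W that reads out the relevant axes), and all parameter values are illustrative assumptions, not the paper's settings. Starting exactly on the solution manifold, the relevant readout stays put when the irrelevant variance is zero and wanders once it is not.

```python
import numpy as np


def oja_relevant_drift(sigma_irr, n=10, m=2, eta=0.01, steps=20_000, seed=0):
    """Squared drift of the relevant readout under the Oja subspace rule.

    Hypothetical setup for illustration: axis-aligned Gaussian inputs, where
    the first m axes are task-relevant (variances 3.0 down to 2.0) and the
    remaining n - m axes are task-irrelevant with standard deviation sigma_irr.
    Returns ||W_T[:, :m] - W_0[:, :m]||_F^2, i.e. how much the map applied to
    task-relevant stimuli has changed after `steps` online updates.
    """
    rng = np.random.default_rng(seed)
    lam_rel = np.linspace(3.0, 2.0, m)
    std = np.concatenate([np.sqrt(lam_rel), np.full(n - m, sigma_irr)])

    # Start exactly on the solution manifold: W_0 = [Q_0, 0] with Q_0 orthogonal.
    q0, _ = np.linalg.qr(rng.standard_normal((m, m)))
    W = np.zeros((m, n))
    W[:, :m] = q0
    W0 = W.copy()

    for _ in range(steps):
        x = std * rng.standard_normal(n)   # one sample from the mixed stream
        y = W @ x                          # network output
        # Oja subspace rule: dW = eta * (y x^T - y y^T W); note y @ W == W.T @ y.
        W += eta * (np.outer(y, x) - np.outer(y, y @ W))

    return float(np.sum((W[:, :m] - W0[:, :m]) ** 2))


if __name__ == "__main__":
    for sigma in (0.0, 0.25, 0.5):
        print(f"sigma_irr={sigma}: drift={oja_relevant_drift(sigma):.3g}")
```

The irrelevant variance should stay below the smallest relevant variance (here 2.0) so that the relevant axes remain the principal subspace the rule converges to.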

What carries the argument

Online weight updates on a continuous mixed stream of task-relevant and task-irrelevant vectors, where the irrelevant subspace contributes additive noise to the representation of the relevant subspace despite being ignored for the task objective.

If this is right

Drift rate grows monotonically with the variance and dimension of the task-irrelevant data subspace.
The effect appears consistently across Hebbian learning rules and stochastic gradient descent applied to autoencoders and supervised networks.
Geometry and dimension scaling of drift differ qualitatively from those produced by additive Gaussian synaptic noise.
Drift measurements could be used to infer which aspects of the input an agent is treating as irrelevant in a given context.
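One simple way to see how the two noise sources can differ in geometry: in the toy Oja network below, learning noise from task-irrelevant data only enters the weight columns of input directions along which irrelevant data actually varies, while isotropic Gaussian synaptic noise enters every weight. The setup (a single irrelevant input axis with nonzero variance, additive noise on W after each step, and all parameter values) is a hypothetical illustration; the paper's own comparison in Figure 6 may use different models and metrics.

```python
import numpy as np


def oja_with_noise_source(source, n=6, m=2, eta=0.01, steps=5_000,
                          noise_std=0.01, seed=1):
    """Final weights of an Oja subspace network under one hypothetical noise source.

    source == "data":     irrelevant input variance on axis m only (axes
                          m+1..n-1 have zero variance); no synaptic noise.
    source == "synaptic": no irrelevant input; isotropic Gaussian noise with
                          std noise_std is added to every weight after each update.
    """
    if source not in ("data", "synaptic"):
        raise ValueError("source must be 'data' or 'synaptic'")
    rng = np.random.default_rng(seed)
    std = np.zeros(n)
    std[:m] = np.sqrt(np.linspace(3.0, 2.0, m))   # task-relevant axes
    if source == "data":
        std[m] = 0.5                              # single irrelevant direction

    W = np.zeros((m, n))
    W[:, :m] = np.eye(m)                          # start on the solution manifold
    for _ in range(steps):
        x = std * rng.standard_normal(n)
        y = W @ x
        W += eta * (np.outer(y, x) - np.outer(y, y @ W))
        if source == "synaptic":
            W += noise_std * rng.standard_normal(W.shape)
    return W


def irrelevant_column_energy(W, m=2):
    """Squared weight on each non-relevant input axis (columns m..n-1)."""
    return np.sum(W[:, m:] ** 2, axis=0)


if __name__ == "__main__":
    print("data-driven:", irrelevant_column_energy(oja_with_noise_source("data")))
    print("synaptic:   ", irrelevant_column_energy(oja_with_noise_source("synaptic")))
```

In the data-driven case the columns for zero-variance irrelevant axes stay exactly zero, because both terms of the update vanish there; the synaptic case has no such structure.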

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the mechanism holds, artificial lifelong learners may need explicit input segregation rather than relying solely on learning to ignore irrelevant features.
The same drift source could interact with other noise processes in real neural circuits, producing observable signatures that combine both.
Experimental designs that control the statistics of distractor stimuli during learning could test whether drift rates match the predicted dependence on irrelevant variance.

Load-bearing premise

Learning proceeds by continuous weight updates on an unsegregated stream of relevant and irrelevant stimuli, with no separate gating or blocking mechanism that would stop irrelevant inputs from influencing the updates.

What would settle it

Record drift magnitude in a network or brain region while parametrically varying the variance or dimension of added task-irrelevant stimuli; if the observed drift rate fails to increase with that variance or dimension, the proposed mechanism is falsified.
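Running this test requires a drift-rate estimate for each condition. A minimal sketch, assuming drift is diffusive so that the mean squared displacement (MSD) of a population response grows linearly with the time lag; the function name and the max_lag default are hypothetical. If drift saturates instead, the MSD curve will bend away from a straight line, which is itself informative.

```python
import numpy as np


def drift_rate(reps, max_lag=50):
    """Estimate a diffusive drift rate from a series of representations.

    reps: array of shape (T, d), e.g. a population's response to one fixed
    task-relevant stimulus at T successive time points. Assumes the mean
    squared displacement grows linearly with lag, MSD(tau) = rate * tau, and
    returns the least-squares slope through the origin over lags 1..max_lag.
    """
    reps = np.asarray(reps, dtype=float)
    if reps.ndim != 2 or reps.shape[0] <= max_lag:
        raise ValueError("reps must have shape (T, d) with T > max_lag")
    lags = np.arange(1, max_lag + 1)
    msd = np.array([
        np.mean(np.sum((reps[lag:] - reps[:-lag]) ** 2, axis=1)) for lag in lags
    ])
    return float(lags @ msd / (lags @ lags))


if __name__ == "__main__":
    # Synthetic check: a random walk with per-step std s in d dims has rate d * s**2.
    rng = np.random.default_rng(0)
    for s in (0.05, 0.1, 0.2):
        walk = np.cumsum(s * rng.standard_normal((20_000, 5)), axis=0)
        print(f"step std {s}: estimated rate {drift_rate(walk):.4f}, true {5 * s**2:.4f}")
```

In the proposed experiment, one would call drift_rate once per distractor condition (for example, per variance or per number of irrelevant dimensions) and check whether the estimates increase with that parameter.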

Figures

Figures reproduced from arXiv: 2510.21588 by Farhad Pashakhanloo.

**Figure 2.** Neural network models studied in this work. a) Multi-dimensional Oja network, b) …

**Figure 3.** Drift rate for different architectures as a function of: a) learning rate …

**Figure 4.** Example of drift in a non-linear network. a) Schematic of a two-layer network trained to …

**Figure 5.** Drift in MNIST data. a) Two snapshots of hidden layer representations for 10 sample digits …

**Figure 6.** Comparison of drift induced by learning noise from task-irrelevant data and by Gaussian …

Original abstract

Biological and artificial learners are inherently exposed to a stream of data and experience throughout their lifetimes and must constantly adapt to, learn from, or selectively ignore the ongoing input. Recent findings reveal that, even when the performance remains stable, the underlying neural representations can change gradually over time, a phenomenon known as representational drift. Studying the different sources of data and noise that may contribute to drift is essential for understanding lifelong learning in neural systems. However, a systematic study of drift across architectures and learning rules, and the connection to task, are missing. Here, in an online learning setup, we characterize drift as a function of data distribution, and specifically show that the learning noise induced by task-irrelevant stimuli, which the agent learns to ignore in a given context, can create long-term drift in the representation of task-relevant stimuli. Using theory and simulations, we demonstrate this phenomenon both in Hebbian-based learning -- Oja's rule and Similarity Matching -- and in stochastic gradient descent applied to autoencoders and a supervised two-layer network. We consistently observe that the drift rate increases with the variance and the dimension of the data in the task-irrelevant subspace. We further show that this yields different qualitative predictions for the geometry and dimension-dependency of drift than those arising from Gaussian synaptic noise. Overall, our study links the structure of stimuli, task, and learning rule to representational drift and could pave the way for using drift as a signal for uncovering underlying computation in the brain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

Task-irrelevant stimuli drive ongoing drift in relevant representations via ordinary learning rules, with scaling and geometry distinct from synaptic noise.


The main takeaway is that drift can come from the statistics of inputs the system has learned to ignore, rather than only from internal noise. In an online mixed-stream setup, the paper shows drift rate rising with the variance and dimension of the task-irrelevant subspace, and the resulting geometry differs from Gaussian synaptic noise predictions. This holds across Oja's rule, similarity matching, autoencoders, and a supervised two-layer net, backed by both theory and simulations.

The breadth across learning rules is useful and gives the result some robustness that single-rule studies often lack. The online setup without explicit gating matches the kind of continuous exposure real systems face, and the link to stimulus structure is a concrete mechanistic angle.

On the softer side, the performance-stability question is not fully settled by the abstract. The claim needs the relevant mapping to stay accurate while irrelevant inputs keep updating weights and shifting the representation. It is unclear whether drift keeps going indefinitely or eventually saturates, and whether task loss on the relevant subspace remains flat once initial convergence occurs. More explicit long-timescale checks on that point would tighten the argument.

The paper is aimed at people measuring or modeling representational drift and at those working on continual learning. A reader who wants stimulus-driven explanations rather than purely noise-based ones will get value from the scaling predictions. It is coherent enough, and has enough cross-architecture evidence, to deserve a serious referee, even if revisions on the stability checks would help.

Referee Report

1 major / 2 minor

Summary. The manuscript claims that in an online mixed-stream learning setup, task-irrelevant stimuli induce learning noise that produces long-term representational drift in task-relevant stimuli even after the system has learned to ignore the irrelevant inputs for task performance. This is demonstrated via theory and simulations across Hebbian rules (Oja's rule, similarity matching) and SGD on autoencoders plus a supervised two-layer network, with the drift rate increasing as a function of variance and dimension in the irrelevant subspace and yielding distinct geometric predictions from those of Gaussian synaptic noise.

Significance. If the central results hold, the work supplies a concrete, input-driven mechanism for representational drift under stable behavior, directly linking stimulus statistics, task structure, and learning rule. The cross-architecture simulations and explicit contrast with synaptic-noise predictions are strengths that could generate falsifiable experimental tests in neuroscience; the absence of free parameters in the core derivations further strengthens the contribution.

major comments (1)

[Simulation results (supervised network and autoencoder)] The central claim requires that task performance remain high and flat after initial convergence while drift continues indefinitely in the relevant subspace. The manuscript provides no explicit numerical check (e.g., loss/accuracy trajectories or subspace projection metrics over long timescales) confirming that performance plateaus in the mixed-stream regime; without this, it is unclear whether the irrelevant inputs truly leave task performance unaffected while still driving ongoing weight updates.
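The check requested here can be sketched for a linear two-layer autoencoder trained by online SGD on a mixed stream. The sketch records, at regular checkpoints, the relative reconstruction error on fixed task-relevant probe stimuli (performance) next to the squared change of the encoder's relevant block (drift). Architecture, loss, initialization on the minimum, and all parameter values are assumptions for illustration; the paper's supervised network and autoencoder may differ.

```python
import numpy as np


def autoencoder_stability_check(sigma_irr=0.5, n=10, m=2, eta=0.01,
                                steps=20_000, every=2_000, seed=0):
    """Online SGD on a linear autoencoder x_hat = W2 @ W1 @ x (m hidden units).

    Hypothetical setup: the first m input axes are task-relevant (variances
    3.0 down to 2.0), the rest are task-irrelevant with std sigma_irr.
    Returns a list of (step, relative error on relevant probes, drift) where
    drift is ||W1[:, :m] - W1_0[:, :m]||_F^2, the change of the hidden code
    assigned to task-relevant stimuli.
    """
    rng = np.random.default_rng(seed)
    std = np.concatenate([np.sqrt(np.linspace(3.0, 2.0, m)),
                          np.full(n - m, sigma_irr)])
    q0, _ = np.linalg.qr(rng.standard_normal((m, m)))
    W1 = np.zeros((m, n))
    W1[:, :m] = q0                   # encoder starts on the global minimum
    W2 = W1.T.copy()                 # decoder: W2 @ W1 projects onto relevant axes
    code0 = W1[:, :m].copy()

    # Fixed probe set of purely task-relevant stimuli (irrelevant axes are zero).
    probes = np.zeros((n, 200))
    probes[:m] = std[:m, None] * rng.standard_normal((m, 200))

    history = []
    for t in range(1, steps + 1):
        x = std * rng.standard_normal(n)
        h = W1 @ x
        err = x - W2 @ h             # residual, dominated by the irrelevant part
        # Gradients of 0.5 * ||x - W2 W1 x||^2, both taken before either update.
        grad_W2 = -np.outer(err, h)
        grad_W1 = -np.outer(W2.T @ err, x)
        W2 -= eta * grad_W2
        W1 -= eta * grad_W1
        if t % every == 0:
            recon = W2 @ (W1 @ probes)
            rel_err = np.sum((probes - recon) ** 2) / np.sum(probes ** 2)
            drift = np.sum((W1[:, :m] - code0) ** 2)
            history.append((t, float(rel_err), float(drift)))
    return history


if __name__ == "__main__":
    for t, rel_err, drift in autoencoder_stability_check():
        print(f"step {t:>6}: relevant error {rel_err:.4f}, drift {drift:.4f}")
```

Plotting the two columns of history against the step gives the kind of paired performance/drift trajectory the comment describes; in this toy setting the relevant error should hover at a low level set by the irrelevant variance while the drift column keeps changing.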

minor comments (2)

[Abstract] The term 'learning noise' is used without an immediate operational definition; a brief parenthetical clarifying that it refers to the residual Hebbian/SGD updates from the irrelevant component would improve immediate readability.
[Methods] Data-generation and update-rule parameters (learning rates, subspace variances, input dimensions, number of trials) should be tabulated for each architecture to support reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and for identifying an important point regarding the clarity of our simulation results. We address the major comment below and will incorporate revisions to strengthen the manuscript.


Referee: [Simulation results (supervised network and autoencoder)] The central claim requires that task performance remain high and flat after initial convergence while drift continues indefinitely in the relevant subspace. The manuscript provides no explicit numerical check (e.g., loss/accuracy trajectories or subspace projection metrics over long timescales) confirming that performance plateaus in the mixed-stream regime; without this, it is unclear whether the irrelevant inputs truly leave task performance unaffected while still driving ongoing weight updates.

Authors: We agree that explicit verification of stable task performance alongside ongoing drift is essential for supporting the central claim. In the simulations presented, task performance (measured via loss or accuracy) converges rapidly and remains high and flat in the mixed-stream regime, while drift in the task-relevant subspace persists due to the ongoing updates from irrelevant inputs. However, we acknowledge that the manuscript does not include dedicated long-timescale trajectories or subspace projection metrics to make this explicit. In the revised version, we will add figures showing performance metrics (e.g., loss/accuracy) and relevant subspace projections over extended training periods for both the supervised network and autoencoder. These will confirm that performance plateaus while drift continues, directly addressing the concern and clarifying that irrelevant inputs drive weight updates without degrading task performance. revision: yes

Circularity Check

0 steps flagged

No significant circularity; drift characterization follows directly from standard update rules on mixed data streams


The paper derives representational drift by applying standard learning rules (Oja's rule, similarity matching, SGD on autoencoders and supervised networks) to an online stream containing both task-relevant and task-irrelevant stimuli. Drift rate is shown to increase with variance and dimension of the irrelevant subspace through explicit theory and simulations. No load-bearing step reduces by construction to a fitted parameter, self-defined quantity, or self-citation chain; the central claim is obtained by direct substitution of the mixed input distribution into the update equations without re-labeling inputs as predictions.
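The "direct substitution" step can be made concrete for Oja's rule. Linearizing the update around a fixed point and assuming independent Gaussian input components (so ⟨x²_µ x²_{m+ν}⟩ = λ_µ λ_{m+ν}) gives a stationary variance η λ_µ λ_{m+ν} / (2(λ_µ − λ_{m+ν})) for the coupling ρ_{µ,ν} between principal direction µ and irrelevant direction ν, matching the closed form in the appendix excerpt listed as reference [42]. The sketch below compares that prediction with a simulation. Reading ρ_{µ,ν} off the off-diagonal block of WᵀW is an assumption made here, as are the function names and all parameter values.

```python
import numpy as np


def predicted_rho_var(eta, lam_mu, lam_nu):
    """Linearized stationary variance eta*lam_mu*lam_nu / (2*(lam_mu - lam_nu)).

    Equivalent to eta*lam_nu / (2*(1 - lam_nu/lam_mu)); requires lam_mu > lam_nu.
    """
    return eta * lam_mu * lam_nu / (2.0 * (lam_mu - lam_nu))


def simulated_rho_var(eta=0.01, lam_rel=(3.0, 2.0), lam_irr=0.25, n=6,
                      steps=100_000, burn_in=2_000, every=10, seed=0):
    """Time-averaged rho^2 under the Oja subspace rule, with rho taken (as an
    assumption) to be the block (W^T W)[:m, m:] coupling relevant axis mu to
    irrelevant axis m + nu. Returns an array of shape (m, n - m)."""
    rng = np.random.default_rng(seed)
    m = len(lam_rel)
    std = np.concatenate([np.sqrt(lam_rel), np.full(n - m, np.sqrt(lam_irr))])
    W = np.zeros((m, n))
    W[:, :m] = np.eye(m)            # start on the fixed point, where rho = 0
    X = std * rng.standard_normal((steps, n))
    acc = np.zeros((m, n - m))
    count = 0
    for t, x in enumerate(X):
        y = W @ x
        W += eta * (np.outer(y, x) - np.outer(y, y @ W))
        if t >= burn_in and t % every == 0:
            rho = (W.T @ W)[:m, m:]  # off-diagonal block of the learned projector
            acc += rho ** 2
            count += 1
    return acc / count
```

Dividing simulated_rho_var() elementwise by the matching predicted_rho_var values should give ratios close to 1 for small η; deviations of a few percent are expected from finite sampling and from terms of higher order in η.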

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions of online learning and the existence of separable task-relevant and task-irrelevant subspaces; no new entities or fitted parameters are introduced in the abstract.

axioms (1)

Domain assumption: continuous online exposure to a data stream containing both task-relevant and task-irrelevant stimuli.
Invoked in the description of the learning setup and the source of learning noise.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear
Paper passage: "drift rate increases with the variance and the dimension of the data in the task-irrelevant subspace"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation: unclear
Paper passage: "decompose the dynamics near a point θ̃ on the solution manifold into local normal (N) and tangential (T) spaces"
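The decomposition in the passage above can be sketched for the simplest solution manifold, that of the Oja subspace network, where near a point W (with orthonormal rows) the equivalent solutions are rotated copies Q W. Under that assumption, the tangential space consists of infinitesimal rotations Ω W with Ω skew-symmetric, and projecting an update onto it reduces to taking the skew-symmetric part of ΔW Wᵀ. The function name is hypothetical, and the paper's general θ̃ may require a different construction.

```python
import numpy as np


def split_tangent_normal(W, dW):
    """Split an update dW at W into tangential and normal parts.

    Assumes the local solution manifold is {Q @ W : Q orthogonal} and that W
    has orthonormal rows (W @ W.T = I), as for the Oja subspace network.
    Tangent vectors are Omega @ W with Omega skew-symmetric; because
    Omega -> Omega @ W is an isometry here, the orthogonal projection of dW
    onto the tangent space uses Omega = skew(dW @ W.T).
    """
    A = dW @ W.T
    omega = 0.5 * (A - A.T)        # skew-symmetric part of dW W^T
    tangent = omega @ W            # component along the manifold of equivalent solutions
    normal = dW - tangent          # remainder, orthogonal to the tangent space
    return tangent, normal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, _ = np.linalg.qr(rng.standard_normal((7, 7)))
    W = q[:3]                      # 3 x 7 with orthonormal rows
    T, N = split_tangent_normal(W, rng.standard_normal((3, 7)))
    print("<T, N> =", np.sum(T * N))   # ~0 up to rounding
```

In the usual reading of this decomposition, normal components are pulled back by the learning dynamics, while tangential components face no restoring force and can accumulate as drift.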

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 5 internal anchors

[1] German I Parisi, Ronald Kemker, Jose L Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review. Neural Networks, 113:54–71, 2019.

[2] Timo Flesch, Andrew Saxe, and Christopher Summerfield. Continual task learning in natural and artificial agents. Trends in Neurosciences, 46(3):199–210, 2023.

[3] Carl E Schoonover, Sarah N Ohashi, Richard Axel, and Andrew JP Fink. Representational drift in primary olfactory cortex. Nature, pages 1–6, 2021.

[4] Yaniv Ziv, Laurie D Burns, Eric D Cocker, Elizabeth O Hamel, Kunal K Ghosh, Lacey J Kitch, Abbas El Gamal, and Mark J Schnitzer. Long-term dynamics of CA1 hippocampal place codes. Nature Neuroscience, 16(3):264–266, 2013.

[5] Michael E Rule, Timothy O’Leary, and Christopher D Harvey. Causes and consequences of representational drift. Current Opinion in Neurobiology, 58:141–147, 2019.

[6] Daniel Deitch, Alon Rubin, and Yaniv Ziv. Representational drift in the mouse visual cortex. Current Biology, 31(19):4327–4339, 2021.

[7] Laura N Driscoll, Lea Duncker, and Christopher D Harvey. Representational drift: Emerging theories for continual learning and experimental future directions. Current Opinion in Neurobiology, 76:102609, 2022.

[8] Paul Masset, Shanshan Qin, and Jacob A Zavatone-Veth. Drifting neuronal representations: Bug or feature? Biological Cybernetics, pages 1–14, 2022.

[9] Shanshan Qin, Shiva Farashahi, David Lipshutz, Anirvan M Sengupta, Dmitri B Chklovskii, and Cengiz Pehlevan. Coordinated drift of receptive fields in Hebbian/anti-Hebbian network models during noisy representation learning. Nature Neuroscience, pages 1–11, 2023.

[10] Kyle Aitken, Marina Garrett, Shawn Olsen, and Stefan Mihalas. The geometry of representational drift in natural and artificial neural networks. PLOS Computational Biology, 18(11):e1010716, 2022.

[11] Aviv Ratzon, Dori Derdikman, and Omri Barak. Representational drift as a result of implicit regularization. eLife, 12:RP90069, 2024.

[12] Maanasa Natrajan and James E Fitzgerald. Stability through plasticity: Finding robust memories through representational drift. bioRxiv, pages 2024–12, 2024.

[13] Jens-Bastian Eppler, Thomas Lai, Dominik Aschauer, Simon Rumpel, and Matthias Kaschube. Representational drift reflects ongoing balancing of stochastic changes by Hebbian learning. bioRxiv, pages 2025–01, 2025.

[14] Federico Devalle, Licheng Zou, Gloria Cecchini, and Alex Roxin. Representational drift as the consequence of ongoing memory storage. bioRxiv, pages 2024–06, 2024.

[15] David Kappel, Stefan Habenschuss, Robert Legenstein, and Wolfgang Maass. Network plasticity as Bayesian inference. PLoS Computational Biology, 11(11):e1004485, 2015.

[16] Uri Rokni, Andrew G Richardson, Emilio Bizzi, and H Sebastian Seung. Motor learning with unstable neural representations. Neuron, 54(4):653–666, 2007.

[17] Charles Micou and Timothy O’Leary. Representational drift as a window into neural and behavioural plasticity. Current Opinion in Neurobiology, 81:102746, 2023.

[18] Gianluigi Mongillo, Simon Rumpel, and Yonatan Loewenstein. Intrinsic volatility of synaptic connections—a challenge to the synaptic trace theory of memory. Current Opinion in Neurobiology, 46:7–13, 2017.

[19] David Kappel, Robert Legenstein, Stefan Habenschuss, Michael Hsieh, and Wolfgang Maass. A dynamic connectome supports the emergence of stable computational function of neural circuits through reward-based learning. eNeuro, 5(2), 2018.

[20] Stanislaw Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, and Amos Storkey. Three factors influencing minima in SGD. arXiv preprint arXiv:1711.04623, 2017.

[21] Zhanxing Zhu, Jingfeng Wu, Bing Yu, Lei Wu, and Jinwen Ma. The anisotropic noise in stochastic gradient descent: Its behavior of escaping from sharp minima and regularization effects. arXiv preprint arXiv:1803.00195, 2018.

[22] Pratik Chaudhari and Stefano Soatto. Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks. In 2018 Information Theory and Applications Workshop (ITA), pages 1–10. IEEE, 2018.

[23] Sho Yaida. Fluctuation-dissipation relations for stochastic gradient descent. arXiv preprint arXiv:1810.00004, 2018.

[24] Zhiyuan Li, Tianhao Wang, and Sanjeev Arora. What happens after SGD reaches zero loss? –a mathematical framework. In International Conference on Learning Representations, 2022.

[25] Farhad Pashakhanloo and Alexei Koulakov. Stochastic gradient descent-induced drift of representation in a two-layer neural network. In International Conference on Machine Learning, pages 27401–27419. PMLR, 2023.

[26] Erkki Oja. Simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15:267–273, 1982.

[27] John Hertz, Anders Krogh, Richard G Palmer, and Heinz Horner. Introduction to the theory of neural computation, 1991.

[28] Stephan Mandt, Matthew D Hoffman, and David M Blei. Stochastic gradient descent as approximate Bayesian inference. arXiv preprint arXiv:1704.04289, 2017.

[29] Alan Edelman, Tomás A Arias, and Steven T Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.

[30] Cengiz Pehlevan, Anirvan M Sengupta, and Dmitri B Chklovskii. Why do similarity matching objectives lead to Hebbian/anti-Hebbian networks? Neural Computation, 30(1):84–124, 2017.

[31] Pierre Baldi and Kurt Hornik. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2(1):53–58, 1989.

[32] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[33] Luis M Franco and Michael J Goard. Differential stability of task variable representations in retrosplenial cortex. Nature Communications, 15(1):6872, 2024.

[34] Tyler D Marks and Michael J Goard. Stimulus-dependent representational drift in primary visual cortex. Nature Communications, 12(1):1–16, 2021.

[35] Gal Elyasaf, Alon Rubin, and Yaniv Ziv. Novel off-context experience constrains hippocampal representational drift. Current Biology, 34(24):5769–5773, 2024.

[36] Andrew M Saxe, James L McClelland, and Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120, 2013.

[37] Daniel Kunin, Javier Sagastuy-Brena, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, and Daniel LK Yamins. The limiting dynamics of SGD: Modified loss, phase-space oscillations, and anomalous diffusion. Neural Computation, 36(1):151–174, 2023.

[38] Jeremy M Cohen, Simran Kaur, Yuanzhi Li, J Zico Kolter, and Ameet Talwalkar. Gradient descent on neural networks typically occurs at the edge of stability. arXiv preprint arXiv:2103.00065, 2021.

[39] Nitzan Geva, Daniel Deitch, Alon Rubin, and Yaniv Ziv. Time and experience differentially affect distinct aspects of hippocampal representational drift. Neuron, 111(15):2357–2366, 2023.

[40] Dan Rokni, Vivian Hemmelder, Vikrant Kapoor, and Venkatesh N Murthy. An olfactory cocktail party: figure-ground segregation of odorants in rodents. Nature Neuroscience, 17(9):1225–1232, 2014.

[41] Crispin W Gardiner et al. Handbook of stochastic methods, volume 3. Springer Berlin, 1985.

[42] Here, $x_\mu = v_\mu^\top x$ is a component of the stimulus in the principal subspace, and $x_{m+\nu} = v_{m+\nu}^\top x$ a component in the task-irrelevant subspace. Replacing the above and the corresponding $\lambda^{H}_{\mu,\nu}$ in Eq. 20, we obtain the covariance of fluctuations associated with this subspace:

$$\langle \rho_{\mu,\nu}^2 \rangle = \frac{\eta \langle x_\mu^2 x_{m+\nu}^2 \rangle}{2(\lambda_\mu - \lambda_{m+\nu})} = \frac{\eta \lambda_{m+\nu}}{2\left(1 - \lambda_{m+\nu}/\lambda_\mu\right)}, \qquad \mu \in [m],\ \nu \in [n-m], \tag{30}$$

and the oth…

[1] [1]

Continual lifelong learning with neural networks: A review.Neural networks, 113:54–71, 2019

German I Parisi, Ronald Kemker, Jose L Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review.Neural networks, 113:54–71, 2019

work page 2019

[2] [2]

Continual task learning in natural and artificial agents.Trends in neurosciences, 46(3):199–210, 2023

Timo Flesch, Andrew Saxe, and Christopher Summerfield. Continual task learning in natural and artificial agents.Trends in neurosciences, 46(3):199–210, 2023

work page 2023

[3] [3]

Representational drift in primary olfactory cortex.Nature, pages 1–6, 2021

Carl E Schoonover, Sarah N Ohashi, Richard Axel, and Andrew JP Fink. Representational drift in primary olfactory cortex.Nature, pages 1–6, 2021

work page 2021

[4] [4]

Long-term dynamics of ca1 hippocampal place codes.Nature neuroscience, 16(3):264–266, 2013

Yaniv Ziv, Laurie D Burns, Eric D Cocker, Elizabeth O Hamel, Kunal K Ghosh, Lacey J Kitch, Ab- bas El Gamal, and Mark J Schnitzer. Long-term dynamics of ca1 hippocampal place codes.Nature neuroscience, 16(3):264–266, 2013

work page 2013

[5] [5]

Causes and consequences of representational drift.Current opinion in neurobiology, 58:141–147, 2019

Michael E Rule, Timothy O’Leary, and Christopher D Harvey. Causes and consequences of representational drift.Current opinion in neurobiology, 58:141–147, 2019

work page 2019

[6] [6]

Representational drift in the mouse visual cortex.Current Biology, 31(19):4327–4339, 2021

Daniel Deitch, Alon Rubin, and Yaniv Ziv. Representational drift in the mouse visual cortex.Current Biology, 31(19):4327–4339, 2021

work page 2021

[7] [7]

Representational drift: Emerging theories for continual learning and experimental future directions.Current Opinion in Neurobiology, 76:102609, 2022

Laura N Driscoll, Lea Duncker, and Christopher D Harvey. Representational drift: Emerging theories for continual learning and experimental future directions.Current Opinion in Neurobiology, 76:102609, 2022

work page 2022

[8] [8]

Drifting neuronal representations: Bug or feature? Biological cybernetics, pages 1–14, 2022

Paul Masset, Shanshan Qin, and Jacob A Zavatone-Veth. Drifting neuronal representations: Bug or feature? Biological cybernetics, pages 1–14, 2022

work page 2022

[9] [9]

Coordinated drift of receptive fields in hebbian/anti-hebbian network models during noisy representation learning.Nature Neuroscience, pages 1–11, 2023

Shanshan Qin, Shiva Farashahi, David Lipshutz, Anirvan M Sengupta, Dmitri B Chklovskii, and Cengiz Pehlevan. Coordinated drift of receptive fields in hebbian/anti-hebbian network models during noisy representation learning.Nature Neuroscience, pages 1–11, 2023

work page 2023

[10] [10]

The geometry of representational drift in natural and artificial neural networks.PLOS Computational Biology, 18(11):e1010716, 2022

Kyle Aitken, Marina Garrett, Shawn Olsen, and Stefan Mihalas. The geometry of representational drift in natural and artificial neural networks.PLOS Computational Biology, 18(11):e1010716, 2022

work page 2022

[11] [11]

Representational drift as a result of implicit regularization

Aviv Ratzon, Dori Derdikman, and Omri Barak. Representational drift as a result of implicit regularization. eLife, 12:RP90069, 2024

[12]

Maanasa Natrajan and James E Fitzgerald. Stability through plasticity: Finding robust memories through representational drift. bioRxiv, pages 2024–12, 2024

[13]

Jens-Bastian Eppler, Thomas Lai, Dominik Aschauer, Simon Rumpel, and Matthias Kaschube. Representational drift reflects ongoing balancing of stochastic changes by Hebbian learning. bioRxiv, pages 2025–01, 2025

[14]

Federico Devalle, Licheng Zou, Gloria Cecchini, and Alex Roxin. Representational drift as the consequence of ongoing memory storage. bioRxiv, pages 2024–06, 2024

[15]

David Kappel, Stefan Habenschuss, Robert Legenstein, and Wolfgang Maass. Network plasticity as Bayesian inference. PLoS Computational Biology, 11(11):e1004485, 2015

[16]

Uri Rokni, Andrew G Richardson, Emilio Bizzi, and H Sebastian Seung. Motor learning with unstable neural representations. Neuron, 54(4):653–666, 2007

[17]

Charles Micou and Timothy O’Leary. Representational drift as a window into neural and behavioural plasticity. Current Opinion in Neurobiology, 81:102746, 2023

[18]

Gianluigi Mongillo, Simon Rumpel, and Yonatan Loewenstein. Intrinsic volatility of synaptic connections—a challenge to the synaptic trace theory of memory. Current Opinion in Neurobiology, 46:7–13, 2017

[19]

David Kappel, Robert Legenstein, Stefan Habenschuss, Michael Hsieh, and Wolfgang Maass. A dynamic connectome supports the emergence of stable computational function of neural circuits through reward-based learning. eNeuro, 5(2), 2018

[20]

Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, and Amos Storkey. Three factors influencing minima in SGD. arXiv preprint arXiv:1711.04623, 2017

[21]

Zhanxing Zhu, Jingfeng Wu, Bing Yu, Lei Wu, and Jinwen Ma. The anisotropic noise in stochastic gradient descent: Its behavior of escaping from sharp minima and regularization effects. arXiv preprint arXiv:1803.00195, 2018

[22]

Pratik Chaudhari and Stefano Soatto. Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks. In 2018 Information Theory and Applications Workshop (ITA), pages 1–10. IEEE, 2018

[23]

Sho Yaida. Fluctuation-dissipation relations for stochastic gradient descent. arXiv preprint arXiv:1810.00004, 2018

[24]

Zhiyuan Li, Tianhao Wang, and Sanjeev Arora. What happens after SGD reaches zero loss? – A mathematical framework. In International Conference on Learning Representations, 2022

[25]

Farhad Pashakhanloo and Alexei Koulakov. Stochastic gradient descent-induced drift of representation in a two-layer neural network. In International Conference on Machine Learning, pages 27401–27419. PMLR, 2023

[26]

Erkki Oja. Simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15:267–273, 1982

[27]

John Hertz, Anders Krogh, Richard G Palmer, and Heinz Horner. Introduction to the theory of neural computation, 1991

[28]

Stephan Mandt, Matthew D Hoffman, and David M Blei. Stochastic gradient descent as approximate Bayesian inference. arXiv preprint arXiv:1704.04289, 2017

[29]

Alan Edelman, Tomás A Arias, and Steven T Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998

[30]

Cengiz Pehlevan, Anirvan M Sengupta, and Dmitri B Chklovskii. Why do similarity matching objectives lead to Hebbian/anti-Hebbian networks? Neural Computation, 30(1):84–124, 2017

[31]

Pierre Baldi and Kurt Hornik. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2(1):53–58, 1989

[32]

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998

[33]

Luis M Franco and Michael J Goard. Differential stability of task variable representations in retrosplenial cortex. Nature Communications, 15(1):6872, 2024

[34]

Tyler D Marks and Michael J Goard. Stimulus-dependent representational drift in primary visual cortex. Nature Communications, 12(1):1–16, 2021

[35]

Gal Elyasaf, Alon Rubin, and Yaniv Ziv. Novel off-context experience constrains hippocampal representational drift. Current Biology, 34(24):5769–5773, 2024

[36]

Andrew M Saxe, James L McClelland, and Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120, 2013

[37]

Daniel Kunin, Javier Sagastuy-Brena, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, and Daniel LK Yamins. The limiting dynamics of SGD: Modified loss, phase-space oscillations, and anomalous diffusion. Neural Computation, 36(1):151–174, 2023

[38]

Jeremy M Cohen, Simran Kaur, Yuanzhi Li, J Zico Kolter, and Ameet Talwalkar. Gradient descent on neural networks typically occurs at the edge of stability. arXiv preprint arXiv:2103.00065, 2021

[39]

Nitzan Geva, Daniel Deitch, Alon Rubin, and Yaniv Ziv. Time and experience differentially affect distinct aspects of hippocampal representational drift. Neuron, 111(15):2357–2366, 2023

[40]

Dan Rokni, Vivian Hemmelder, Vikrant Kapoor, and Venkatesh N Murthy. An olfactory cocktail party: figure-ground segregation of odorants in rodents. Nature Neuroscience, 17(9):1225–1232, 2014

[41]

Crispin W Gardiner et al. Handbook of stochastic methods, volume 3. Springer, Berlin, 1985

Appendix

• A: Summary of Framework
• B: Oja Derivations
...

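Here, $x_\mu = v_\mu^T x$ is a component of the stimulus in the principal subspace, and $x_\nu = v_{m+\nu}^T x$ is a component in the task-irrelevant subspace. Substituting the above and the corresponding $\lambda^H_{\mu,\nu}$ into Eq. 20, we obtain the covariance of fluctuations associated with this subspace:

$$\langle \rho_{\mu,\nu}^2 \rangle = \frac{\eta\,\langle x_\mu^2\, x_{m+\nu}^2 \rangle}{2(\lambda_\mu - \lambda_{m+\nu})} = \frac{\eta\,\lambda_{m+\nu}}{2\left(1 - \lambda_{m+\nu}/\lambda_\mu\right)}, \qquad \mu \in [m],\ \nu \in [n-m], \tag{30}$$

and the oth...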
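As a numerical sanity check on Eq. (30), the sketch below evaluates both of its expressions. It adds an assumption the fragment does not state: the principal component $x_\mu$ and the task-irrelevant component $x_{m+\nu}$ are taken to be independent zero-mean Gaussians with variances $\lambda_\mu$ and $\lambda_{m+\nu}$. Under that assumption $\langle x_\mu^2 x_{m+\nu}^2\rangle = \lambda_\mu \lambda_{m+\nu}$, which is the step that turns the middle expression into the right-hand side, and a Monte Carlo estimate checks that moment. The function names and the values of $\eta$ and the eigenvalues are hypothetical, chosen only for illustration.

```python
import random


def fluctuation_variance_closed_form(eta, lam_mu, lam_irr):
    """Right-hand side of Eq. (30): eta * lam_irr / (2 * (1 - lam_irr / lam_mu)).

    lam_mu: eigenvalue of a principal (task-relevant) direction.
    lam_irr: eigenvalue lambda_{m+nu} of a task-irrelevant direction.
    Requires lam_mu > lam_irr; otherwise the expression is not a valid variance.
    """
    if lam_mu <= lam_irr:
        raise ValueError("Eq. (30) requires lam_mu > lam_irr")
    return eta * lam_irr / (2.0 * (1.0 - lam_irr / lam_mu))


def fluctuation_variance_from_moment(eta, fourth_moment, lam_mu, lam_irr):
    """Middle expression of Eq. (30): eta * <x_mu^2 x_{m+nu}^2> / (2 * (lam_mu - lam_irr))."""
    return eta * fourth_moment / (2.0 * (lam_mu - lam_irr))


def estimate_fourth_moment(lam_mu, lam_irr, n_samples=100_000, seed=0):
    """Monte Carlo estimate of <x_mu^2 x_{m+nu}^2>.

    ASSUMPTION (not stated in the source): the two components are independent
    zero-mean Gaussians with variances lam_mu and lam_irr.
    """
    rng = random.Random(seed)
    sd_mu, sd_irr = lam_mu ** 0.5, lam_irr ** 0.5
    total = 0.0
    for _ in range(n_samples):
        a = rng.gauss(0.0, sd_mu)
        b = rng.gauss(0.0, sd_irr)
        total += (a * a) * (b * b)
    return total / n_samples


if __name__ == "__main__":
    eta, lam_mu, lam_irr = 0.01, 2.0, 0.5  # hypothetical illustrative values
    closed = fluctuation_variance_closed_form(eta, lam_mu, lam_irr)
    # Under the Gaussian-independence assumption the moment factorizes exactly.
    via_exact_moment = fluctuation_variance_from_moment(eta, lam_mu * lam_irr, lam_mu, lam_irr)
    m4 = estimate_fourth_moment(lam_mu, lam_irr)
    via_mc_moment = fluctuation_variance_from_moment(eta, m4, lam_mu, lam_irr)
    print(f"closed form            : {closed:.6f}")
    print(f"exact factorized moment: {via_exact_moment:.6f}")
    print(f"MC moment ({m4:.3f})     : {via_mc_moment:.6f}")
```

With these example values both exact expressions equal $\eta/3 \approx 0.00333$, and the sampled moment lands near $\lambda_\mu \lambda_{m+\nu} = 1$. The closed form also makes the scaling visible: the fluctuation variance grows with the task-irrelevant eigenvalue $\lambda_{m+\nu}$ and diverges as $\lambda_{m+\nu}$ approaches $\lambda_\mu$.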
