Sequential Learning and Catastrophic Forgetting in Differentiable Resistor Networks

Maniru Ibrahim

arXiv: 2605.01383 · v1 · submitted 2026-05-02 · cs.LG · cond-mat.dis-nn · physics.comp-ph

Pith reviewed 2026-05-09 14:05 UTC · model grok-4.3

keywords: catastrophic forgetting · continual learning · differentiable resistor networks · sequential learning · Kirchhoff laws · conductance adjustment · graph topology · physical networks

The pith

Differentiable resistor networks learn single input-output mappings via conductance tuning but suffer catastrophic forgetting when trained sequentially on conflicting tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how learning occurs in networks of resistors whose edge conductances can be adjusted by gradient descent while obeying Kirchhoff's laws at every step. Single tasks can be solved by changing conductances to match desired input-output relations, yet adding a second task that conflicts with the first erases the earlier solution. The extent of forgetting scales with how directly the new task opposes the old one and with how far the conductances move during the second training phase. Anchoring methods that try to preserve earlier weights only lower forgetting by leaving higher error on the current task, exposing a direct trade-off between stability and new-task performance.

Core claim

Although individual input-output mappings can be learned by gradient-based adjustment of edge conductances in resistor networks governed by Kirchhoff's laws, sequential training on conflicting tasks produces catastrophic forgetting. Forgetting is controlled by task conflict and by the degree of adaptation to the new task. Uniform anchoring and normalised gradient-weighted anchoring reduce forgetting only by increasing the final loss on the new task. Forgetting is associated with localised conductance changes on high-current edges, giving a physical interpretation as reconfiguration of dominant transport pathways. Broader random-task ensembles show that the strongest forgetting occurs when the second task reverses the output ordering imposed by the first task.
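The anchoring trade-off in the claim above can be illustrated with a toy sketch. Everything in it is an assumption for illustration rather than the paper's setup: the network is a single two-resistor voltage divider, the tasks are scalar voltage targets, and "uniform anchoring" is taken to mean a quadratic penalty `lam * sum((g - g_anchor)**2)` with the same weight on every edge. The normalised gradient-weighted variant is not shown because its exact form is not given here.

```python
def output(g):
    """Output voltage of a two-resistor divider between a 1 V source and ground."""
    return g[0] / (g[0] + g[1])

def task_loss(g, target):
    return (output(g) - target) ** 2

def task_grad(g, target):
    """Analytic gradient of the squared error with respect to both conductances."""
    s2 = (g[0] + g[1]) ** 2
    err = 2.0 * (output(g) - target)
    return [err * g[1] / s2, -err * g[0] / s2]

def train(g, target, lam=0.0, anchor=None, lr=0.2, steps=2000, g_min=1e-3):
    """Gradient descent on task loss plus an optional uniform anchoring penalty."""
    g = list(g)
    for _ in range(steps):
        grad = task_grad(g, target)
        if anchor is not None:
            # Assumed penalty: lam * sum_k (g_k - anchor_k)^2
            grad = [d + 2.0 * lam * (gk - ak) for d, gk, ak in zip(grad, g, anchor)]
        # Clip so that conductances stay physically positive
        g = [max(gk - lr * d, g_min) for gk, d in zip(g, grad)]
    return g

TASK_A, TASK_B = 0.8, 0.2            # hypothetical conflicting targets
g_a = train([1.0, 1.0], TASK_A)      # phase 1: learn task A

results = {}
for lam in (0.0, 0.1, 1.0):
    g_b = train(g_a, TASK_B, lam=lam, anchor=g_a)   # phase 2: learn task B
    forgetting = task_loss(g_b, TASK_A) - task_loss(g_a, TASK_A)
    results[lam] = (forgetting, task_loss(g_b, TASK_B))
    print(f"lambda={lam}: forgetting={forgetting:.4f}, new-task loss={results[lam][1]:.4f}")
```

In this toy setting a stronger penalty keeps the conductances near their task-A values, so forgetting drops while the task-B loss stays high, which is the qualitative trade-off described above.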

What carries the argument

Gradient-based adjustment of edge conductances in networks obeying Kirchhoff's current and voltage laws, which enforces physical equilibrium at every training step while allowing the conductances to serve as trainable parameters.
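A minimal sketch of this mechanism, under stated assumptions: the five-edge, four-node graph, the clamped voltages, the target, and the learning rate are all hypothetical, and gradients are taken by central finite differences for brevity rather than by whatever differentiation scheme the paper uses. Each loss evaluation re-solves Kirchhoff's current law, so the network is at equilibrium at every training step.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def node_voltages(edges, g, fixed, n_nodes):
    """Kirchhoff equilibrium: zero net current at every node not listed in `fixed`."""
    free = [i for i in range(n_nodes) if i not in fixed]
    idx = {node: k for k, node in enumerate(free)}
    A = [[0.0] * len(free) for _ in free]
    b = [0.0] * len(free)
    for (i, j), gij in zip(edges, g):
        for a, c in ((i, j), (j, i)):
            if a in idx:
                A[idx[a]][idx[a]] += gij
                if c in idx:
                    A[idx[a]][idx[c]] -= gij
                else:
                    b[idx[a]] += gij * fixed[c]   # clamped neighbour acts as a source term
    x = solve_linear(A, b)
    v = [0.0] * n_nodes
    for node, val in fixed.items():
        v[node] = val
    for node, k in idx.items():
        v[node] = x[k]
    return v

def fd_gradient(f, g, eps=1e-6):
    """Central finite-difference gradient (a stand-in for automatic differentiation)."""
    grad = []
    for k in range(len(g)):
        up = g[:k] + [g[k] + eps] + g[k + 1:]
        dn = g[:k] + [g[k] - eps] + g[k + 1:]
        grad.append((f(up) - f(dn)) / (2 * eps))
    return grad

# Hypothetical example: node 0 clamped at 1 V, node 3 grounded, node 1 is the output.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
fixed = {0: 1.0, 3: 0.0}
target = 0.7

def loss(g):
    return (node_voltages(edges, g, fixed, 4)[1] - target) ** 2

g = [1.0] * len(edges)
initial_loss = loss(g)
for _ in range(300):
    grad = fd_gradient(loss, g)
    g = [max(gk - 1.0 * dk, 1e-3) for gk, dk in zip(g, grad)]  # keep conductances positive
final_loss = loss(g)
```

The conductances are the only trainable parameters; the node voltages are never updated directly but always follow from the linear solve.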

If this is right

Sequential training on tasks whose output orderings oppose each other produces the largest forgetting.
Anchoring conductances reduces forgetting only at the expense of higher error on the newly learned task.
Forgetting appears as concentrated conductance shifts along the highest-current pathways, reconfiguring the dominant routes for current flow.
Across different random graph families the forgetting-adaptation balance changes, with topology acting as an independent control variable.
The same networks can be used to quantify how much task similarity modulates the severity of forgetting.
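The first and last items in this list can be made concrete with a small sketch. It assumes a deliberately simple stand-in network (two independent voltage dividers sharing a 1 V source and ground, one per output node), hypothetical target vectors, and forgetting defined as the increase in first-task loss after training on the second task; none of these choices is taken from the paper.

```python
def outputs(g):
    """Two voltage dividers between a 1 V source and ground, one per output node."""
    return [g[0] / (g[0] + g[1]), g[2] / (g[2] + g[3])]

def task_loss(g, targets):
    return sum((o - t) ** 2 for o, t in zip(outputs(g), targets))

def task_grad(g, targets):
    """Analytic gradient of the summed squared error for both dividers."""
    grad = []
    for k, (o, t) in enumerate(zip(outputs(g), targets)):
        top, bottom = g[2 * k], g[2 * k + 1]
        err = 2.0 * (o - t)
        s2 = (top + bottom) ** 2
        grad += [err * bottom / s2, -err * top / s2]
    return grad

def train(g, targets, steps, lr=0.2, g_min=1e-3):
    g = list(g)
    for _ in range(steps):
        g = [max(gk - lr * dk, g_min) for gk, dk in zip(g, task_grad(g, targets))]
    return g

def forgetting(g_first, g_second, first_targets):
    """Increase in first-task loss caused by training on the second task."""
    return task_loss(g_second, first_targets) - task_loss(g_first, first_targets)

TASK_A = [0.7, 0.3]     # output 1 above output 2
REVERSED = [0.3, 0.7]   # second task flips the output ordering
SIMILAR = [0.6, 0.4]    # second task keeps the ordering

g_a = train([1.0] * 4, TASK_A, steps=2000)
forget_reversed = forgetting(g_a, train(g_a, REVERSED, steps=2000), TASK_A)
forget_similar = forgetting(g_a, train(g_a, SIMILAR, steps=2000), TASK_A)

# Forgetting as a function of how long the second task is trained
forget_by_steps = [forgetting(g_a, train(g_a, REVERSED, steps=s), TASK_A)
                   for s in (5, 50, 500)]
print(forget_reversed, forget_similar, forget_by_steps)
```

In this toy case the reversed task produces far more forgetting than the order-preserving one, and forgetting grows with the number of second-phase steps, mirroring the dependence on task conflict and on degree of adaptation.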

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The link between forgetting and high-current edge reconfiguration could be tested by deliberately protecting those edges in future physical prototypes to improve retention.
If topology alters the forgetting-adaptation trade-off, then choosing or evolving network structure becomes a design parameter for building physical continual learners.
The resistor-network testbed offers a low-dimensional way to compare forgetting across equilibrium-based physical systems such as fluidic or optical networks that obey similar conservation laws.
Task reversal as the worst-case conflict suggests that output ordering, rather than input statistics alone, may be the dominant driver of interference in any equilibrium-constrained learner.
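To treat topology as a design parameter one needs graph ensembles to draw from. The sketch below generates two of the four families named in the abstract (Erdős–Rényi and a Watts–Strogatz style small-world graph) using only the standard library, plus a connectivity check, since a disconnected graph cannot carry current between every pair of nodes. The function names, sizes, and probabilities are illustrative assumptions, not the paper's parameters; scale-free and random-geometric graphs are omitted.

```python
import random
from collections import deque

def erdos_renyi(n, p, rng):
    """G(n, p): include each of the n(n-1)/2 possible edges independently with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def watts_strogatz(n, k, beta, rng):
    """Ring lattice with k nearest neighbours per node (k even, k < n); each lattice
    edge is rewired to a new endpoint with probability beta."""
    edges = set()
    for i in range(n):
        for d in range(1, k // 2 + 1):
            edges.add(frozenset((i, (i + d) % n)))
    for i in range(n):
        for d in range(1, k // 2 + 1):
            old = frozenset((i, (i + d) % n))
            if old in edges and rng.random() < beta:
                candidates = [j for j in range(n)
                              if j != i and frozenset((i, j)) not in edges]
                if candidates:
                    edges.remove(old)
                    edges.add(frozenset((i, rng.choice(candidates))))
    return sorted(tuple(sorted(e)) for e in edges)

def is_connected(n, edges):
    """Breadth-first search from node 0; True if every node is reached."""
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen = {0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == n

rng = random.Random(0)               # fixed seed so the sample is reproducible
er_edges = erdos_renyi(12, 0.3, rng)
ws_edges = watts_strogatz(12, 4, 0.1, rng)
print(len(er_edges), is_connected(12, er_edges))
print(len(ws_edges), is_connected(12, ws_edges))
```

Rewiring preserves the edge count, so the small-world samples differ from the ring lattice in structure but not in the number of trainable conductances.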

Load-bearing premise

That the simulated resistor networks with gradient-based conductance updates sufficiently capture the mechanisms of learning and forgetting to yield generalizable insights about continual learning in physical systems.

What would settle it

Running the same sequential training protocol on a physical analog resistor network and finding that conductance changes do not concentrate on high-current edges or that task reversal does not produce the largest forgetting would falsify the reported mechanism.
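One way such a test could be scored is to measure what fraction of the total conductance change falls on the edges that carried the most current before the second task. The sketch below is a hedged illustration: the function names, the top-k rule, and all numbers are hypothetical, and the paper's own localisation statistic may be defined differently.

```python
def edge_currents(edges, g, v):
    """Ohm's law on each edge: I_ij = g_ij * (v_i - v_j)."""
    return [gk * (v[i] - v[j]) for (i, j), gk in zip(edges, g)]

def change_fraction_on_top_edges(edges, g_before, g_after, v_before, top_k):
    """Share of the total |delta g| carried by the top_k edges ranked by |current|
    in the network state before the second task."""
    currents = edge_currents(edges, g_before, v_before)
    ranked = sorted(range(len(edges)), key=lambda k: abs(currents[k]), reverse=True)
    top = set(ranked[:top_k])
    delta = [abs(a - b) for a, b in zip(g_after, g_before)]
    total = sum(delta)
    return sum(delta[k] for k in top) / total if total > 0 else 0.0

# Hypothetical toy network: a strong path 0-1-2 in parallel with a weak path 0-3-2.
# Node 0 is held at 1 V and node 2 is ground; these voltages satisfy Kirchhoff's
# current law for g_before.
edges = [(0, 1), (1, 2), (0, 3), (3, 2)]
g_before = [2.0, 2.0, 0.1, 0.1]
v_before = [1.0, 0.5, 0.0, 0.5]
g_after = [1.2, 2.6, 0.1, 0.12]      # made-up post-training conductances

currents = edge_currents(edges, g_before, v_before)
fraction = change_fraction_on_top_edges(edges, g_before, g_after, v_before, top_k=2)
print(currents, fraction)
```

A fraction close to 1 would indicate that the update is concentrated on the dominant transport pathway; a value near `top_k / len(edges)` would indicate no such concentration.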

Figures

Figures reproduced from arXiv: 2605.01383 by Maniru Ibrahim.

**Figure 2.** Gradient cosine similarity as a function of the...

**Figure 3.** Forgetting as a function of the task-similarity...

**Figure 5.** Forgetting–adaptation trade-off for uniform and...

**Figure 6.** Distribution across seeds of the fraction of total...

**Figure 8.** Training losses for the task sequence...

**Figure 9.** Effect of graph topology on the...

**Figure 10.** Effect of minimum input–output graph...

**Figure 12.** Forgetting as a function of network size for...

Original abstract

Differentiable physical networks provide a simple setting in which learning can be studied through the interaction between trainable parameters and physical equilibrium constraints. We investigate sequential learning in differentiable resistor networks governed by Kirchhoff's laws. Although individual input–output mappings can be learned by gradient-based adjustment of edge conductances, sequential training on conflicting tasks produces catastrophic forgetting. We show that forgetting is controlled by task conflict and by the degree of adaptation to the new task. Uniform anchoring and normalised gradient-weighted anchoring reduce forgetting only by increasing the final loss on the new task, giving a clear forgetting–adaptation trade-off. We also show that forgetting is associated with localised conductance changes on high-current edges, giving a physical interpretation as reconfiguration of dominant transport pathways. Broader random-task ensembles show that the strongest forgetting occurs when the second task reverses the output ordering imposed by the first task. Finally, comparisons across Erdős–Rényi, small-world, scale-free, and random-geometric graph ensembles show that topology changes the forgetting–adaptation balance. These results position differentiable resistor networks as compact, physically interpretable testbeds for studying continual learning in tunable matter.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Differentiable resistor networks exhibit catastrophic forgetting on conflicting sequential tasks, with anchoring and graph topology modulating the trade-off in internally consistent simulations.

This paper shows that gradient-based updates to edge conductances in Kirchhoff-governed resistor networks produce clear catastrophic forgetting when tasks conflict, and that uniform or gradient-weighted anchoring reduces forgetting only by raising loss on the new task. The work also ties forgetting to localized changes on high-current edges and finds stronger effects when the second task reverses the first task's output ordering. Across graph ensembles, topology shifts the forgetting-adaptation balance, with small-world and scale-free graphs behaving differently from Erdős–Rényi or geometric ones. These are the concrete results that stand out as new for this model class.

The simulations are straightforward, use explicit conflict metrics, and give a physical reading of the conductance shifts that matches the equilibrium constraints. No circularity appears in the setup, and the reported patterns hold together without obvious post-hoc fitting. The main limitation is that everything stays in simulation on synthetic tasks; there is no hardware realization or test on more naturalistic data streams, so the insights remain tied to this specific constrained system. The anchoring methods are basic and the task ensembles are constructed for clear reversal, which is fine for a first exploration but leaves open how the effects scale outside these choices.

This is useful for people building or studying physical or constrained learning systems who want a compact testbed. It is not a broad theoretical advance, but the empirical mappings are reproducible within the model and add a distinct angle. I would bring it to a reading group for the topology comparisons and the physical interpretation. It deserves peer review because the core claims are grounded and falsifiable in simulation; a referee can push on generalizability without the paper falling apart on its own terms.

Referee Report

0 major / 3 minor

Summary. The paper studies sequential learning in differentiable resistor networks governed by Kirchhoff's laws. It shows that gradient-based adjustment of edge conductances can learn individual input-output mappings, but sequential training on conflicting tasks produces catastrophic forgetting. Forgetting is shown to depend on task conflict and the degree of adaptation to the new task; uniform and normalised gradient-weighted anchoring reduce forgetting only at the cost of higher final loss on the new task. Forgetting correlates with localised conductance changes on high-current edges. Strongest forgetting occurs when the second task reverses the output ordering of the first; comparisons across Erdős–Rényi, small-world, scale-free and random-geometric graphs show that topology modulates the forgetting–adaptation trade-off. The work positions these networks as compact, physically interpretable testbeds for continual learning.

Significance. If the simulation results hold, the manuscript supplies a physically grounded, low-dimensional model in which continual-learning phenomena can be studied through explicit equilibrium constraints and measurable transport pathways. Credit is due for the use of well-defined task ensembles, explicit conflict metrics, topology comparisons, and the demonstration of a clear forgetting–adaptation trade-off under anchoring. These elements make the networks a potentially useful testbed for exploring mechanisms in physical or neuromorphic continual learning.
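As an illustration of what an explicit conflict metric can look like, the sketch below computes the cosine similarity between the loss gradients of two tasks at the same conductances, the quantity named in the Figure 2 caption. The two-divider network and the target vectors are hypothetical stand-ins, and the paper may evaluate the metric at a different point or on a different network.

```python
import math

def outputs(g):
    """Two voltage dividers between a 1 V source and ground, one per output node."""
    return [g[0] / (g[0] + g[1]), g[2] / (g[2] + g[3])]

def task_grad(g, targets):
    """Gradient of the summed squared output error with respect to the conductances."""
    grad = []
    for k, (o, t) in enumerate(zip(outputs(g), targets)):
        top, bottom = g[2 * k], g[2 * k + 1]
        err = 2.0 * (o - t)
        s2 = (top + bottom) ** 2
        grad += [err * bottom / s2, -err * top / s2]
    return grad

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

g = [1.0, 1.0, 1.0, 1.0]             # both outputs start at 0.5 V
grad_a = task_grad(g, [0.7, 0.3])

cos_reversed = cosine_similarity(grad_a, task_grad(g, [0.3, 0.7]))  # ordering flipped
cos_aligned = cosine_similarity(grad_a, task_grad(g, [0.6, 0.4]))   # ordering kept
cos_mixed = cosine_similarity(grad_a, task_grad(g, [0.7, 0.7]))     # one output agrees
print(cos_reversed, cos_aligned, cos_mixed)
```

Negative values mean that a descent step for one task increases the loss of the other, which is the situation in which sequential training would be expected to overwrite the earlier solution.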

minor comments (3)

[Abstract] Abstract: the phrases 'task conflict' and 'normalised gradient-weighted anchoring' are used without a one-sentence definition; a brief parenthetical gloss would aid readers who have not yet reached the methods section.
[Introduction or Methods] The manuscript would benefit from an explicit statement, early in the text, of the precise form of the Kirchhoff-law equilibrium equations that are differentiated for gradient computation.
[Figures] Figure captions and axis labels should consistently indicate whether error bars represent standard deviation across graph realizations or across task pairs.
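
For orientation on the second comment, the standard textbook form of the equilibrium equations for a linear resistor network is sketched below. This is not quoted from the manuscript, and the notation is an assumption: $g_{ij}$ is the conductance of edge $(i,j)$, $v_F$ collects the voltages of free nodes, $V_B$ the clamped input and ground voltages, and $L(g)$ is the conductance-weighted graph Laplacian partitioned into free (F) and clamped (B) blocks.

```latex
\begin{align}
  % Kirchhoff's current law at every free node i
  \sum_{j \in \mathcal{N}(i)} g_{ij}\,(v_i - v_j) &= 0, \\
  % the same condition in block-matrix form
  L_{FF}(g)\, v_F &= -\,L_{FB}(g)\, V_B, \\
  % implicit differentiation with respect to one edge conductance g_e
  \frac{\partial v_F}{\partial g_e}
    &= -\,L_{FF}(g)^{-1}
       \left( \frac{\partial L_{FF}}{\partial g_e}\, v_F
            + \frac{\partial L_{FB}}{\partial g_e}\, V_B \right).
\end{align}
```

The last line follows from differentiating the second with respect to $g_e$ while holding $V_B$ fixed; a loss that depends on output voltages then has gradient $\partial \mathcal{L}/\partial g_e = (\partial \mathcal{L}/\partial v_F)^{\top}\, \partial v_F/\partial g_e$.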

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their accurate summary of our work, the positive assessment of its significance as a testbed for continual learning, and the recommendation for minor revision. The report correctly identifies the core findings on catastrophic forgetting, the role of task conflict and adaptation, the forgetting-adaptation trade-off under anchoring, the association with high-current edges, the effect of output-order reversal, and the topology dependence across graph ensembles.

Circularity Check

0 steps flagged

No significant circularity identified

The paper reports outcomes from explicit forward simulations of Kirchhoff-governed resistor networks with gradient-based conductance updates on defined task sequences. All central claims (catastrophic forgetting under conflict, anchoring trade-offs, localization to high-current edges, and topology-dependent balances) are presented as direct results of these computational experiments rather than as derivations that reduce to their own inputs by definition, fitted-parameter renaming, or self-citation chains. No load-bearing step equates a prediction to a prior fit or invokes an unverified uniqueness theorem; the work remains self-contained against the stated simulation benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, ad-hoc axioms, or invented entities are identifiable; the model rests on standard Kirchhoff's laws and gradient descent, which are external to the paper.

pith-pipeline@v0.9.0 · 5501 in / 1102 out tokens · 35110 ms · 2026-05-09T14:05:11.870303+00:00 · methodology

Reference graph

Works this paper leans on

27 extracted references · 6 canonical work pages

[1] takes an important step further by directly studying sequential learning in tunable resistor networks and showing that thresholded local updates can reduce catastrophic forgetting by spatially separating task-specific tuned regions. Our paper differs in both method and emphasis. Methodologically, we study a differentiable resistor-network model traine...
[2] L. G. Wright, T. Onodera, M. M. Stein, T. Wang, D. T. Schachter, Z. Hu, and P. L. McMahon, Nature 601, 549 (2022)
[3] A. Momeni, B. Rahmani, M. Malléjac, P. Del Hougne, and R. Fleury, Science 382, 1297 (2023)
[4] A. Momeni et al., arXiv preprint arXiv:2406.03372 (2024)
[5] M. Stern and A. Murugan, Annual Review of Condensed Matter Physics 14, 417 (2023)
[6] M. Stern, D. Hexner, J. W. Rocks, and A. J. Liu, Physical Review X 11, 021045 (2021)
[7] M. Stern, M. Guzman, F. Martins, A. J. Liu, and V. Balasubramanian, Physical Review Letters 134, 147402 (2025)
[8] M. Guzman, F. Martins, M. Stern, and A. J. Liu, arXiv preprint arXiv:2412.19356 (2024)
[9] R. M. French, Trends in Cognitive Sciences 3, 128 (1999)
[10] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al., Proceedings of the National Academy of Sciences 114, 3521 (2017)
[11] F. Zenke, B. Poole, and S. Ganguli, in Proceedings of the 34th International Conference on Machine Learning (PMLR, 2017) pp. 3987–3995
[12] J. Serra, D. Suris, M. Miron, and A. Karatzoglou, in Proceedings of the 35th International Conference on Machine Learning (PMLR, 2018) pp. 4548–4557
[13] M. De Lange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis, G. Slabaugh, and T. Tuytelaars, IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 3366 (2022)
[14] M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, G. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain, et al., IEEE Micro 38, 82 (2018)
[15] S. Furber, Journal of Neural Engineering 13, 051001 (2016)
[16] G. Indiveri and S.-C. Liu, Proceedings of the IEEE 103, 1379 (2015)
[17] S. Dillavou, B. D. Beyer, M. Stern, A. J. Liu, M. Z. Miskin, and D. J. Durian, Proceedings of the National Academy of Sciences 121, e2319718121 (2024)
[18] M. Stern, S. Dillavou, D. Jayaraman, D. J. Durian, and A. J. Liu, APL Machine Learning 2, 016114 (2024)
[19] P. Chatterjee, M. Guzman, and A. J. Liu, arXiv preprint arXiv:2512.03799 (2025)
[20] Z. Li and D. Hoiem, IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 2935 (2018)
[21] B. Scellier and Y. Bengio, Frontiers in Computational Neuroscience 11, 24 (2017)
[22] H. Jaeger and H. Haas, Science 304, 78 (2004)
[23] G. Tanaka, T. Yamane, J. B. Héroux, R. Nakane, N. Kanazawa, S. Takeda, H. Numata, D. Nakano, and A. Hirose, Neural Networks 115, 100 (2019)
[24] S. Stepney, Natural Computing 10.1007/s11047-024-09997-y (2024)
[25] M. J. Falk, J. Wu, A. Matthews, V. Sachdeva, N. Pashine, M. L. Gardel, S. R. Nagel, and A. Murugan, Proceedings of the National Academy of Sciences 120, e2219558120 (2023)
[26] S. Dillavou, M. Guzman, A. J. Liu, and D. J. Durian, arXiv preprint arXiv:2505.22887 (2025)
[27] M. Ibrahim, Physical learning in resistor networks, https://doi.org/10.5281/zenodo.19975054 (2026), version 1.1.0, Zenodo archived software; GitHub repository: https://github.com/Manirmaths/physical-learning-resistor-networks
