pith. sign in

arxiv: 1906.09852 · v1 · pith:WUXMHHQ7new · submitted 2019-06-24 · 💻 cs.LG · stat.ML

Lifelong Learning Starting From Zero

Pith reviewed 2026-05-25 17:24 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords lifelong learningcontinual learningneural networksneuroplasticitydynamic architecturesblank slateadaptive networksnode expansion
0
0 comments X

The pith

A neural network that starts with zero nodes develops lifelong learning using four rules of expansion, generalization, forgetting, and backpropagation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a neural network model that begins as a blank slate with no nodes and grows continuously in response to environmental signals. It applies expansion to add nodes for new input combinations, generalization to create nodes that cover broader patterns, forgetting to remove low-use nodes, and backpropagation to adjust parameters. The model is evaluated on accuracy, energy efficiency, and versatility, with claims of better performance than other network models in several cases. A sympathetic reader would care because the approach addresses continual adaptation without fixed initial structures or the need for retraining from scratch on new data.

Core claim

The central claim is that a deep neural-network model inspired by neuroplasticity, beginning as a blank slate with no nodes, can develop continuously according to four rules—expansion, generalization, forgetting, and backpropagation—and thereby achieve competitive or superior performance in accuracy, energy efficiency, and versatility compared to other network models.

What carries the argument

The four rules—expansion (adding nodes to memorize new input combinations), generalization (adding nodes that generalize from existing ones), forgetting (removing nodes of relatively little use), and backpropagation (fine-tuning parameters)—that together drive network development from an initial state of zero nodes.

If this is right

  • The network can adapt to new data indefinitely without requiring a predefined initial size or structure.
  • Energy use stays low because only useful nodes are retained after forgetting.
  • Versatility grows as the network builds both specific and generalized representations over time.
  • In several evaluated cases the model shows higher accuracy than comparison networks while using fewer resources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the rules function as stated, networks could in principle scale their capacity exactly to the complexity of experienced data rather than over- or under-provisioning in advance.
  • The forgetting rule might reduce interference between old and new tasks, but this would need direct measurement on long task sequences.
  • Energy-efficiency gains could be quantified by tracking total node count and forward-pass cost across an extended sequence of tasks.
  • The same developmental rules might be combined with other plasticity mechanisms to handle even more abrupt distribution shifts.

Load-bearing premise

That the four rules of expansion, generalization, forgetting, and backpropagation can be implemented together to deliver the claimed gains in accuracy, energy efficiency, and versatility.

What would settle it

Implement the model and test it on sequential lifelong learning benchmarks such as permuted MNIST or split CIFAR-10; if accuracy or resource metrics do not exceed those of fixed-architecture networks or other continual-learning baselines over multiple tasks, the performance claim does not hold.

Figures

Figures reproduced from arXiv: 1906.09852 by Claes Stranneg{\aa}rd, Filip Slottner Seholm, Fredrik M\"akel\"ainen, Herman Carlstr\"om, Morteza Haghir Chehreghani, Niklas Engsner.

Figure 1
Figure 1. Figure 1: Illustration of the extension rule. Yellow diamonds represent value nodes, [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The network shown is created following receipt of the first data point. The [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of the generalization rule. Presuppose the network to the left. [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Left: The network produced by LL0 on the spirals data set, with the two output nodes and their connections omitted for sake of readability. The architecture converged after less than one epoch with about 160 nodes, depth six, and max fan-in five. The yellow node was created by the generalization rule. Right: The spirals data set with the generated decision boundary. Input points that triggered the extensio… view at source ↗
Figure 5
Figure 5. Figure 5: Results on the spirals data set. Left: LL0 reaches 100% accuracy on the test set after less than one epoch. By contrast, the best baseline model FC10*3 reaches 80% accuracy after about 350 epochs. Right: FC10*3 consumes over 1000 times more energy than LL0 to reach 80% accuracy [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Results on the digits data set. Left: All models eventually reach approxi￾mately the same accuracy. LL0 learns relatively fast. Right: The energy curves converge [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Results on the radiology data set. Left: LL0 learns about ten times faster than the baselines. Right: LL0 consumes about 10% as much energy [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Results on the wine data set. Left: LL0 learns much more quickly, but peaks at an accuracy level slightly below the best baseline. Right: Energy con￾sumption. 4 Conclusion This paper has presented a model for lifelong learning inspired by four types of neuroplasticity. The LLO model can be used for constructing networks auto￾matically instead of manually. It starts from a blank slate and develops its deep … view at source ↗
read the original abstract

We present a deep neural-network model for lifelong learning inspired by several forms of neuroplasticity. The neural network develops continuously in response to signals from the environment. In the beginning, the network is a blank slate with no nodes at all. It develops according to four rules: (i) expansion, which adds new nodes to memorize new input combinations; (ii) generalization, which adds new nodes that generalize from existing ones; (iii) forgetting, which removes nodes that are of relatively little use; and (iv) backpropagation, which fine-tunes the network parameters. We analyze the model from the perspective of accuracy, energy efficiency, and versatility and compare it to other network models, finding better performance in several cases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a deep neural-network model for lifelong learning that starts with zero nodes and evolves continuously via four rules: (i) expansion to add nodes for new input combinations, (ii) generalization to add nodes that generalize from existing ones, (iii) forgetting to remove low-utility nodes, and (iv) backpropagation to fine-tune parameters. The model is claimed to outperform existing networks in accuracy, energy efficiency, and versatility in several cases.

Significance. A working implementation of the four rules that demonstrably improves the three metrics while starting from a blank slate would be a notable contribution to lifelong learning, especially given the reported zero free parameters. However, the manuscript supplies no implementation, experiments, data, or quantitative results, so the significance cannot be evaluated from the given text.

major comments (2)
  1. Abstract: performance claims ('better performance in several cases') are stated without any experimental setup, datasets, baselines, metrics, or numerical results, rendering the central claims unverifiable.
  2. Abstract: the four rules are described at a high level but no pseudocode, algorithmic specification, or reduction to concrete operations is provided, so the weakest assumption (implementability yielding the claimed gains) cannot be checked.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the comments. We respond point by point to the major comments.

read point-by-point responses
  1. Referee: Abstract: performance claims ('better performance in several cases') are stated without any experimental setup, datasets, baselines, metrics, or numerical results, rendering the central claims unverifiable.

    Authors: The manuscript is a conceptual proposal whose analysis of accuracy, energy efficiency, and versatility rests on reasoning from the model's structural properties rather than on empirical runs. The abstract phrasing therefore overstates what is shown. We will revise the abstract to remove the unverifiable performance claim and to state explicitly that the comparison is theoretical. revision: yes

  2. Referee: Abstract: the four rules are described at a high level but no pseudocode, algorithmic specification, or reduction to concrete operations is provided, so the weakest assumption (implementability yielding the claimed gains) cannot be checked.

    Authors: The current text introduces the rules at a conceptual level. We agree that pseudocode and a reduction to concrete operations are required before implementability can be assessed. We will add an algorithmic section containing pseudocode for each rule in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper introduces a neural network model that begins with zero nodes and evolves via four explicitly stated rules (expansion, generalization, forgetting, backpropagation). Performance claims rest on empirical comparisons rather than any reduction of outputs to fitted parameters, self-definitions, or self-citation chains. No equations or steps in the provided material equate a claimed result to its inputs by construction, and the model description does not invoke uniqueness theorems or ansatzes from prior self-work as load-bearing justification. The central claims remain independently falsifiable through implementation and benchmarking.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the unelaborated premise that the four rules can be combined into a working system; no free parameters or invented entities are named in the abstract.

axioms (1)
  • domain assumption The four rules (expansion, generalization, forgetting, backpropagation) suffice to produce effective lifelong learning.
    This premise is invoked by the abstract's description of the model and its performance claims.
invented entities (1)
  • Dynamic node addition/removal mechanism no independent evidence
    purpose: To enable lifelong learning from a blank slate
    New mechanism introduced to realize the four rules.

pith-pipeline@v0.9.0 · 5676 in / 1132 out tokens · 29673 ms · 2026-05-25T17:24:13.643700+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 3 internal anchors

  1. [1]

    Child Development Perspectives 12(3), 183–188 (2018)

    Cangelosi, A., Schlesinger, M.: From babies to robots: the contribution of devel- opmental robotics to developmental psychology. Child Development Perspectives 12(3), 183–188 (2018)

  2. [2]

    In: International Conference on Machine Learning (2014) 10 C

    Chen,Z.,Liu,B.:Topicmodelingusingtopicsfrommanydomains,lifelonglearning and big data. In: International Conference on Machine Learning (2014) 10 C. Strannegård et al

  3. [3]

    In: Proceedings of the 34th International Conference on Machine Learning-Volume

    Cortes, C., et al.: Adanet: Adaptive structural learning of artificial neural networks. In: Proceedings of the 34th International Conference on Machine Learning-Volume

  4. [4]

    pp. 874–883. JMLR. org (2017)

  5. [5]

    IEEE Computational Intelligence Magazine10(4), 12–25 (2015)

    Ditzler, G., Roveri, M., Alippi, C., Polikar, R.: Learning in nonstationary environ- ments: A survey. IEEE Computational Intelligence Magazine10(4), 12–25 (2015)

  6. [6]

    In: 2017 International Joint Conference on Neural Net- works (IJCNN)

    Draelos, T.J., et al.: Neurogenesis deep learning: Extending deep networks to ac- commodate new classes. In: 2017 International Joint Conference on Neural Net- works (IJCNN). pp. 526–533. IEEE (2017)

  7. [7]

    Behavioural brain research192(1), 137–142 (2008)

    Draganski, B., May, A.: Training-induced structural changes in the adult human brain. Behavioural brain research192(1), 137–142 (2008)

  8. [8]

    In: Ad- vances in neural information processing systems

    Fahlman, S.E., Lebiere, C.: The cascade-correlation learning architecture. In: Ad- vances in neural information processing systems. pp. 524–532 (1990)

  9. [9]

    Trends in cogni- tive sciences 3(4), 128–135 (1999)

    French, R.M.: Catastrophic forgetting in connectionist networks. Trends in cogni- tive sciences 3(4), 128–135 (1999)

  10. [10]

    MIT press (2016)

    Goodfellow, I., Bengio, Y., Courville, A.: Deep learning. MIT press (2016)

  11. [11]

    Trends in Neurosciences27(12) (2004)

    Greenspan, R.J., Van Swinderen, B.: Cognitive consonance: complex brain func- tions in the fruit fly and its relatives. Trends in Neurosciences27(12) (2004)

  12. [12]

    Grossberg, S.: How Does a Brain Build a Cognitive Code?, pp. 1–52. Springer Netherlands, Dordrecht (1982)

  13. [13]

    Neuron95, 245–258 (2017)

    Hassabis, D., Kumaran, D., Summerfield, C., Botvinick, M.: Neuroscience-inspired artificial intelligence. Neuron95, 245–258 (2017)

  14. [14]

    IEEE Access6, 24411–24432 (2018)

    Hatcher, W.G., Yu, W.: A survey of deep learning: platforms, applications and emerging research trends. IEEE Access6, 24411–24432 (2018)

  15. [15]

    Kandel, E.R., Schwartz, J.H., Jessell, T.M., et al.: Principles of neural science, vol. 4. McGraw-Hill New York (2000)

  16. [16]

    Pro- ceedings of the National Academy of Sciences114(13), 3521–3526 (2017)

    Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. Pro- ceedings of the National Academy of Sciences114(13), 3521–3526 (2017)

  17. [17]

    Cognition 110(3), 380–394 (2009)

    Krueger, K.A., Dayan, P.: Flexible shaping: How learning in small steps helps. Cognition 110(3), 380–394 (2009)

  18. [18]

    Lifelong Learning with Dynamically Expandable Networks

    Lee, J., Yoon, J., Yang, E., Hwang, S.J.: Lifelong learning with dynamically ex- pandable networks. CoRRabs/1708.01547 (2018)

  19. [19]

    IEEE Transactions on Pattern Analysis and Machine Intelligence40, 2935–2947 (2018)

    Li, Z., Hoiem, D.: Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence40, 2935–2947 (2018)

  20. [20]

    McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: Thesequentiallearningproblem.In:Psychologyoflearningandmotivation,vol.24, pp. 109–165. Elsevier (1989)

  21. [21]

    Frontiers in psychology4, 504 (2013)

    Mermillod, M., Bugaiska, A., Bonin, P.: The stability-plasticity dilemma: Investi- gating the continuum from catastrophic forgetting to age-limited learning effects. Frontiers in psychology4, 504 (2013)

  22. [22]

    Com- munications of the ACM61(5), 103–115 (2018)

    Mitchell, T., Cohen, W., Hruschka, E., Talukdar, P., Yang, B., Betteridge, J., Carlson, A., Dalvi, B., Gardner, M., Kisiel, B., et al.: Never-ending learning. Com- munications of the ACM61(5), 103–115 (2018)

  23. [23]

    Annual review of neuroscience14(1), 453–501 (1991)

    Oppenheim, R.W.: Cell death during development of the nervous system. Annual review of neuroscience14(1), 453–501 (1991)

  24. [24]

    Science 333(6048), 1456–1458 (2011)

    Paolicelli, R.C., et al.: Synaptic pruning by microglia is necessary for normal brain development. Science 333(6048), 1456–1458 (2011)

  25. [25]

    Neural networks: the official journal of the International Neural Network Society113, 54–71 (2019)

    Parisi, G., Kemker, R., Part, J., Kanan, C., Wermter, S.: Continual lifelong learn- ing with neural networks: A review. Neural networks: the official journal of the International Neural Network Society113, 54–71 (2019)

  26. [26]

    Wiley Interdis- ciplinary Reviews: Developmental Biology6(1), e216 (2017) Lifelong Learning Starting From Zero 11

    Power, J.D., Schlaggar, B.L.: Neural plasticity across the lifespan. Wiley Interdis- ciplinary Reviews: Developmental Biology6(1), e216 (2017) Lifelong Learning Starting From Zero 11

  27. [27]

    Progressive Neural Networks

    Rusu, A.A., et al.: Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016)

  28. [28]

    Neural networks : the official journal of the International Neural Network Society108, 48–67 (2018)

    Soltoggio, A., Stanley, K.O., Risi, S.: Born to learn: The inspiration, progress, and future of evolved plastic artificial neural networks. Neural networks : the official journal of the International Neural Network Society108, 48–67 (2018)

  29. [29]

    Proceedings of the IEEE105(12) (2017)

    Sze, V., Chen, Y.H., Yang, T.J., Emer, J.S.: Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE105(12) (2017)

  30. [30]

    The Incredible Shrinking Neural Network: New Perspectives on Learning Representations Through The Lens of Pruning

    Wolfe, N., Sharma, A., Drude, L., Raj, B.: The incredible shrinking neural network: New perspectives on learning representations through the lens of pruning. arXiv preprint arXiv:1701.04465 (2017)

  31. [31]

    In: Proceedings of the 34th International Conference on Machine Learning-Volume

    Zenke, F., Poole, B., Ganguli, S.: Continual learning through synaptic intelligence. In: Proceedings of the 34th International Conference on Machine Learning-Volume

  32. [32]

    3987–3995

    pp. 3987–3995. JMLR. org (2017)

  33. [33]

    In: Artificial intelligence and statistics

    Zhou, G., Sohn, K., Lee, H.: Online incremental feature learning with denoising autoencoders. In: Artificial intelligence and statistics. pp. 1453–1461 (2012)