Learn&Drop: Fast Learning of CNNs based on Layer Dropping

Giorgio Cruciata; Jan van Gemert; Liliana Lo Presti; Luca Cruciata; Marco La Cascia

arXiv: 2604.23403 · v1 · submitted 2026-04-25 · cs.CV · cs.AI · cs.NE

Pith reviewed 2026-05-08 08:33 UTC · model grok-4.3

classification cs.CV · cs.AI · cs.NE

keywords CNN training acceleration · layer dropping · forward propagation reduction · VGG · ResNet · image classification efficiency · training time speedup

The pith

Learn&Drop scores each layer's parameter change and future learning potential, then drops low-scoring layers during training, which cuts forward operations and more than halves CNN training time with comparable accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Learn&Drop, a training method that computes two scores per layer: one measuring how much its parameters are still changing and another assessing whether it will continue to learn meaningfully. Layers with low scores are dropped, shrinking the active network so that each forward pass performs fewer operations and training proceeds faster. This approach differs from prior work by targeting the forward phase of training rather than inference compression or backpropagation savings. Experiments on MNIST, CIFAR-10, and Imagenette using VGG and ResNet families demonstrate training time reductions exceeding 50 percent and forward FLOPs cuts between 18 and 84 percent, while final accuracy stays close to that of full models. The technique is positioned as especially practical for fine-tuning or sequential data scenarios.
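As a rough illustration of the first score, the sketch below measures how far a layer's parameters moved between two checkpoints as a relative L2 norm. The paper's actual formula is not given in the abstract, so the function names `layer_change_score` and `low_change_layers`, the relative-norm definition, and the threshold `tau` are all assumptions made for illustration. The continuation score is not modeled here.

```python
import math

def layer_change_score(prev, curr, eps=1e-12):
    """Relative L2 change between two flat parameter lists (assumed definition)."""
    diff_sq = sum((c - p) ** 2 for p, c in zip(prev, curr))
    norm_sq = sum(p ** 2 for p in prev)
    return math.sqrt(diff_sq) / (math.sqrt(norm_sq) + eps)

def low_change_layers(prev_params, curr_params, tau=0.01):
    """Names of layers whose relative change is below the hypothetical threshold tau."""
    return [
        name
        for name in prev_params
        if layer_change_score(prev_params[name], curr_params[name]) < tau
    ]

# Toy checkpoints with made-up values: conv1 has nearly stopped moving, conv2 has not.
prev = {"conv1": [1.0, 2.0, 2.0], "conv2": [0.5, -0.5, 1.0]}
curr = {"conv1": [1.0, 2.0, 2.003], "conv2": [0.9, -0.1, 1.0]}

candidates = low_change_layers(prev, curr)
print(candidates)
```

In this toy case only `conv1` is flagged as a drop candidate. A real implementation would read the parameters from the framework's tensors and would still need the second score before deciding to drop anything.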

Core claim

During training, layer-change and continuation scores can be used to identify and drop layers whose removal reduces the number of parameters that must be updated, thereby lowering the computational cost of every forward pass while still allowing the remaining network to reach comparable final accuracy on image classification tasks.

What carries the argument

Layer-change score and continuation score, which together decide which layers to drop so the network shrinks dynamically during training.

If this is right

Training time for VGG and ResNet models is more than halved on MNIST, CIFAR-10, and Imagenette.
Forward-propagation FLOPs during training fall by between 17.83 percent (VGG-11) and 83.74 percent (ResNet-152).
Final accuracy remains comparable to training the full network.
The method applies directly to fine-tuning and online learning settings where data arrives sequentially.
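To make the forward-FLOPs percentages above concrete, the sketch below counts FLOPs for a small stack of convolutions and computes the share saved when one layer is dropped. The layer shapes are hypothetical toy values, not taken from VGG-11 or ResNet-152, and counting 2 FLOPs per multiply-accumulate is one common convention; the paper's own accounting method is not stated in the abstract. The dropped layer has equal input and output channels so that the remaining layers still fit together.

```python
def conv_flops(h_out, w_out, c_in, c_out, k):
    """Forward FLOPs of one k x k convolution, counting 2 FLOPs per multiply-accumulate."""
    return 2 * h_out * w_out * c_in * c_out * k * k

def forward_flops_saved_pct(layers, dropped):
    """Percentage of forward FLOPs removed when the layers named in `dropped` are skipped."""
    full = sum(conv_flops(*shape) for shape in layers.values())
    kept = sum(
        conv_flops(*shape) for name, shape in layers.items() if name not in dropped
    )
    return 100.0 * (full - kept) / full

# Hypothetical toy network: (h_out, w_out, c_in, c_out, kernel_size) per layer.
layers = {
    "conv1": (32, 32, 3, 64, 3),
    "conv2": (32, 32, 64, 64, 3),
    "conv3": (16, 16, 64, 128, 3),
    "conv4": (16, 16, 128, 128, 3),
}

saved = forward_flops_saved_pct(layers, dropped={"conv4"})
print(f"forward FLOPs saved: {saved:.2f}%")
```

Because the saving is paid on every forward pass after the drop, the earlier in training a layer is removed, the larger its effect on total training cost.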

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same scoring idea could be tested on other families such as EfficientNet or Vision Transformers to check whether similar early redundancy appears.
If scores are recomputed periodically, previously dropped layers might be reinserted when their contribution later increases.
On memory-limited hardware, the dynamic reduction could allow training of deeper models than would otherwise fit.
The approach implicitly suggests that many layers in over-parameterized networks contribute little during substantial portions of training.

Load-bearing premise

That the two scores reliably flag layers whose removal will not block the remaining network from reaching comparable final accuracy, and that this holds without extra tuning across datasets and architectures.

What would settle it

Apply the method to a new architecture or dataset and observe that final test accuracy falls substantially below the accuracy obtained by training the identical architecture without any drops.

Original abstract

This paper proposes a new method to improve the training efficiency of deep convolutional neural networks. During training, the method evaluates scores to measure how much each layer's parameters change and whether the layer will continue learning or not. Based on these scores, the network is scaled down such that the number of parameters to be learned is reduced, yielding a speed up in training. Unlike state-of-the-art methods that try to compress the network to be used in the inference phase or to limit the number of operations performed in the backpropagation phase, the proposed method is novel in that it focuses on reducing the number of operations performed by the network in the forward propagation during training. The proposed training strategy has been validated on two widely used architecture families: VGG and ResNet. Experiments on MNIST, CIFAR-10 and Imagenette show that, with the proposed method, the training time of the models is more than halved without significantly impacting accuracy. The FLOPs reduction in the forward propagation during training ranges from 17.83% for VGG-11 to 83.74% for ResNet-152. These results demonstrate the effectiveness of the proposed technique in speeding up learning of CNNs. The technique will be especially useful in applications where fine-tuning or online training of convolutional models is required, for instance because data arrive sequentially.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Learn&Drop halves training time via layer dropping but the scoring mechanism lacks supporting ablations.

Learn&Drop uses two scores, one for how much a layer's parameters are still changing and another for whether it will keep learning, to drop layers partway through training. This cuts the forward-pass compute and reportedly halves the overall training time on MNIST, CIFAR-10, and Imagenette while accuracy stays roughly the same for both VGG and ResNet models.

The new angle is the emphasis on trimming forward operations during the training phase itself. Most prior work targets inference speed or reduces back-propagation cost, so this is a different slice of the efficiency problem. The experiments give concrete wins: training time cut by more than 50% and forward FLOPs cut by between 18% and 84% depending on the model. That kind of number is useful for anyone who runs repeated training jobs. The paper handles the validation across two common architecture families reasonably well for an initial report.

The soft spot is the lack of evidence that the specific scores are what make the difference. We do not see ablations that compare against dropping layers at random, dropping the earliest ones, or simply training a narrower network from the start. If those controls are missing, the speed-up could just be the result of lower capacity rather than intelligent layer selection. The description also leaves out the exact formula for the scores and the dropping schedule, which makes it hard to reproduce or extend the method. If the full paper includes those details and the ablations, the concern shrinks.

This paper is for practitioners who need quicker training loops for convolutional models, especially in fine-tuning or streaming data settings. A reader working on training efficiency would find the empirical results worth looking at. It is worth sending to peer review. The idea is practical and the benchmarks are standard, so referees can check the claims directly. I would recommend review with a request for the missing ablations and implementation specifics.
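The controls requested in the letter can be sketched as three interchangeable selection rules that drop the same number of layers, so that any accuracy difference comes from which layers are chosen rather than from how many. This is only an illustration: the function names, the per-layer score dictionary, and the toy values are hypothetical, and the paper's real scores are not available from the abstract.

```python
import random

def pick_by_score(scores, k):
    """Score-based selection: the k layers with the lowest score."""
    return sorted(scores, key=scores.get)[:k]

def pick_random(scores, k, seed=0):
    """Random control: k layers sampled uniformly, seeded for repeatability."""
    rng = random.Random(seed)
    return rng.sample(sorted(scores), k)

def pick_earliest(layer_order, k):
    """Position-based control: the first k layers in network order."""
    return list(layer_order[:k])

# Made-up per-layer scores for a four-layer toy network.
layer_order = ["conv1", "conv2", "conv3", "conv4"]
scores = {"conv1": 0.40, "conv2": 0.02, "conv3": 0.15, "conv4": 0.01}

k = 2  # matched drop count across all three rules
by_score = pick_by_score(scores, k)
at_random = pick_random(scores, k)
earliest = pick_earliest(layer_order, k)
print(by_score, at_random, earliest)
```

Training one model per rule at the same `k`, plus a statically narrower network trained from scratch, would show whether score-based selection beats plain capacity reduction.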

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Learn&Drop, a training heuristic for CNNs that computes two per-layer scores (parameter-change magnitude and a continuation prediction) during training and drops selected layers to reduce the number of parameters and forward-propagation FLOPs. The method is evaluated on VGG and ResNet families using MNIST, CIFAR-10, and Imagenette, with the central empirical claim that training time is more than halved while accuracy remains comparable and forward FLOPs are reduced between 17.83% (VGG-11) and 83.74% (ResNet-152). The approach is positioned as distinct from inference-time compression or back-propagation-focused techniques.

Significance. If the layer-selection scores prove reliable across architectures and datasets, the technique could meaningfully accelerate training loops in online or fine-tuning settings where data arrive sequentially. The emphasis on forward-pass savings during training rather than inference is a potentially useful distinction. However, the absence of methodological detail and controls in the current version makes it difficult to judge whether the reported speed-ups exceed what would be obtained by simpler capacity-reduction baselines.

major comments (3)

[Abstract / Experiments] Abstract and Experiments section: The claims of '>50% training-time reduction' and specific FLOPs savings (17.83%–83.74%) are presented without any description of the measurement protocol (hardware, batch size, epoch count, wall-clock vs. FLOPs accounting), number of independent runs, or statistical significance tests. These omissions are load-bearing because they prevent verification of the central efficiency claim.
[Method] Method section: No ablation is reported that isolates the contribution of the layer-change and continuation scores from generic early-layer dropping or from simply training a statically thinner network from scratch. Without such controls it remains unclear whether the scores are predictive or whether the observed speed-up is an artifact of reduced model capacity.
[Method] Method section: The exact definitions, formulas, and dropping schedule for the two scores are not provided (no equations, pseudocode, or hyper-parameter values). This prevents reproduction and makes it impossible to assess whether the scores are robust or require extensive per-architecture tuning.
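For the first major comment, the kind of per-seed reporting a referee would look for can be sketched with the standard library alone: mean and sample standard deviation per condition, plus a paired t statistic over seeds. The accuracy values below are synthetic placeholders chosen for illustration and are not results from the paper; the function names are likewise hypothetical.

```python
import math
import statistics

def summarize(values):
    """Mean and sample standard deviation across seeds."""
    return statistics.mean(values), statistics.stdev(values)

def paired_t_statistic(a, b):
    """Paired t statistic for per-seed differences a[i] - b[i]; df = len(a) - 1."""
    diffs = [x - y for x, y in zip(a, b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error

# Synthetic placeholder accuracies for 5 seeds; NOT results from the paper.
full_model = [0.930, 0.928, 0.932, 0.929, 0.931]
with_drops = [0.927, 0.926, 0.931, 0.925, 0.929]

mean_full, sd_full = summarize(full_model)
mean_drop, sd_drop = summarize(with_drops)
t_stat = paired_t_statistic(full_model, with_drops)

# Two-sided 5% critical value of Student's t with 4 degrees of freedom.
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4
print(f"full {mean_full:.4f} +/- {sd_full:.4f}, drops {mean_drop:.4f} +/- {sd_drop:.4f}")
print(f"paired t = {t_stat:.3f}, significant at 5%: {significant}")
```

Pairing by seed matters because both conditions share initialization and data order, so the test looks at per-seed differences rather than at two independent samples. An exact p-value needs a t-distribution CDF, which SciPy's `scipy.stats.ttest_rel` provides.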

minor comments (2)

[Abstract] The abstract states that the method was validated on 'two widely used architecture families' but only names VGG-11 and ResNet-152; a table listing all evaluated depths and variants would improve clarity.
[Introduction] Related-work discussion could more explicitly contrast the forward-pass focus with existing dynamic-pruning or early-exit methods to strengthen the novelty claim.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important areas for improving reproducibility and methodological clarity, and we will revise the manuscript to address them.

Referee: [Abstract / Experiments] Abstract and Experiments section: The claims of '>50% training-time reduction' and specific FLOPs savings (17.83%–83.74%) are presented without any description of the measurement protocol (hardware, batch size, epoch count, wall-clock vs. FLOPs accounting), number of independent runs, or statistical significance tests. These omissions are load-bearing because they prevent verification of the central efficiency claim.

Authors: We agree that the measurement protocol requires explicit documentation. In the revised manuscript we will add a dedicated paragraph in the Experiments section specifying the hardware (GPU model and memory), batch sizes per dataset, epoch counts, wall-clock timing procedure (including overhead from score computation), FLOPs accounting method, number of independent runs (averaged over 5 seeds with standard deviations), and statistical tests (paired t-tests on accuracy).

Revision: yes

Referee: [Method] Method section: No ablation is reported that isolates the contribution of the layer-change and continuation scores from generic early-layer dropping or from simply training a statically thinner network from scratch. Without such controls it remains unclear whether the scores are predictive or whether the observed speed-up is an artifact of reduced model capacity.

Authors: We acknowledge the need for controls that separate the effect of the proposed scores from simple capacity reduction. We will include new ablation experiments comparing Learn&Drop against (i) random layer dropping at matched rates, (ii) position-based dropping without scores, and (iii) training statically thinner networks of equivalent parameter count from scratch, reporting both accuracy and training-time metrics.

Revision: yes

Referee: [Method] Method section: The exact definitions, formulas, and dropping schedule for the two scores are not provided (no equations, pseudocode, or hyper-parameter values). This prevents reproduction and makes it impossible to assess whether the scores are robust or require extensive per-architecture tuning.

Authors: We apologize for the missing formal definitions. The revised version will supply the exact equations for the parameter-change magnitude and continuation scores, the complete dropping schedule, algorithm pseudocode, and all hyper-parameter values (thresholds, weighting coefficients, evaluation frequency) used in the reported experiments.

Revision: yes
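To show the level of detail such a specification would need, here is a minimal sketch of a dropping rule driven by the three kinds of hyper-parameter named in the response: a threshold, a weighting coefficient, and an evaluation frequency. Everything in it is an assumption for illustration, including the `DropConfig` fields, their default values, the weighted-sum combination of the two scores, and the toy score values. It is not the authors' algorithm.

```python
from dataclasses import dataclass

@dataclass
class DropConfig:
    threshold: float = 0.05  # hypothetical: drop when the combined score is below this
    weight: float = 0.5      # hypothetical: weight on the change score vs. continuation score
    eval_every: int = 5      # hypothetical: evaluate scores every this many epochs

def layers_to_drop(epoch, change, continuation, cfg):
    """Layers whose weighted score is below the threshold, checked only on evaluation epochs."""
    if epoch == 0 or epoch % cfg.eval_every != 0:
        return []
    combined = {
        name: cfg.weight * change[name] + (1.0 - cfg.weight) * continuation[name]
        for name in change
    }
    return [name for name, score in combined.items() if score < cfg.threshold]

cfg = DropConfig()
# Made-up per-layer scores in [0, 1]; higher means "still learning".
change = {"conv1": 0.30, "conv2": 0.02, "conv3": 0.01}
continuation = {"conv1": 0.50, "conv2": 0.20, "conv3": 0.03}

print(layers_to_drop(3, change, continuation, cfg))  # not an evaluation epoch
print(layers_to_drop(5, change, continuation, cfg))  # evaluation epoch
```

Writing the rule down this way makes the robustness question testable: a referee can vary `threshold`, `weight`, and `eval_every` and see how sensitive the final accuracy is to each.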

Circularity Check

0 steps flagged

No circularity: empirical heuristic without derivations or self-referential claims

The paper describes a practical training heuristic that computes layer-change and continuation scores to decide which layers to drop mid-training, then reports empirical speed-ups and accuracy on MNIST/CIFAR-10/Imagenette for VGG and ResNet families. No equations, derivations, uniqueness theorems, or first-principles predictions appear; the central claims rest on experimental measurements rather than any reduction of outputs to fitted inputs or self-citations. The method is therefore self-contained as an engineering technique whose validity is tested externally against full-model baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no equations, parameters, or assumptions are stated, so the ledger remains empty.
