Scaling few-shot spoken word classification with generative meta-continual learning

Batsirayi Mupamhi Ziki; Louise Beyers; Ruan van der Merwe

arxiv: 2605.13075 · v2 · pith:YRTVKMRPnew · submitted 2026-05-13 · 💻 cs.CL · cs.AI

Scaling few-shot spoken word classification with generative meta-continual learning

Louise Beyers , Batsirayi Mupamhi Ziki , Ruan van der Merwe This is my paper

Pith reviewed 2026-05-15 05:59 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords few-shot learningspoken word classificationcontinual learningmeta-learningspeech processingHuBERTcatastrophic forgetting

0 comments

The pith

Generative meta-continual learning scales few-shot spoken word classification to 1000 classes while matching strong baselines at far lower adaptation cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether a spoken word classifier can learn to distinguish 1000 different words when each word appears in only five training examples and classes arrive one after another. It trains a HuBERT-based model with the Generative Meta-Continual Learning algorithm and compares the results against baselines that either fully retrain or repeatedly train a new classifier head. The GeMCL model keeps accuracy stable across the entire sequence of classes, reaches performance levels close to the strongest baseline, and does so after training on less than half the data and for two orders of magnitude less time. This efficiency gain arises because the model adapts to each new class roughly 2000 times faster than full retraining. A sympathetic reader would care because real-world voice systems must keep adding new words over time without incurring repeated heavy compute costs.

Core claim

Applying the Generative Meta-Continual Learning algorithm to a HuBERT backbone produces a classifier that sequentially incorporates 1000 spoken-word classes from five shots each, maintains stable accuracy throughout the sequence, and delivers performance comparable to a frozen HuBERT model with a repeatedly trained head while adapting two thousand times faster after exposure to less than half the total data and two orders of magnitude less training time.

What carries the argument

The Generative Meta-Continual Learning (GeMCL) algorithm, whose generative component supplies synthetic replay to prevent catastrophic forgetting during sequential class addition.

If this is right

New spoken-word classes can be added sequentially without measurable loss on earlier classes.
Adaptation to each fresh set of words requires orders of magnitude less compute and data than full retraining.
Stable accuracy holds up to 1000 classes without per-task hyperparameter changes.
Total training data and wall-clock time needed to reach 1000-class coverage fall below half and two orders of magnitude, respectively, of the repeated-finetuning baseline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same generative replay approach could be tested on sequential audio tasks beyond isolated words, such as speaker verification or environmental sound detection.
The large reduction in adaptation cost opens a route to on-device incremental learning for personalized voice interfaces.
Extending the class sequence past 1000 while keeping the same fixed hyper-parameters would directly test the scaling limit of the generative component.

Load-bearing premise

The generative component inside GeMCL is enough to stop catastrophic forgetting once the sequence reaches 1000 classes, without any task-specific hyperparameter retuning or extra regularization.

What would settle it

A clear drop in accuracy on the earliest classes after the model finishes learning all 1000 classes, measured against the frozen-HuBERT-plus-retrained-head baseline, would falsify the stability claim.

Figures

Figures reproduced from arXiv: 2605.13075 by Batsirayi Mupamhi Ziki, Louise Beyers, Ruan van der Merwe.

**Figure 1.** Figure 1: A single step in the meta-training procedure of GeMCL. Each task is an N-way-K-shot episode. The encoder, fϕ, embeds the input. The support set is used to calculate the class statistics from the embeddings. The matrix X˜ contains the samples from the combined query set of all classes, totalling M samples, whereas the vector Y˜ contains the corresponding labels. The model produces a vector Yˆ containing M p… view at source ↗

**Figure 2.** Figure 2: A comparison of the training and evaluation flows of the baselines and GeMCL ated with evaluating at different stages of learning. However, our baselines must be finetuned from scratch each time new classes are considered. There can be an additional hyperparameter tuning cost associated with this. The baselines therefore do not provide a comparison in terms of adaptability, but they instead provide a mea… view at source ↗

**Figure 3.** Figure 3: The average accuracy of GeMCL, full FT and the CH models with 95% confidence intervals. shot adaptation for each model. 4. Results [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

read the original abstract

Few-shot spoken word classification has largely been developed for applications where a small number of classes is considered, and so the potential of larger-scale few-shot spoken word classification remains untapped. This paper investigates the potential of a spoken word classifier to sequentially learn to distinguish between 1000 classes when it is given only five shots per class. We demonstrate that this scaling capability exists by training a model using the Generative Meta-Continual Learning (GeMCL) algorithm and comparing it to repeatedly trained or finetuned baselines. We find that GeMCL produces exceptionally stable performance, and although it does not always outperform a repeatedly fully-finetuned HuBERT model nor a frozen HuBERT model with a repeatedly trained classifier head, it produces comparable performance to the latter while adapting 2000 times faster, having been trained less than half of the data for two orders of magnitude less time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GeMCL scales to 1000 spoken classes in 5-shot continual learning with comparable accuracy and large efficiency gains over retraining baselines.

read the letter

This paper shows that generative meta-continual learning can handle the sequential addition of up to 1000 spoken word classes using only five examples per class. The GeMCL model keeps performance steady across the sequence and delivers results comparable to a frozen HuBERT model with a repeatedly trained head, but it does so with dramatically lower compute: 2000 times faster adaptation, less than half the training data, and two orders of magnitude less wall time. The new element is the scale. Few-shot spoken word work usually stays at small numbers of classes, so demonstrating that the approach works at 1000 classes is a legitimate extension of prior GeMCL results. The efficiency comparison to full finetuning and repeated head training is straightforward and highlights a practical advantage for settings where you cannot afford to retrain everything from scratch each time a new word is added. The paper does well in laying out the protocol for continual addition and in choosing relevant baselines from the speech recognition literature. The claim that no extra hyperparameter tuning or regularization is needed beyond the generative component is interesting if it holds. The soft spots are mostly around verification. The abstract mentions stable performance but gives no error bars, no details on how the 1000 classes were ordered or split, and no statistical tests. That makes it hard to judge whether the stability is robust or sensitive to the particular sequence. The weakest assumption is that the generative part alone prevents forgetting at this scale without further tweaks. This is for researchers focused on continual learning in audio or speech applications. Anyone building adaptive voice systems that need to grow their vocabulary over time could find the efficiency numbers useful. I would recommend sending it for peer review. The scaling demonstration is concrete and the efficiency angle is worth checking in detail, even if some methodological details need to be filled in.

Referee Report

2 major / 1 minor

Summary. The paper introduces Generative Meta-Continual Learning (GeMCL) to scale few-shot spoken word classification to 1000 classes using only 5 shots per class in a sequential setting. It compares GeMCL against repeatedly fully-finetuned HuBERT and frozen HuBERT with a repeatedly trained classifier head, claiming comparable accuracy to the latter while achieving 2000x faster adaptation, using less than half the data, and requiring two orders of magnitude less wall-clock training time.

Significance. If the efficiency and stability claims are substantiated with proper controls, the work would be significant for continual learning in speech, as it addresses scaling few-shot classification to large numbers of classes without catastrophic forgetting and with practical computational savings. The protocol of sequential 5-shot addition of 1000 classes is a demanding test case that, if successful, could influence meta-learning and speech applications.

major comments (2)

[Abstract and §4] Abstract and §4 (Experiments): The claims of 'exceptionally stable performance' and comparability to baselines are presented without error bars, standard deviations across runs, exact data splits for the 1000 classes, or statistical significance tests, rendering the central efficiency and stability assertions unverifiable from the provided information.
[§3] §3 (Methods): The description of the generative component in GeMCL does not provide sufficient detail on how it prevents catastrophic forgetting when scaling to 1000 sequential classes, nor does it clarify whether task-specific hyperparameter retuning or additional regularization beyond the described method is required.

minor comments (1)

[Abstract] The abstract would benefit from a brief statement of the dataset(s) used and the precise definition of 'adapting 2000 times faster' (e.g., wall time per new class or total training time).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their constructive feedback on our manuscript. We address each major comment below and outline the specific revisions planned to enhance verifiability and clarity.

read point-by-point responses

Referee: [Abstract and §4] Abstract and §4 (Experiments): The claims of 'exceptionally stable performance' and comparability to baselines are presented without error bars, standard deviations across runs, exact data splits for the 1000 classes, or statistical significance tests, rendering the central efficiency and stability assertions unverifiable from the provided information.

Authors: We agree that the current version of the manuscript does not include error bars, standard deviations across runs, exact data splits, or statistical significance tests, which limits the verifiability of the stability and efficiency claims. In the revised manuscript, we will update the abstract and §4 to report results averaged over multiple random seeds with standard deviations, provide the precise data splits used for the 1000 classes, and include statistical significance tests comparing GeMCL to the baselines. revision: yes
Referee: [§3] §3 (Methods): The description of the generative component in GeMCL does not provide sufficient detail on how it prevents catastrophic forgetting when scaling to 1000 sequential classes, nor does it clarify whether task-specific hyperparameter retuning or additional regularization beyond the described method is required.

Authors: We thank the referee for highlighting this gap in the methods description. The current §3 is concise and does not fully elaborate on the mechanisms. In the revised version, we will expand §3 to detail how the generative component enables prevention of catastrophic forgetting through generative replay within the meta-continual learning process when scaling to 1000 classes. We will also clarify that no task-specific hyperparameter retuning was performed and that no additional regularization beyond the core GeMCL method was used. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents an empirical study of GeMCL for sequential 5-shot spoken word classification across 1000 classes, with performance claims resting on direct comparisons to repeatedly finetuned or frozen HuBERT baselines. No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or protocol description. Efficiency metrics (2000x faster adaptation, <1/2 data, 100x less time) are stated as measurable outcomes of the experimental setup rather than self-referential constructs. The derivation chain is self-contained as standard empirical ML evaluation without reduction to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities can be extracted. The approach implicitly assumes standard neural network continual-learning assumptions such as the ability of generative rehearsal to mitigate forgetting.

pith-pipeline@v0.9.0 · 5457 in / 1064 out tokens · 60154 ms · 2026-05-15T05:59:08.722142+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

[1]

Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =

Banayeeanzade, Mohammadamin and Mirzaiezadeh, Rasoul and Hasani, Hosein and Baghshah, Mahdieh Soleymani , title =. Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =. 2021 , isbn =

work page 2021
[2]

Three types of incremental learning , volume =

van de Ven, Gido and Tuytelaars, Tinne and Tolias, Andreas , year =. Three types of incremental learning , volume =

work page
[3]

Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2) , year=

Multilingual Spoken Words Corpus , author=. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2) , year=

work page
[4]

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units , year=

Hsu, Wei-Ning and Bolte, Benjamin and Tsai, Yao-Hung Hubert and Lakhotia, Kushal and Salakhutdinov, Ruslan and Mohamed, Abdelrahman , journal=. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units , year=

work page
[5]

Proceedings of the 34th International Conference on Neural Information Processing Systems , articleno =

Baevski, Alexei and Zhou, Henry and Mohamed, Abdelrahman and Auli, Michael , title =. Proceedings of the 34th International Conference on Neural Information Processing Systems , articleno =. 2020 , isbn =

work page 2020
[6]

Librispeech: An ASR corpus based on public domain audio books , year=

Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev , booktitle=. Librispeech: An ASR corpus based on public domain audio books , year=

work page
[7]

Proceedings of the 31st International Conference on Neural Information Processing Systems , pages =

Snell, Jake and Swersky, Kevin and Zemel, Richard , title =. Proceedings of the 31st International Conference on Neural Information Processing Systems , pages =. 2017 , isbn =

work page 2017
[8]

NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and Applications , year=

A Simple Baseline that Questions the Use of Pretrained-Models in Continual Learning , author=. NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and Applications , year=

work page 2022
[9]

International Conference on Learning Representations , year=

Decoupled Weight Decay Regularization , author=. International Conference on Learning Representations , year=

work page
[10]

Efficient Continual Learning in Keyword Spotting using Binary Neural Networks , year=

Vu, Quynh Nguyen-Phuong and Martinez-Rau, Luciano Sebastian and Zhang, Yuxuan and Tran, Nho-Duc and Oelmann, Bengt and Magno, Michele and Bader, Sebastian , booktitle=. Efficient Continual Learning in Keyword Spotting using Binary Neural Networks , year=

work page
[11]

2025 , month =

Luthra, Mahi and Shen, Jiayi and Poli, Maxime and Ortiz, Angelo and Higuchi, Yosuke and Benchekroun, Youssef and Gleize, Martin and Saint-James, Charles-Eric and Lin, Dongyan and Rust, Phillip and Villar, Angel and Parimi, Surya and Stark, Vanessa and Moritz, Rashel and Pino, Juan and LeCun, Yann and Dupoux, Emmanuel , journal=. 2025 , month =

work page 2025
[12]

Yangbin Chen and Tom Ko and Jianping Wang , year =

work page
[13]

2023 , booktitle =

Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification Through Meta-Learning , author =. 2023 , booktitle =

work page 2023
[14]

Proceedings of Interspeech 2020 , pages =

Chen, Yangbin and Ko, Tom and Shang, Lifeng and Chen, Xiao and Jiang, Xin and Li, Qing , title =. Proceedings of Interspeech 2020 , pages =. 2020 , month =

work page 2020
[15]

Manuele Rusci and Tinne Tuytelaars , year =

work page
[16]

Junming Yuan and Ying Shi and LanTian Li and Dong Wang and Askar Hamdulla , year =

work page
[17]

Proceedings of the 2022 7th International Conference on Machine Learning Technologies , pages =

Parnami, Archit and Lee, Minwoo , title =. Proceedings of the 2022 7th International Conference on Machine Learning Technologies , pages =. 2022 , isbn =

work page 2022
[18]

On the Efficiency of Integrating Self-Supervised Learning and Meta-Learning for User-Defined Few-Shot Keyword Spotting , year=

Kao, Wei-Tsung and Wu, Yuan-Kuei and Chen, Chia-Ping and Chen, Zhi-Sheng and Tsai, Yu-Pao and Lee, Hung-Yi , booktitle=. On the Efficiency of Integrating Self-Supervised Learning and Meta-Learning for User-Defined Few-Shot Keyword Spotting , year=

work page
[19]

Ashish Mittal and Samarth Bharadwaj and Shreya Khare and Saneem Chemmengath and Karthik Sankaranarayanan and Brian Kingsbury , year =

work page
[20]

Self-Learning for Personalized Keyword Spotting on Ultralow-Power Audio Sensors , year=

Rusci, Manuele and Paci, Francesco and Fariselli, Marco and Flamand, Eric and Tuytelaars, Tinne , journal=. Self-Learning for Personalized Keyword Spotting on Ultralow-Power Audio Sensors , year=

work page
[21]

Proceedings of The 1st Conference on Lifelong Learning Agents , pages =

Online Continual Learning for Embedded Devices , author =. Proceedings of The 1st Conference on Lifelong Learning Agents , pages =. 2022 , editor =

work page 2022
[22]

Self-Incremental Training for Personalized Voice Command Recognition in a Wireless Audio Sensor Network , year=

Rusci, Manuele and Van Hamme, Hugo and Tuytelaars, Tinne , booktitle=. Self-Incremental Training for Personalized Voice Command Recognition in a Wireless Audio Sensor Network , year=

work page
[23]

When Meta-Learning Meets Online and Continual Learning: A Survey , year=

Son, Jaehyeon and Lee, Soochan and Kim, Gunhee , journal=. When Meta-Learning Meets Online and Continual Learning: A Survey , year=

work page
[24]

Learning to C ontinually L earn with the B ayesian P rinciple

Lee, Soochan and Jeon, Hyeonseong and Son, Jaehyeon and Kim, Gunhee. Learning to C ontinually L earn with the B ayesian P rinciple. International Conference on Machine Learning

work page
[25]

Meta-Learning in Neural Networks: A Survey , year=

Hospedales, Timothy and Antoniou, Antreas and Micaelli, Paul and Storkey, Amos , journal=. Meta-Learning in Neural Networks: A Survey , year=

work page
[26]

Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem

McCloskey, Michael and Cohen, Neal J. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem. Psychology of Learning and Motivation. 1989 , doi =

work page 1989
[27]

International conference on machine learning , pages=

Model-agnostic meta-learning for fast adaptation of deep networks , author=. International conference on machine learning , pages=. 2017 , organization=

work page 2017

[1] [1]

Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =

Banayeeanzade, Mohammadamin and Mirzaiezadeh, Rasoul and Hasani, Hosein and Baghshah, Mahdieh Soleymani , title =. Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =. 2021 , isbn =

work page 2021

[2] [2]

Three types of incremental learning , volume =

van de Ven, Gido and Tuytelaars, Tinne and Tolias, Andreas , year =. Three types of incremental learning , volume =

work page

[3] [3]

Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2) , year=

Multilingual Spoken Words Corpus , author=. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2) , year=

work page

[4] [4]

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units , year=

Hsu, Wei-Ning and Bolte, Benjamin and Tsai, Yao-Hung Hubert and Lakhotia, Kushal and Salakhutdinov, Ruslan and Mohamed, Abdelrahman , journal=. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units , year=

work page

[5] [5]

Proceedings of the 34th International Conference on Neural Information Processing Systems , articleno =

Baevski, Alexei and Zhou, Henry and Mohamed, Abdelrahman and Auli, Michael , title =. Proceedings of the 34th International Conference on Neural Information Processing Systems , articleno =. 2020 , isbn =

work page 2020

[6] [6]

Librispeech: An ASR corpus based on public domain audio books , year=

Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev , booktitle=. Librispeech: An ASR corpus based on public domain audio books , year=

work page

[7] [7]

Proceedings of the 31st International Conference on Neural Information Processing Systems , pages =

Snell, Jake and Swersky, Kevin and Zemel, Richard , title =. Proceedings of the 31st International Conference on Neural Information Processing Systems , pages =. 2017 , isbn =

work page 2017

[8] [8]

NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and Applications , year=

A Simple Baseline that Questions the Use of Pretrained-Models in Continual Learning , author=. NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and Applications , year=

work page 2022

[9] [9]

International Conference on Learning Representations , year=

Decoupled Weight Decay Regularization , author=. International Conference on Learning Representations , year=

work page

[10] [10]

Efficient Continual Learning in Keyword Spotting using Binary Neural Networks , year=

Vu, Quynh Nguyen-Phuong and Martinez-Rau, Luciano Sebastian and Zhang, Yuxuan and Tran, Nho-Duc and Oelmann, Bengt and Magno, Michele and Bader, Sebastian , booktitle=. Efficient Continual Learning in Keyword Spotting using Binary Neural Networks , year=

work page

[11] [11]

2025 , month =

Luthra, Mahi and Shen, Jiayi and Poli, Maxime and Ortiz, Angelo and Higuchi, Yosuke and Benchekroun, Youssef and Gleize, Martin and Saint-James, Charles-Eric and Lin, Dongyan and Rust, Phillip and Villar, Angel and Parimi, Surya and Stark, Vanessa and Moritz, Rashel and Pino, Juan and LeCun, Yann and Dupoux, Emmanuel , journal=. 2025 , month =

work page 2025

[12] [12]

Yangbin Chen and Tom Ko and Jianping Wang , year =

work page

[13] [13]

2023 , booktitle =

Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification Through Meta-Learning , author =. 2023 , booktitle =

work page 2023

[14] [14]

Proceedings of Interspeech 2020 , pages =

Chen, Yangbin and Ko, Tom and Shang, Lifeng and Chen, Xiao and Jiang, Xin and Li, Qing , title =. Proceedings of Interspeech 2020 , pages =. 2020 , month =

work page 2020

[15] [15]

Manuele Rusci and Tinne Tuytelaars , year =

work page

[16] [16]

Junming Yuan and Ying Shi and LanTian Li and Dong Wang and Askar Hamdulla , year =

work page

[17] [17]

Proceedings of the 2022 7th International Conference on Machine Learning Technologies , pages =

Parnami, Archit and Lee, Minwoo , title =. Proceedings of the 2022 7th International Conference on Machine Learning Technologies , pages =. 2022 , isbn =

work page 2022

[18] [18]

On the Efficiency of Integrating Self-Supervised Learning and Meta-Learning for User-Defined Few-Shot Keyword Spotting , year=

Kao, Wei-Tsung and Wu, Yuan-Kuei and Chen, Chia-Ping and Chen, Zhi-Sheng and Tsai, Yu-Pao and Lee, Hung-Yi , booktitle=. On the Efficiency of Integrating Self-Supervised Learning and Meta-Learning for User-Defined Few-Shot Keyword Spotting , year=

work page

[19] [19]

Ashish Mittal and Samarth Bharadwaj and Shreya Khare and Saneem Chemmengath and Karthik Sankaranarayanan and Brian Kingsbury , year =

work page

[20] [20]

Self-Learning for Personalized Keyword Spotting on Ultralow-Power Audio Sensors , year=

Rusci, Manuele and Paci, Francesco and Fariselli, Marco and Flamand, Eric and Tuytelaars, Tinne , journal=. Self-Learning for Personalized Keyword Spotting on Ultralow-Power Audio Sensors , year=

work page

[21] [21]

Proceedings of The 1st Conference on Lifelong Learning Agents , pages =

Online Continual Learning for Embedded Devices , author =. Proceedings of The 1st Conference on Lifelong Learning Agents , pages =. 2022 , editor =

work page 2022

[22] [22]

Self-Incremental Training for Personalized Voice Command Recognition in a Wireless Audio Sensor Network , year=

Rusci, Manuele and Van Hamme, Hugo and Tuytelaars, Tinne , booktitle=. Self-Incremental Training for Personalized Voice Command Recognition in a Wireless Audio Sensor Network , year=

work page

[23] [23]

When Meta-Learning Meets Online and Continual Learning: A Survey , year=

Son, Jaehyeon and Lee, Soochan and Kim, Gunhee , journal=. When Meta-Learning Meets Online and Continual Learning: A Survey , year=

work page

[24] [24]

Learning to C ontinually L earn with the B ayesian P rinciple

Lee, Soochan and Jeon, Hyeonseong and Son, Jaehyeon and Kim, Gunhee. Learning to C ontinually L earn with the B ayesian P rinciple. International Conference on Machine Learning

work page

[25] [25]

Meta-Learning in Neural Networks: A Survey , year=

Hospedales, Timothy and Antoniou, Antreas and Micaelli, Paul and Storkey, Amos , journal=. Meta-Learning in Neural Networks: A Survey , year=

work page

[26] [26]

Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem

McCloskey, Michael and Cohen, Neal J. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem. Psychology of Learning and Motivation. 1989 , doi =

work page 1989

[27] [27]

International conference on machine learning , pages=

Model-agnostic meta-learning for fast adaptation of deep networks , author=. International conference on machine learning , pages=. 2017 , organization=

work page 2017