Theoretical analysis of continual factual knowledge acquisition shows data replay stabilizes pretrained knowledge by shifting convergence dynamics while regularization only slows forgetting, leading to the STOC method for attention-based replay selection.
Title resolution pending
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5verdicts
UNVERDICTED 5roles
background 1polarities
background 1representative citing papers
BERT learns shortcut solutions that impair generalization and forward transfer in continual LEGO, while ALBERT learns loop-like solutions for better performance, yet both fail at cross-experience composition, with ALBERT rescued by mixed-data training.
FTN achieves near-zero forgetting on continual learning benchmarks by isolating task subnetworks via self-organizing binary masks generated through gradient descent, smoothing, and k-winner-take-all.
CAMEL is a scaling law capturing nonlinear model-size and mixture interactions to extrapolate optimal data mixtures for large LLMs from small-model experiments, reducing optimization cost by 50% and improving benchmarks by up to 3%.
Domain-adapted LLMs and SLMs do not consistently outperform general models on STRIDE threat classification for 5G, with decoding strategies and model scale affecting validity but gains remaining insufficient for reliable use.
citing papers explorer
-
Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights
Domain-adapted LLMs and SLMs do not consistently outperform general models on STRIDE threat classification for 5G, with decoding strategies and model scale affecting validity but gains remaining insufficient for reliable use.