Catastrophic interference in connectionist networks: The sequential learning problem
8 Pith papers cite this work. Polarity classification is still being indexed.
Years: 2026 (8 papers)
Citation roles: background (2)
Citation polarities: background (2)
Citing papers explorer
-
Stop Marginalizing My Dreams: Model Inversion via Laplace Kernel for Continual Learning
REMIX uses Laplace kernel parameterization to enable scalable full-covariance modeling in model inversion, improving synthetic sample quality and performance in data-free continual learning.
-
Characterizing and Correcting Effective Target Shift in Online Learning
Online kernel regression equals offline regression with shifted targets; correcting the targets lets online learning match offline performance and outperform true targets in continual image classification.
-
Efficient Continual Learning in Language Models via Thalamically Routed Cortical Columns
TRC² is a brain-inspired decoder-only architecture that localizes fast plasticity and uses thalamic and hippocampal pathways to substantially reduce cumulative forgetting in sequential language model training on streams like C4, WikiText-103, and GSM8K.
-
UniSD: Towards a Unified Self-Distillation Framework for Large Language Models
UniSD unifies self-distillation components for autoregressive LLMs and its full integrated version improves base models by 5.4 points and baselines by 2.8 points across six benchmarks.
-
Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation
A state distribution view of post-training shows that on-policy supervision from the learner itself can outperform fixed-dataset SFT and preserve retention better than aggressive supervised updates.
-
Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates
FINCH is a loss-adaptive learning-rate schedule that reduces forgetting by 93% on average during LLM fine-tuning while matching standard task performance across several benchmarks.
-
Attribution-Guided Continual Learning for Large Language Models
An attribution-based continual learning framework for LLMs modulates per-parameter gradients using task-specific importance scores to reduce forgetting of prior tasks.
-
CogniFold: Always-On Proactive Memory via Cognitive Folding