Catastrophic interference in connectionist networks: The sequential learning problem

Michael McCloskey, Neal J Cohen · 1989

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

browse 8 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Stop Marginalizing My Dreams: Model Inversion via Laplace Kernel for Continual Learning

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

REMIX uses Laplace kernel parameterization to enable scalable full-covariance modeling in model inversion, improving synthetic sample quality and performance in data-free continual learning.

Characterizing and Correcting Effective Target Shift in Online Learning

stat.ML · 2026-05-08 · unverdicted · novelty 7.0

Online kernel regression equals offline regression with shifted targets; correcting the targets lets online learning match offline performance and outperform true targets in continual image classification.

Efficient Continual Learning in Language Models via Thalamically Routed Cortical Columns

cs.LG · 2026-02-25 · unverdicted · novelty 7.0

TRC² is a brain-inspired decoder-only architecture that localizes fast plasticity and uses thalamic and hippocampal pathways to substantially reduce cumulative forgetting in sequential language model training on streams like C4, WikiText-103, and GSM8K.

UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

cs.CL · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

UniSD unifies self-distillation components for autoregressive LLMs and its full integrated version improves base models by 5.4 points and baselines by 2.8 points across six benchmarks.

Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

A state distribution view of post-training shows that on-policy supervision from the learner itself can outperform fixed-dataset SFT and preserve retention better than aggressive supervised updates.

Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

FINCH is a loss-adaptive learning-rate schedule that reduces forgetting by 93% on average during LLM fine-tuning while matching standard task performance across several benchmarks.

Attribution-Guided Continual Learning for Large Language Models

cs.LG · 2026-05-06 · unverdicted · novelty 5.0

An attribution-based continual learning framework for LLMs modulates per-parameter gradients using task-specific importance scores to reduce forgetting of prior tasks.

CogniFold: Always-On Proactive Memory via Cognitive Folding

cs.AI · 2026-05-13

citing papers explorer

Showing 8 of 8 citing papers.

Stop Marginalizing My Dreams: Model Inversion via Laplace Kernel for Continual Learning cs.LG · 2026-05-12 · unverdicted · none · ref 1
REMIX uses Laplace kernel parameterization to enable scalable full-covariance modeling in model inversion, improving synthetic sample quality and performance in data-free continual learning.
Characterizing and Correcting Effective Target Shift in Online Learning stat.ML · 2026-05-08 · unverdicted · none · ref 6
Online kernel regression equals offline regression with shifted targets; correcting the targets lets online learning match offline performance and outperform true targets in continual image classification.
Efficient Continual Learning in Language Models via Thalamically Routed Cortical Columns cs.LG · 2026-02-25 · unverdicted · none · ref 20
TRC² is a brain-inspired decoder-only architecture that localizes fast plasticity and uses thalamic and hippocampal pathways to substantially reduce cumulative forgetting in sequential language model training on streams like C4, WikiText-103, and GSM8K.
UniSD: Towards a Unified Self-Distillation Framework for Large Language Models cs.CL · 2026-05-07 · unverdicted · none · ref 36 · 2 links
UniSD unifies self-distillation components for autoregressive LLMs and its full integrated version improves base models by 5.4 points and baselines by 2.8 points across six benchmarks.
Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation cs.LG · 2026-05-21 · unverdicted · none · ref 17
A state distribution view of post-training shows that on-policy supervision from the learner itself can outperform fixed-dataset SFT and preserve retention better than aggressive supervised updates.
Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates cs.LG · 2026-05-19 · unverdicted · none · ref 44
FINCH is a loss-adaptive learning-rate schedule that reduces forgetting by 93% on average during LLM fine-tuning while matching standard task performance across several benchmarks.
Attribution-Guided Continual Learning for Large Language Models cs.LG · 2026-05-06 · unverdicted · none · ref 17
An attribution-based continual learning framework for LLMs modulates per-parameter gradients using task-specific importance scores to reduce forgetting of prior tasks.
CogniFold: Always-On Proactive Memory via Cognitive Folding cs.AI · 2026-05-13 · unreviewed · ref 37

Catastrophic interference in connectionist networks: The sequential learning problem

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer