Infinite-width transformers exhibit an inductive bias against high-complexity polynomial-time algorithms, with derived upper bounds on capturable tasks like sorting and string matching.
cc/paper_files/paper/2022/file/884baf65392170763b27c914087bde01-Paper-Conference
5 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (5)
representative citing papers
In the NTK regime, new-task training induces old-task prediction drift through the cross-task kernel, yielding an exact closed-form forgetting predictor under frozen linear heads and a low-rank concentration result.
Proposes Architecture-driven Shift (ADS) as an architecture-based proxy for logit shift in continual learning, derived from spectral norm scaling, optimization path length and task conflict, with monotonic correlation rs >= 0.731 across 175 architectures and utility as expected calibration error pro
KAN-CL cuts catastrophic forgetting by 88-93% on Split-CIFAR-10/5T and Split-CIFAR-100/10T by anchoring KAN parameters at per-knot granularity while matching baseline accuracy.
Online kernel regression equals offline regression with shifted targets; correcting the targets lets online learning match offline performance and outperform true targets in continual image classification.
citing papers explorer
- Algorithmic Task Capture, Computational Complexity, and Inductive Bias of Infinite Transformers
Infinite-width transformers exhibit an inductive bias against high-complexity polynomial-time algorithms, with derived upper bounds on capturable tasks like sorting and string matching.
- Catastrophic Forgetting is Low-Rank: A Function-Space Theory for Continual Adaptation
In the NTK regime, new-task training induces old-task prediction drift through the cross-task kernel, yielding an exact closed-form forgetting predictor under frozen linear heads and a low-rank concentration result.
- Architecture-driven Shift: towards a lightweight selector for capturing the trends of logit shift
Proposes Architecture-driven Shift (ADS) as an architecture-based proxy for logit shift in continual learning, derived from spectral norm scaling, optimization path length and task conflict, with monotonic correlation rs >= 0.731 across 175 architectures and utility as expected calibration error pro
- Characterizing and Correcting Effective Target Shift in Online Learning
Online kernel regression equals offline regression with shifted targets; correcting the targets lets online learning match offline performance and outperform true targets in continual image classification.