Training dynamics of in-context learning in linear attention

Yedi Zhang · 2025 · arXiv 2501.16265

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Transformers Can Implement Preconditioned Richardson Iteration for In-Context Gaussian Kernel Regression

cs.LG · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

A single-head softmax transformer with O(log(1/ε)) blocks and O(√(N/ε)) MLP width implements preconditioned Richardson iteration to achieve ε-accurate Gaussian KRR predictions on length-N prompts under bounded data.

An Asymptotic Theory of Chain-of-Thought in In-Context Learning

stat.ML · 2026-06-02 · unverdicted · novelty 6.0

An exact RMT-derived formula for CoT generalization error in linear ICL reveals phase transitions among exponential or polynomial improvement, saturation, and overthinking regimes, depending on depth, pretraining, and context length.

Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

Large-step GD in deep linear multi-pathway networks drives re-balancing of signals across pathways via edge-of-stability oscillations after early depth-driven symmetry breaking.

Learning to Adapt: In-Context Learning Beyond Stationarity

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

Gated linear attention enables lower training and test errors in non-stationary in-context learning by adaptively modulating past inputs through a learnable recency bias under an autoregressive model of task evolution.

Towards personalised intervention: A causal-dynamical framework to determine psychological treatment trajectories

nlin.AO · 2026-06-08 · unverdicted · novelty 5.0

Proposes a causal-dynamical framework that constructs causal graphs from longitudinal patient data, simulates intervention effects, and selects personalized treatment focuses for mental health care.

Understanding LoRA as Knowledge Memory: An Empirical Analysis

cs.LG · 2026-03-01

citing papers explorer

An Asymptotic Theory of Chain-of-Thought in In-Context Learning stat.ML · 2026-06-02 · unverdicted · none · ref 27
