CoT transformers simulate any Word RAM algorithm with poly-logarithmic overhead in three architectures, improving on quadratic TM overhead.
super hub Canonical reference
In-context Learning and Induction Heads
Canonical reference. 80% of citing Pith papers cite this work as background.
abstract
"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.
hub tools
citation-role summary
citation-polarity summary
claims ledger
- abstract "Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguin
authors
co-cited works
representative citing papers
Looped linear transformers with LN provably converge via GD to implement the power method on principal component prediction.
Presents a solver-verifiable framework for Transformer circuits, with exhaustive checks on small symbolic tasks and surrogate methods for larger models.
WriteSAE introduces sparse autoencoders with rank-1 matrix atoms for recurrent state updates, allowing replacement tests that outperform deletion on 92.4% of positions and a formula predicting logit changes with R²=0.98.
LLM activations encode current and prior entities in orthogonal slots, but models only use the current slot for explicit factual retrieval despite prior-slot information being linearly decodable.
Transformer weight spectra exhibit transient compression waves that propagate layer-wise, persistent non-monotonic depth gradients in power-law exponents, and Q/K-V asymmetry, with the spectral exponent alpha predicting layer importance and enabling pruning gains of 1.1x-3.6x over Last-N baselines.
The AI Scientist framework enables LLMs to independently conduct the full scientific process from idea generation to paper writing and review, demonstrated across three ML subfields with papers costing under $15 each.
KANs with learnable univariate spline activations on edges achieve better accuracy than MLPs with fewer parameters, faster scaling, and direct visualization for scientific discovery.
Path patching provides a method to express and quantitatively test hypotheses that neural network behaviors are localized to sets of paths.
GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.
RECONTEXT is a recursive evidence replay technique that improves long-context reasoning in LLMs by constructing and replaying a query-conditioned evidence pool before final generation.
A 0.6B LM with length-aware attention adjustments performs competitive in-context retrieval at million-token scale on MS MARCO, NQ, and LIMIT benchmarks.
LOCOS scores attention heads via OV-circuit output projection onto answer-token unembedding directions and identifies non-literal retrieval heads whose ablation collapses performance on non-literal benchmarks more than prior literal-copy detectors.
SemRF supplies fixed semantic anchors and pseudo-inverse tying to produce stable coordinates for residual dynamics, Voronoi traces, and minimum-action canonical paths that link to parameter efficiency under controlled interface error.
ECHO is a clipped policy-gradient method that uses posterior-sensitive rewards to give turn-level epistemic credit in multi-turn information-seeking tasks, outperforming trajectory-level GRPO on a new Clue Selector Game benchmark.
SMDA fits ridge regression on SAE features to distill symbolic policies then decomposes each SFT example's influence via feature-activation and output-probability deltas, demonstrated on refusal behavior in Llama-3.2-3B-Instruct.
During pretraining, language models exhibit natural ungrokking where learned rules are forgotten based on their support frequency in the corpus, with asymmetric editability of rule survival.
DREAM enables training of dense retrieval embeddings using autoregressive next-token prediction from LLMs by modulating attention with retriever scores.
HeRA aligns least-aligned attention heads in MLLMs using an MKNN-based contrastive objective to preserve cross-modal topological structure, yielding gains on vision-centric tasks and reduced hallucinations across 18 benchmarks.
Relinking is a new compression-boundary attack on LLM agents where summarization of split benign fragments produces malicious instructions, shown via Relink tool at 86.9% success rate and mitigated by KBRA defense to 0%.
Reroute turns irreversible visual-token pruning into recoverable routing that reuses existing attention scores, improving grounding performance under aggressive reduction on LLaVA-1.5 and Qwen while preserving TFLOPs and KV-cache budgets.
Bayesian reduction of attention posterior on copy task predicts first-order phase transition for softmax attention and second-order followed by crossover for linear attention.
STRIDE formulates TDA as sparse recovery using steering operators that mimic subset training effects in activation space, claiming SOTA LLM pre-training attribution at 13x prior speed.
CoT probe-time gains arise primarily from lexical activation and short-range token co-occurrence rather than sentence-level logical derivation.
citing papers explorer
-
Large Vision-Language Models Get Lost in Attention
In LVLMs, attention can be replaced by random Gaussian weights with little or no performance loss, indicating that current models get lost in attention rather than efficiently using visual context.
-
What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis
In LLM agents, memory routing circuits emerge at 0.6B scale while content circuits appear only at 4B, and write/read operations recruit a pre-existing late-layer context hub instead of creating a new one, enabling a 76% accurate unsupervised failure diagnostic.
-
Weight Patching: Toward Source-Level Mechanistic Localization in LLMs
Weight Patching localizes capabilities to specific parameter modules in LLMs by replacing weights from a behavior-specialized model into a base model and validating recovery via a vector-anchor interface, revealing a hierarchy of source, routing, and execution components.
-
Readable Minds: Emergent Theory-of-Mind-Like Behavior in LLM Poker Agents
Persistent memory is necessary and sufficient for LLM poker agents to reach ToM levels 3-5 and use strategic deception, while agents without memory stay at level 0.
-
HyperLens: Quantifying Cognitive Effort in LLMs with Fine-grained Confidence Trajectory
HyperLens reveals that deeper transformer layers magnify small confidence changes into fine-grained trajectories, allowing quantification of cognitive effort where complex tasks demand more and standard SFT can reduce it.
- Cognitive Pivot Points and Visual Anchoring: Unveiling and Rectifying Hallucinations in Multimodal Reasoning Models