In-context Learning and Induction Heads

Canonical reference. 80% of citing Pith papers cite this work as background.

145 Pith papers citing it
Background 80% of classified citations
abstract

"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.
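To make the abstract's two definitions concrete, here is a minimal sketch in plain Python. It is not the paper's code and does not model attention at all: `induction_complete` only mimics the input/output behavior of the [A][B] ... [A] -> [B] rule on a token list, and `in_context_score` illustrates "decreasing loss at increasing token indices" as a difference between two per-token losses. Both function names, and the choice of which token indices to compare, are assumptions made for illustration.

```python
from typing import Hashable, Optional, Sequence


def induction_complete(tokens: Sequence[Hashable]) -> Optional[Hashable]:
    """Toy version of the [A][B] ... [A] -> [B] rule.

    Looks for the most recent earlier occurrence of the final token and
    returns the token that followed it. Returns None if there is no
    earlier occurrence. (Hypothetical helper, not from the paper.)
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, skipping the final one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # copy what came after the match
    return None


def in_context_score(per_token_loss: Sequence[float], early: int, late: int) -> float:
    """Loss at a late token index minus loss at an early one.

    A negative value means the loss fell as the context grew, which is
    the sense of "in-context learning" used in the abstract. The indices
    are caller-supplied assumptions.
    """
    return per_token_loss[late] - per_token_loss[early]


if __name__ == "__main__":
    print(induction_complete(["A", "B", "C", "D", "A"]))
    print(in_context_score([3.0, 2.5, 2.0, 1.0], early=0, late=3))
```

In a real model the matching and copying steps would be carried out by attention heads over learned representations rather than by exact token equality, so this should be read only as a description of the behavior being claimed.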

citation-role summary

background: 18 · dataset: 1 · other: 1

representative citing papers

WriteSAE: Sparse Autoencoders for Recurrent State

cs.LG · 2026-05-12 · unverdicted · novelty 8.0 · 3 refs

WriteSAE introduces sparse autoencoders with rank-1 matrix atoms for recurrent state updates, allowing replacement tests that outperform deletion on 92.4% of positions and a formula predicting logit changes with R²=0.98.

Slot Machines: How LLMs Keep Track of Multiple Entities

cs.CL · 2026-04-22 · unverdicted · novelty 8.0

LLM activations encode current and prior entities in orthogonal slots, but models only use the current slot for explicit factual retrieval despite prior-slot information being linearly decodable.

KAN: Kolmogorov-Arnold Networks

cs.LG · 2024-04-30 · conditional · novelty 8.0

KANs with learnable univariate spline activations on edges achieve better accuracy than MLPs with fewer parameters, faster scaling, and direct visualization for scientific discovery.

Localizing Model Behavior with Path Patching

cs.LG · 2023-04-12 · unverdicted · novelty 8.0

Path patching provides a method to express and quantitatively test hypotheses that neural network behaviors are localized to sets of paths.

Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads

cs.CL · 2026-07-01 · unverdicted · novelty 7.0

LOCOS scores attention heads via OV-circuit output projection onto answer-token unembedding directions and identifies non-literal retrieval heads whose ablation collapses performance on non-literal benchmarks more than prior literal-copy detectors.

citing papers explorer

Showing 3 of 3 citing papers after filters.

  • Jamba: A Hybrid Transformer-Mamba Language Model cs.CL · 2024-03-28 · conditional · none · ref 35 · internal anchor

    Jamba presents a hybrid Transformer-Mamba MoE architecture for LLMs that delivers state-of-the-art benchmark performance and strong results up to 256K token contexts while fitting in one 80GB GPU with high throughput.

  • SnapKV: LLM Knows What You are Looking for Before Generation cs.CL · 2024-04-22 · conditional · none · ref 15 · internal anchor

    SnapKV selects clustered important KV positions per attention head from an observation window at the prompt end, yielding 3.6x faster generation and 8.2x better memory efficiency on 16K-token inputs with comparable performance across 16 datasets.

  • Language models fail at extended rule following cs.CL · 2026-05-03 · unverdicted · none · ref 16 · internal anchor

    LLMs fail at extended counting of repeated characters due to finite internal states, with abrupt errors persisting across model scales and inference methods.