Pith Number

pith:PFZHD3TF

pith:2022:PFZHD3TFIWOEKN4ZU3CDZP3APP
not attested · not anchored · not stored · refs resolved

What learning algorithm is in-context learning? Investigations with linear models

Dale Schuurmans, Denny Zhou, Ekin Akyürek, Jacob Andreas, Tengyu Ma

Transformers implement gradient descent and ridge regression implicitly when doing in-context learning on linear tasks.

arxiv:2211.15661 v3 · 2022-11-28 · cs.LG · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{PFZHD3TFIWOEKN4ZU3CDZP3APP}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths.

C2 · weakest assumption

That results on linear regression as a prototypical problem will extend to the more complex, non-linear tasks typical of real in-context learning in language models.

C3 · one-line summary

Transformers performing in-context learning implicitly implement gradient descent, ridge regression, and least-squares predictors for linear models, with behavior shifting based on model depth, width, and data noise.
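The reference predictors named in the claims above can be sketched on a small synthetic linear regression problem. This is not the paper's transformer or its experimental setup; it only computes least squares, ridge regression, and gradient descent on the same data so their relationship is concrete. All names, the dimensions, the noise level, and the hyperparameters (`lam`, `lr`, `steps`) are illustrative assumptions.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_weights(X, y, lam):
    """Closed form (X^T X + lam I)^{-1} X^T y; lam = 0 gives least squares."""
    d = len(X[0])
    gram = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    moment = [sum(x[i] * t for x, t in zip(X, y)) for i in range(d)]
    return solve(gram, moment)


def gradient_descent_weights(X, y, lam, lr, steps):
    """Gradient descent from w = 0 on 0.5*sum((w.x - y)^2) + 0.5*lam*|w|^2."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [lam * wi for wi in w]
        for x, t in zip(X, y):
            residual = dot(w, x) - t
            grad = [g + residual * xi for g, xi in zip(grad, x)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Synthetic "in-context" examples: 20 (x, y) pairs from a noisy linear model.
random.seed(0)
w_true = [1.0, -2.0, 0.5]
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(20)]
y = [dot(w_true, x) + random.gauss(0, 0.1) for x in X]

w_ols = ridge_weights(X, y, lam=0.0)
w_ridge = ridge_weights(X, y, lam=1.0)
w_gd_1 = gradient_descent_weights(X, y, lam=1.0, lr=0.01, steps=1)
w_gd_many = gradient_descent_weights(X, y, lam=1.0, lr=0.01, steps=5000)

# Compare the predictions each learner makes for one query input.
x_query = [0.3, -0.1, 0.8]
for name, w in [("least squares", w_ols), ("ridge", w_ridge),
                ("GD, 1 step", w_gd_1), ("GD, 5000 steps", w_gd_many)]:
    print(f"{name:>15}: {dot(w, x_query):.4f}")
```

With enough steps and a small enough step size, gradient descent on the regularized objective approaches the ridge solution, while a single step gives a visibly different predictor. That gap is one simple way to see how different predictors could arise from the same data.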

References

31 extracted · 31 resolved · 6 Pith anchors

[1] Understanding intermediate layers using linear classifier probes 2016 · arXiv:1610.01644
[2] Hoffman, David Pfau, Tom Schaul, and Nando de Freitas 2016
[3] Layer Normalization 2016 · arXiv:1607.06450
[4] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretc 2020
[5] Thread: circuits 2020

Formal links

1 machine-checked theorem link

Cited by

17 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:13.930507Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

797271ee65459c453799a6c43cbf607be37dc8edbd11f2091c68b86538df2cc0

Aliases

arxiv: 2211.15661 · arxiv_version: 2211.15661v3 · doi: 10.48550/arxiv.2211.15661 · pith_short_12: PFZHD3TFIWOE · pith_short_16: PFZHD3TFIWOEKN4Z · pith_short_8: PFZHD3TF
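The values on this page suggest a relationship between the canonical hash and the Pith Number: the 26-character identifier matches the start of the RFC 4648 Base32 encoding of the SHA-256 digest, and the short aliases are prefixes of it. This is inferred from the values shown here, not taken from a Pith specification, so treat the sketch below as an assumption. The function name `pith_number_from_hash` is hypothetical.

```python
import base64

# Canonical hash as printed on this page.
CANONICAL_HASH = "797271ee65459c453799a6c43cbf607be37dc8edbd11f2091c68b86538df2cc0"


def pith_number_from_hash(hex_digest, length=26):
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier is derived; it is only
    checked against the single record on this page.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_number_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)
for name, value in aliases.items():
    print(name, value)
```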
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/PFZHD3TFIWOEKN4ZU3CDZP3APP \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 797271ee65459c453799a6c43cbf607be37dc8edbd11f2091c68b86538df2cc0
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "e8bab37bc0e56fc58a49bf89d5cf2bbb25839cce6a80f49b692916efa402136b",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2022-11-28T18:59:51Z",
    "title_canon_sha256": "6caaadf80372d564f29b370e6c37cced5e9e052c272d7b54f59c819c6937e6d2"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2211.15661",
    "kind": "arxiv",
    "version": 3
  }
}
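The hashing step of the verification command can also be run offline against the canonical record shown above. The sketch below reuses the same `json.dumps` settings as the shell one-liner (sorted keys, compact separators, no ASCII escaping) and compares the result with the hash printed on this page. The function name `canonical_hash` is an assumption for illustration; whether the digest matches depends on the record being byte-for-byte what the service serializes.

```python
import hashlib
import json

# Canonical record copied from this page.
canonical_record = {
    "metadata": {
        "abstract_canon_sha256": "e8bab37bc0e56fc58a49bf89d5cf2bbb25839cce6a80f49b692916efa402136b",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2022-11-28T18:59:51Z",
        "title_canon_sha256": "6caaadf80372d564f29b370e6c37cced5e9e052c272d7b54f59c819c6937e6d2",
    },
    "schema_version": "1.0",
    "source": {"id": "2211.15661", "kind": "arxiv", "version": 3},
}

# Hash printed on this page under "Canonical hash".
EXPECTED = "797271ee65459c453799a6c43cbf607be37dc8edbd11f2091c68b86538df2cc0"


def canonical_hash(record):
    """SHA-256 of the record serialized with sorted keys and compact separators."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_hash(canonical_record)
print(digest)
print("matches page hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields appear in the source JSON.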