A transformer takes a sequence of vector-embedded states as input and produces a probability distribution over the next state as output [1].
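The input/output contract described above can be illustrated with a minimal sketch. It embeds a sequence of integer state indices, passes the embeddings through a small stack of causal self-attention layers, and returns a softmax distribution over the next state. Everything in it is an assumption for illustration and does not come from the source: the hypothetical helper names (`init_params`, `causal_attention`, `next_state_distribution`), the single-head, attention-only layers with residual connections, the random untrained weights, and the default of two layers. It should not be read as the exact model used in this work.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(n_states, d_model, n_layers=2, seed=0):
    # Hypothetical helper: random, untrained weights for illustration only.
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d_model)
    params = {
        "embed": rng.normal(0.0, scale, (n_states, d_model)),    # state -> vector
        "unembed": rng.normal(0.0, scale, (d_model, n_states)),  # vector -> logits
        "layers": [],
    }
    for _ in range(n_layers):
        params["layers"].append({
            "W_q": rng.normal(0.0, scale, (d_model, d_model)),
            "W_k": rng.normal(0.0, scale, (d_model, d_model)),
            "W_v": rng.normal(0.0, scale, (d_model, d_model)),
        })
    return params


def causal_attention(h, layer):
    # Single-head self-attention: position t attends only to positions <= t.
    q, k, v = h @ layer["W_q"], h @ layer["W_k"], h @ layer["W_v"]
    scores = q @ k.T / np.sqrt(h.shape[-1])
    T = h.shape[0]
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)
    return softmax(scores, axis=-1) @ v


def next_state_distribution(states, params):
    # states: sequence of integer state indices, e.g. [0, 2, 1].
    h = params["embed"][np.asarray(states)]  # (T, d_model) vector embeddings
    for layer in params["layers"]:
        h = h + causal_attention(h, layer)  # residual connection around attention
    logits = h[-1] @ params["unembed"]  # read out from the last position
    return softmax(logits)  # probability distribution over the next state


params = init_params(n_states=3, d_model=8, n_layers=2)
p = next_state_distribution([0, 2, 1, 0], params)
print(p)  # three non-negative probabilities summing to 1 (up to rounding)
```

Because the mask hides every later position, the distribution read out at the last position depends only on the states seen so far, which is what lets the model be used as a next-state predictor.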

Network architecture

Here, we describe our primary architecture, which is the two-layer transformer illustrated in Figure 2a.

Representative citing paper

Distinct mechanisms underlying in-context learning in transformers (cs.LG, 2026-04-14)

Transformers develop four algorithmic phases of in-context learning on Markov chains via two distinct multi-layer subcircuit mechanisms, with phase boundaries set by data diversity K.
