Language models can explain neurons in language models
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: q-bio.NC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Features have life history. And we should care
Language model features form an early stable carrier scaffold of about 50 sparse features that is load-bearing, predictable from onset firing, and recruits most later features.