
arxiv: 2606.03990 · v1 · pith:FJX4TI5Xnew · submitted 2026-06-02 · 💻 cs.LG · cs.CL · cs.CV

Neuron Populations Exhibit Divergent Selectivity with Scale

Pith reviewed 2026-06-28 10:47 UTC · model grok-4.3

classification 💻 cs.LG · cs.CL · cs.CV
keywords Rosetta Neurons, scaling laws, neuron selectivity, monosemanticity, neural network interpretability, model scale, language models, vision models

The pith

Rosetta Neurons increase in absolute number but shrink as a fraction of total neurons while growing more selective and monosemantic with model scale.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tracks a class of neurons whose activation patterns match across separately trained models and measures how their population changes as model size grows from small to 30B parameters in language models and 5B in vision models. Their count rises according to a sublinear power law, so the shared neurons become rarer relative to the expanding total neuron population. At the same time these neurons polarize: they activate on narrower sets of inputs and align more closely with single semantic features. An analytical model that trades off the usefulness of representing a feature against the cost of dedicating a neuron to it accounts for both the sublinear growth and the polarization. The same neurons also concentrate on narrower domains as scale increases, which the authors demonstrate can be used to filter data for continued pretraining.

Core claim

The population of Rosetta Neurons follows a sublinear power law in model size, growing in absolute number but occupying a shrinking fraction of the total neuron count; these neurons simultaneously become more selective and increasingly monosemantic, separating from a growing non-Rosetta population that remains less selective, with the pattern explained by an analytical model balancing feature utility against limited neuron capacity.
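To make the "sublinear power law" concrete, here is a minimal sketch of a log-log fit and of why an exponent below 1 implies a shrinking fraction. The data are synthetic and hypothetical (the constant 50 and exponent 0.5 are chosen for illustration, not taken from the paper), and using total neuron count as the size axis is an assumption made so that the fraction is well defined.

```python
import numpy as np

def fit_power_law(sizes, counts):
    """Fit counts ~ c * sizes**alpha by least squares in log-log space."""
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_r = np.log(np.asarray(counts, dtype=float))
    alpha, log_c = np.polyfit(log_n, log_r, deg=1)  # slope, intercept
    return float(alpha), float(np.exp(log_c))

# Hypothetical values for illustration only; none are taken from the paper.
total_neurons = np.array([1e5, 1e6, 1e7])
rosetta_counts = 50.0 * total_neurons ** 0.5  # synthetic sublinear trend

alpha, c = fit_power_law(total_neurons, rosetta_counts)
fraction = rosetta_counts / total_neurons  # share of all neurons

print(f"fitted exponent alpha = {alpha:.3f}")
print("counts:  ", rosetta_counts)
print("fraction:", fraction)
```

With any exponent between 0 and 1, the counts rise with size while the fraction falls, which is the pattern the core claim describes.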

What carries the argument

Rosetta Neurons, neurons whose activation patterns are similar across independently trained models, tracked for population scaling and selectivity changes across model sizes.
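The Figure 2 caption describes the matching step as finding mutual nearest-neighbor pairs under Pearson correlation on shared inputs. The sketch below illustrates that idea on synthetic activations. The function names, the `min_corr` cutoff, and the data are hypothetical; the paper's actual pipeline (token or patch alignment, top-k variants, thresholds) is not reproduced here.

```python
import numpy as np

def pearson_matrix(acts_a, acts_b):
    """Pearson correlation between every neuron of A and every neuron of B.

    acts_a: (num_inputs, neurons_a), acts_b: (num_inputs, neurons_b),
    both recorded on the same inputs.
    """
    za = (acts_a - acts_a.mean(axis=0)) / (acts_a.std(axis=0) + 1e-8)
    zb = (acts_b - acts_b.mean(axis=0)) / (acts_b.std(axis=0) + 1e-8)
    return za.T @ zb / acts_a.shape[0]

def mutual_nearest_neighbors(corr, min_corr=0.0):
    """Keep (i, j) only if j is i's best match and i is j's best match."""
    best_b_for_a = corr.argmax(axis=1)
    best_a_for_b = corr.argmax(axis=0)
    pairs = []
    for i, j in enumerate(best_b_for_a):
        # min_corr is an assumed extra filter, not a value from the paper.
        if best_a_for_b[j] == i and corr[i, j] >= min_corr:
            pairs.append((int(i), int(j)))
    return pairs

# Synthetic example: B's neurons 0 and 2 are noisy copies of A's neurons 2 and 0.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(500, 4))
acts_b = np.stack([
    acts_a[:, 2] + 0.1 * rng.normal(size=500),
    rng.normal(size=500),
    acts_a[:, 0] + 0.1 * rng.normal(size=500),
], axis=1)

pairs = mutual_nearest_neighbors(pearson_matrix(acts_a, acts_b), min_corr=0.5)
print(pairs)
```

The mutual criterion is stricter than a one-sided nearest neighbor: a neuron that is merely the least-bad match for an unrelated unit is dropped unless the preference is reciprocated.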

If this is right

  • The absolute number of Rosetta Neurons continues to rise even while their relative share falls.
  • Selectivity and monosemanticity of Rosetta Neurons increase steadily with scale.
  • Rosetta Neurons concentrate on narrower domains as models enlarge.
  • The polarization separates Rosetta Neurons from a less selective non-Rosetta population.
  • Rosetta Neurons can be used to filter training data for continued pretraining.
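The Figure 6 caption reports selectivity in language models as the mean excess kurtosis of vocabulary-space projections. As a rough illustration of why excess kurtosis tracks selectivity, the sketch below compares a diffuse projection vector with one that strongly promotes a handful of tokens. The vectors, the vocabulary size, and the boost of 40 are all hypothetical, and how the paper computes the projection itself is not shown here.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    return float(np.mean(centered ** 4) / variance ** 2 - 3.0)

rng = np.random.default_rng(0)
vocab_size = 50_000  # hypothetical vocabulary size

# Diffuse neuron: projection spread evenly over the vocabulary.
diffuse = rng.normal(size=vocab_size)

# Selective neuron: same background, but a few tokens are strongly promoted.
selective = diffuse.copy()
selective[:20] += 40.0

k_diffuse = excess_kurtosis(diffuse)
k_selective = excess_kurtosis(selective)
print(f"diffuse: {k_diffuse:.2f}, selective: {k_selective:.2f}")
```

A heavy-tailed projection (a few very large entries) yields large positive excess kurtosis, while a near-Gaussian one stays close to zero, matching the caption's description of non-Rosetta neurons remaining near zero.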

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the sublinear scaling holds, very large models will contain a small but stable core of highly shared, interpretable neurons amid mostly model-specific ones.
  • The polarization effect suggests that interpretability tools focused on shared neurons may become relatively more powerful at larger scales.
  • The analytical capacity-utility tradeoff could be tested by varying the total neuron budget while holding feature set fixed in controlled synthetic settings.

Load-bearing premise

The analytical model balancing feature utility against limited neuron capacity is derived independently of the observed data and is not fitted post-hoc to reproduce the sublinear exponent or polarization pattern.
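One way to state this premise more concretely uses only the quantities named in the Figure 5 caption: feature importance $w_r \propto r^{-\beta}$ and frontiers that scale as $\Theta(N^{1/\beta})$. The sketch below assumes that $N$ is the neuron budget and that the Rosetta Neuron count $R(N)$ tracks the frontier $r_\tau(N)$; the caption is truncated at that point, so this reading is an assumption rather than the paper's stated derivation.

```latex
\begin{align}
  w_r &\propto r^{-\beta}, \\
  R(N) &\sim r_\tau(N) = \Theta\!\left(N^{1/\beta}\right), \\
  \frac{R(N)}{N} &= \Theta\!\left(N^{1/\beta - 1}\right) \longrightarrow 0
  \quad \text{as } N \to \infty \text{ when } \beta > 1 .
\end{align}
```

Under this reading the fitted exponent would equal $1/\beta$, so the premise holds only if $\beta$ can be estimated from feature importance without reference to the measured Rosetta Neuron counts.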

What would settle it

Measure the Rosetta Neuron population fraction and average selectivity in a new family of models at least ten times larger than 30B parameters; if the fraction stops shrinking or selectivity stops increasing, the claimed scaling and polarization laws are falsified.

Figures

Figures reproduced from arXiv: 2606.03990 by Alexei A. Efros, Amil Dravid, Yasaman Bahri, Yossi Gandelsman.

Figure 1
Neuron populations across scale. To study how neuron populations scale, we use Rosetta Neurons: units that recur across different models. (A) Features compete for representation in a finite set of neurons, leaving them isolated, mixed, or unrepresented at a given scale. This picture guides our analysis of universality, selectivity, and specialization. In panels B–D, each column shows top-activating context…
Figure 2
Identifying Rosetta Neurons. We compare MLP neuron activations across independently trained models on the same inputs and identify mutual nearest-neighbor pairs under Pearson correlation. The language and vision examples show individual matched neuron pairs firing on the same high-activating inputs, revealing similar activation patterns and coherent shared concepts. 3.2 Quantifying Pairwise Neuron Similar…
Figure 3
Scaling laws for Rosetta Neurons in language and vision models. We plot the number of discovered Rosetta Neurons for various model families at different scales. Dashed lines show power-law fits in log-log space. Across all family comparisons, the fitted exponents are sublinear, and the corresponding fits achieve $R^2$ values around 0.99. Further details are provided in Section C. et al., 2020). For vision mod…
Figure 4
Rosetta Neuron counts in untrained networks lack systematic scaling. Power-law scaling is absent in untrained networks. To test whether our previously observed scaling laws could be induced by the matching procedure itself, we apply the same pipeline to untrained networks initialized according to their architecture-specific random initialization schemes. We report the results across three random seeds in…
Figure 5
Feature-isolation frontiers. Features are ordered by decreasing importance $w_r \propto r^{-\beta}$. The optimal allocation partitions the spectrum into Rosetta-detectable features with $s_r \ge \tau$, partially isolated features with $0 < s_r < \tau$, strongly superposed features with $s_r = 0$, and features beyond the represented set $A(N)$. The frontiers $r_\tau(N)$ and $r_0(N)$ scale as $\Theta(N^{1/\beta})$, yielding the sublinear Rosetta Neuron count R…
Figure 6
The Neuron Polarization Effect in language and vision models. (a) In language models, Rosetta Neurons show increasing mean excess kurtosis of vocabulary-space projections with scale. Non-Rosetta neurons remain near zero, indicating weak selectivity. (b) In vision models, VLM-judged monosemanticity increases with scale for Rosetta Neurons and decreases for non-Rosetta neurons. Prediction: Neuron Polarizatio…
Figure 7
Rosetta Neuron document-type firing in Pythia. For each Pythia model size, we plot how often top-activating Rosetta Neuron contexts fall into a document category, normalized by that category’s token frequency in the validation set. The dashed line marks the corpus baseline. With scale, Rosetta Neuron firing shifts toward specialized categories such as code and math. Shaham et al., 2024). At each scale, we …
Figure 8
LLM annotations for Rosetta Neurons. Comparison between Pythia-160M, GPT2-124M, OPT-125M. (a) Pythia-410M: L10/U3047. GPT2-355M: L19/U1698. OPT-350M: L12/U927. (b) Pythia-410M: L9/U1693. GPT2-355M: L15/U1724. OPT-350M: L13/U846. (c) Pythia-410M: L23/U39. GPT2-355M: L23/U149. OPT-350M: L23/U2921. (d) Pythia-410M: L23/U3863. GPT2-355M: L23/U1026. OPT-350M: L22/U1841
Figure 9
LLM annotations for Rosetta Neurons. Comparison between Pythia-410M, GPT2-355M, OPT-350M.
Figure 10
LLM annotations for Rosetta Neurons. Comparison between Pythia-1.4B, GPT2-1.5B, OPT-1.3B. (a) Pythia-2.8B: L20/U9199. Qwen2.5-3B: L31/U5812. OPT-2.7B: L25/U7525. (b) Pythia-2.8B: L22/U6182. Qwen2.5-3B: L31/U1823. OPT-2.7B: L27/U3916. (c) Pythia-2.8B: L12/U5719. Qwen2.5-3B: L26/U2175. OPT-2.7B: L21/U7710. (d) Pythia-2.8B: L28/U10099. Qwen2.5-3B: L31/U6319. OPT-2.7B: L31/U2355
Figure 11
LLM annotations for Rosetta Neurons. Comparison between Pythia-2.8B, Qwen2.5-3B, OPT-2.7B.
Figure 12
LLM annotations for Rosetta Neurons. Comparison between Pythia-6.9B, Qwen2.5-7B, OPT-6.7B. (a) Pythia-12B: L1/U17927. Qwen2.5-14B: L0/U3786. OPT-13B: L0/U4028. (b) Pythia-12B: L1/U19682. Qwen2.5-14B: L3/U1521. OPT-13B: L0/U2590. (c) Pythia-12B: L24/U10513. Qwen2.5-14B: L33/U11681. OPT-13B: L29/U14. (d) Pythia-12B: L1/U4031. Qwen2.5-14B: L0/U6188. OPT-13B: L3/U11416
Figure 13
LLM annotations for Rosetta Neurons. Comparison between Pythia-12B, Qwen2.5-14B, OPT-13B.
Figure 14
Top-activating images for Rosetta Neurons. Comparison between DiT-B/16 and OpenCLIP ViT-B/16. (a) DiT-L/16: L0/U1228. OpenCLIP ViT-L/14: L1/U1669. (b) DiT-L/16: L31/U15. OpenCLIP ViT-L/14: L0/U1207. (c) DiT-L/16: L31/U1752. OpenCLIP ViT-L/14: L0/U3894. (d) DiT-L/16: L31/U816. OpenCLIP ViT-L/14: L0/U2394
Figure 15
Top-activating images for Rosetta Neurons. Comparison between DiT-L/16 and OpenCLIP ViT-L/14.
Figure 16
Top-activating images for Rosetta Neurons. Comparison between DiT-H/16 and OpenCLIP ViT-H/14. (a) DiT-1.6B: L44/U2447. OpenCLIP ViT-bigG/14: L29/U2762. (b) DiT-1.6B: L12/U2668. OpenCLIP ViT-bigG/14: L20/U203. (c) DiT-1.6B: L6/U3382. OpenCLIP ViT-bigG/14: L39/U7045. (d) DiT-1.6B: L9/U2940. OpenCLIP ViT-bigG/14: L28/U2265
Figure 17
Top-activating images for Rosetta Neurons. Comparison between DiT-1.6B and OpenCLIP ViT-bigG/14.
Figure 18
Top-activating images for Rosetta Neurons. Comparison between DiT-4B and OpenCLIP ViT-4.4B. A.2 Qualitative Comparison of Rosetta and Non-Rosetta Neurons We compare Rosetta and non-Rosetta neurons by visualizing top-activating examples. In language, we use randomly selected Pythia-6.9B Rosetta Neurons and non-Rosetta neurons from the same layers (Figures 19 and 20); in vision, we conduct a similar compar…
Figure 19
Top-5 activating sequences for Pythia-6.9B Rosetta Neurons. Rosetta Neurons demonstrate selective firing for coherent concepts.
Figure 20
Top-5 activating sequences for Pythia-6.9B non-Rosetta neurons. Neurons are randomly selected from the same layers as those in …
Figure 21
Top-5 activating images for OpenCLIP ViT-L/14 Rosetta Neurons. Rosetta Neurons demonstrate selective firing for coherent concepts.
Figure 22
Top-5 activating images for OpenCLIP ViT-L/14 non-Rosetta neurons. Neurons are randomly selected from the same layers as those in …
Figure 23
Aligning text tokens via shared byte boundaries. We align the tokens from Model A and Model B by keeping only the byte boundaries that both tokenizers share (red dashed lines). By finding the tokens that live in these new Aligned Spans, we can pool the activations on these tokens, creating a shared set of positions that we can then use to compare neuron responses.
Figure 24
Aligning spatial grids for neuron comparison. We align the different patch grids from Model A and Model B by choosing a single target resolution (the canonical grid defined by Model B in this case). Any mismatched native grids are resampled using bilinear interpolation to fit this shared grid. This results in aligned activation maps that can be used for measuring neuron similarity.
Figure 25
Robustness to the mutual top-k matching criterion. We repeat the scaling analysis for Pythia–OPT and Diffusion–OpenCLIP for different values of k in the nearest neighbor criterion. Increasing k results in more discovered neuron pairs, but the fitted power-law exponents remain within a narrow sublinear range. C.2 Robustness to the Mutual Top-k Criterion Our main scaling experiments in Section 4 identify Ro…
Figure 26
Rosetta Neuron counts under input permutation lack systematic scaling. Power-law scaling is absent under dataset permutation. For each model pair, we first compute token-level activations on the same dataset used in the main experiments. Before computing cross-model correlations, we randomly permute the flattened activation positions for one model so that correlations are computed between mismatched in…
Figure 27
Scaling behavior of Rosetta and non-Rosetta neurons in simulation. Top: the number of Rosetta Neurons follows the predicted scaling law according to our analytical model. Middle: Rosetta Neurons become more isolated with scale. Bottom: Non-Rosetta neurons become less isolated with scale, as predicted by our theory.
Figure 28
The Neuron Polarization Effect in Language Models. Rosetta Neurons exhibit increasing mean excess kurtosis of vocabulary-space projections with scale, suggestive of monosemantic function. Non-Rosetta neurons remain near zero, consistent with weaker vocabulary-level selectivity under this metric. E.2 Document-Type Firing Analysis We provide additional details and results for the document-type firing analy…
Figure 29
Rosetta Neuron document-type firing in Pythia. For each Pythia model size, each bar shows how often top-activating Rosetta Neuron contexts fall into a document category, normalized by that category’s token frequency in the validation cache. The dashed line marks the corpus baseline. With scale, Rosetta Neuron firing shifts toward specialized categories such as code and math.
Figure 30
Document-type firing in Qwen2.5. We use the same normalized document-type firing statistic as in …
Figure 31
Depth-wise distribution of Rosetta Neurons in Pythia across scale. Rosetta Neurons discovered from the Pythia–OPT matching runs. [Plot residue: heatmap with axes "Normalized Depth Bin" and "Model Size"; cell values truncated in source.]
Figure 32
Depth-wise distribution of Rosetta Neurons in OpenCLIP across scale. Rosetta Neurons discovered from the Diffusion–OpenCLIP matching runs. F Data Filtering Experimental Details Data. We use CodeSearchNet (Husain et al., 2019), a function-level code corpus extracted from publicly available GitHub repositories. This dataset spans six programming languages: Python, JavaScript, Java, Go, Ruby, PHP. Each examp…
Figure 33
Language matching stability as a function of dataset size. We vary the number of tokens used to match neurons between Pythia-6.9B and OPT-6.7B. Left: number of Rosetta Neurons discovered at each data scale. Right: overlap with the Rosetta Neuron set from the previous data scale. The discovered Rosetta Neuron set becomes increasingly stable as the token budget grows.
Figure 34
Diffusion-to-discriminative matching stability as a function of dataset size. We vary the number of generated images used to match neurons between pMF DiT-B/16 and OpenCLIP ViT-B/16. Left: number of Rosetta Neurons discovered at each data scale. Right: overlap with the Rosetta Neuron set from the previous data scale. Stability improves as the number of images used for neuron matching approaches 50,000.
Figure 35
Effect of image distribution on vision model matching. We compare Rosetta Neuron matching between OpenCLIP ViT-B/16 and DINOv2 ViT-B/14 using real and diffusion-generated images. Left: number of Rosetta Neurons identified at each data scale. Right: Jaccard index between the Rosetta Neuron sets obtained from the two image distributions. At larger data scales, the two distributions yield similar numbers of …
Figure 36
Effect of the number of top-activating images shown to the VLM judge. We ablate the number of top-activating images shown to the VLM judge. Specifically, we repeat the monosemanticity evaluation with $k \in \{2, 5, 10, 15, 20, 50\}$, constructing each composite from the top-k images and their corresponding activation maps and overlays. As shown in …
Figure 37
VLM-judged monosemanticity rate in DINOv2 and diffusion models. Rosetta Neurons discovered from DINOv2–Diffusion matching runs exhibit increasing monosemanticity with scale, while non-Rosetta neurons become polysemantic according to the metric. This provides additional evidence for the Neuron Polarization Effect across vision model families. H.4 Results for Neuron Selectivity in Other Vision Models In Sec…
Figure 38
DINOv3 does not exhibit a clear Rosetta Neuron scaling law. A notable exception to the scaling behavior observed in our main experiments from Section 4 is DINOv3 (Siméoni et al., 2025), which does not exhibit a clear Rosetta Neuron scaling law. We repeat the discriminative-to-diffusion matching procedure from Section 4 with DINOv3, and report the resulting Rosetta Neuron counts alongside DINOv2 results…
read the original abstract

We investigate whether neuron populations within neural networks evolve predictably with scale, extending scaling laws beyond macroscopic observables such as loss. To probe this question, we study Rosetta Neurons, a previously characterized class of neurons whose activation patterns are similar across independently trained models (Dravid et al., 2023). In separate analyses of language models up to 30B parameters and vision models up to 5B parameters, we observe that the population of Rosetta Neurons follows a sublinear power law in model size, growing in absolute number but occupying a shrinking fraction of the total neuron count. We further observe a Neuron Polarization Effect: Rosetta Neurons become more selective and increasingly monosemantic with scale, separating from a growing non-Rosetta population that remains less selective. An analytical model balancing feature utility against limited neuron capacity explains the sublinear power-law scaling and this polarization effect. Finally, we find that Rosetta Neurons become more domain-specialized with scale and illustrate their selectivity through a targeted data-filtering case study for continued pretraining. Our results point to a scaling law for interpretable, shared neuron-level structure, linking model size to systematic changes in neuron universality, selectivity, and specialization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims that Rosetta Neurons (neurons with similar activation patterns across independently trained models) follow a sublinear power-law scaling in count with model size (growing absolutely but shrinking as a fraction of total neurons) in language models up to 30B and vision models up to 5B parameters. It further claims a Neuron Polarization Effect in which Rosetta Neurons become more selective and monosemantic with scale, separating from a less-selective non-Rosetta population; both phenomena are explained by an analytical model balancing feature utility against neuron capacity. Additional claims include increasing domain specialization of Rosetta Neurons with scale and a data-filtering case study for continued pretraining.

Significance. If the empirical scaling and polarization observations hold and the analytical model is shown to be independently derived, the work would extend macroscopic scaling laws to neuron-level structure, offering a mechanistic account of how universality, selectivity, and specialization evolve with capacity. The case study provides a concrete application link to data curation.

major comments (2)
  1. [Abstract] Abstract and empirical sections: the sublinear power-law claim and Neuron Polarization Effect are presented without error bars, confidence intervals, exclusion criteria for Rosetta Neuron identification, or details on how similarity thresholds were validated across model sizes; these omissions are load-bearing because they prevent verification of the central scaling and selectivity observations.
  2. [Analytical Model] Analytical model description: the text states that the model 'explains' the sublinear exponent and polarization but does not provide the derivation or demonstrate that its parameters (e.g., utility or capacity terms) are fixed independently of the measured scaling data rather than chosen to reproduce the observed exponent; this directly affects whether the model constitutes an independent explanation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed review. We appreciate the feedback on improving the clarity and verifiability of our claims regarding Rosetta Neuron scaling and the analytical model. We respond to each major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract and empirical sections: the sublinear power-law claim and Neuron Polarization Effect are presented without error bars, confidence intervals, exclusion criteria for Rosetta Neuron identification, or details on how similarity thresholds were validated across model sizes; these omissions are load-bearing because they prevent verification of the central scaling and selectivity observations.

    Authors: We agree that these details are essential for rigorous verification. In the revised manuscript, we will include error bars and confidence intervals on all scaling plots and polarization metrics. We will also add explicit exclusion criteria for identifying Rosetta Neurons and a section detailing the validation of similarity thresholds across different model sizes, including sensitivity analyses. revision: yes

  2. Referee: [Analytical Model] Analytical model description: the text states that the model 'explains' the sublinear exponent and polarization but does not provide the derivation or demonstrate that its parameters (e.g., utility or capacity terms) are fixed independently of the measured scaling data rather than chosen to reproduce the observed exponent; this directly affects whether the model constitutes an independent explanation.

    Authors: The analytical model is presented in the main text with a high-level derivation balancing feature utility against neuron capacity. However, we acknowledge that a full step-by-step derivation and explicit demonstration of parameter independence would strengthen the claim of it being an independent explanation. We will expand this section in the revision to include the complete derivation and clarify how the utility and capacity parameters are determined from first principles and prior literature, independent of the scaling observations. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper reports an empirical observation of sublinear scaling in the count of Rosetta Neurons (previously defined in Dravid et al. 2023) across independently trained models, plus a polarization effect in selectivity. It then introduces an analytical model derived from balancing feature utility against neuron capacity constraints to explain the observed power-law exponent and polarization. No equations or text indicate that model parameters were fitted to the measured scaling data or that the exponent is recovered by construction; the model is presented as independently derived. The self-citation to the 2023 definition of Rosetta Neurons is not load-bearing for the new scaling claim, and the derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents identification of specific free parameters or axioms; the analytical model is stated to balance feature utility against neuron capacity but its exact functional form and any fitted constants are not visible.

pith-pipeline@v0.9.1-grok · 5752 in / 1129 out tokens · 38743 ms · 2026-06-28T10:47:40.679178+00:00 · methodology


Reference graph

Works this paper leans on

31 extracted references · 1 canonical work pages

  1. [1]

    Disentangling mlp neuron weights in vocabulary space

    Asaf Avrahamy, Yoav Gur-Arieh, and Mor Geva. Disentangling mlp neuron weights in vocabulary space. arXiv preprint arXiv:2604.06005,

  2. [2]

    Layer normalization

    Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv preprint arXiv:1607.06450,

  3. [3]

    Emergence of sparse representations from noise

    Trenton Bricken, Rylan Schaeffer, Bruno Olshausen, and Gabriel Kreiman. Emergence of sparse representations from noise. In Proceedings of the 40th International Conference on Machine Learning, pp. 3148–3191, 2023a. Trenton Bricken, Adly Templeton, Joshua Batson, Brian Chen, Adam Jermyn, Tom Conerly, Nick Turner, Cem Anil, Carson Denison, Amanda Askell, ...

  4. [4]

    Best-buddies similarity for robust template matching

    Tali Dekel, Shaul Oron, Michael Rubinstein, Shai Avidan, and William T Freeman. Best-buddies similarity for robust template matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2021–2029,

  5. [5]

    Rosetta neurons: Mining the common units in a model zoo

    Amil Dravid, Yossi Gandelsman, Alexei A Efros, and Assaf Shocher. Rosetta neurons: Mining the common units in a model zoo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1934–1943,

  6. [6]

    Toy models of superposition

    Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, et al. Toy models of superposition. arXiv preprint arXiv:2209.10652,

  7. [7]

    The pile: An 800gb dataset of diverse text for language modeling

    Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, et al. The pile: An 800gb dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027,

  8. [8]

    Transformer feed-forward layers are key-value memories

    Mor Geva, Roei Schuster, Jonathan Berant, and Omer Levy. Transformer feed-forward layers are key-value memories. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 5484–5495,

  9. [9]

    URL https://www.science.org/doi/abs/10.1126/science.1089506

    doi: 10.1126/science.1089506. URL https://www.science.org/doi/abs/10.1126/science.1089506. James V Haxby, M Ida Gobbini, Maura L Furey, Alumit Ishai, Jennifer L Schouten, and Pietro Pietrini. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539):2425–2430,

  10. [10]

    Deep learning scaling is predictable, empirically

    Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md Mostofa Ali Patwary, Yang Yang, and Yanqi Zhou. Deep learning scaling is predictable, empirically. arXiv preprint arXiv:1712.00409,

  11. [11]

    Training compute-optimal large language models

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, DDL Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556, 10,

  12. [12]

    Why larger models learn more: Effects of capacity, interference, and rare-task retention

    Jing Huang, Daniel Wurgaft, Rachit Bansal, Laura Ruis, Naomi Saphra, David Alvarez-Melis, Andrew Kyle Lampinen, Christopher Potts, and Ekdeep Singh Lubana. Why larger models learn more: Effects of capacity, interference, and rare-task retention. arXiv preprint arXiv:2605.29548,

  13. [13]

    Codesearchnet challenge: Evaluating the state of semantic code search

    Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. Codesearchnet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436,

  14. [14]

    Scaling laws for neural language models

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361,

  15. [15]

    Superposition yields robust neural scaling

    URL http://arxiv.org/abs/1511.07543. Yizhou Liu, Ziming Liu, and Jeff Gore. Superposition yields robust neural scaling. In The Thirty-ninth Annual Conference on Neural Information Processing Systems,

  16. [16]

    One-step latent-free image generation with pixel mean flows

    URL https://openreview.net/forum?id=Bkg6RiCqY7. Yiyang Lu, Susie Lu, Qiao Sun, Hanhong Zhao, Zhicheng Jiang, Xianbang Wang, Tianhong Li, Zhengyang Geng, and Kaiming He. One-step latent-free image generation with pixel mean flows. arXiv preprint arXiv:2601.22158,

  17. [17]

    In-context learning and induction heads

    Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, et al. In-context learning and induction heads. arXiv preprint arXiv:2209.11895,

  18. [18]

    Learning to generate reviews and discovering sentiment

    Alec Radford, Rafal Jozefowicz, and Ilya Sutskever. Learning to generate reviews and discovering sentiment. arXiv preprint arXiv:1704.01444,

  19. [19]

    Polysemanticity and capacity in neural networks

    Adam Scherlis, Kshitij Sachan, Adam S Jermyn, Joe Benton, and Buck Shlegeris. Polysemanticity and capacity in neural networks. arXiv preprint arXiv:2210.01892,

  20. [20]

    Dinov3

    Oriane Siméoni, Huy V Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, et al. Dinov3. arXiv preprint arXiv:2508.10104,

  21. [21]

    Qwen2.5 technical report

    An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, et al. Qwen2.5 technical report. arXiv preprint arXiv:2412.15115,

  22. [22]

    In pursuit of pixel supervision for visual pre-training

    Lihe Yang, Shang-Wen Li, Yang Li, Xinjie Lei, Dong Wang, Abdelrahman Mohamed, Hengshuang Zhao, and Hu Xu. In pursuit of pixel supervision for visual pre-training. arXiv preprint arXiv:2512.15715,

  23. [23]

    OPT: Open pre-trained transformer language models

    Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068,

  24. [24]

    C.1 Model Families. Language Model Families. For language models, we only consider pretrained models that have not undergone post-training

    We then present robustness checks for the matching procedure, including an ablation of the mutual-k-nearest-neighbor criterion and a dataset-permutation null. C.1 Model Families. Language Model Families. For language models, we only consider pretrained models that have not undergone post-training. We conduct the neuron matching using models from the Pythia, ...

  25. [25]

    For the generative model, we use one-step diffusion models built on the Diffusion Transformer architecture (Peebles & Xie, 2023)

    Vision Model Families. We analyze discriminative vision models from the OpenCLIP, DINOv2, and Pixio families, spanning scales from approximately 80 million to 5 billion parameters (Cherti et al., 2023; Radford et al., 2021; Oquab et al., 2024; Yang et al., 2025). For the generative model, we use one-step diffusion models built on the Diffusion Transformer ...

  26. [26]

    neuron” and “coordinate

    Under this null, the number of discovered Rosetta Neurons collapses to roughly 20–100 matches. Moreover, these counts no longer exhibit the systematic sublinear power-law trend observed with aligned activations. This suggests that Rosetta Neuron scaling depends on shared responses to the aligned inputs, rather than being induced by the matching procedure ...

  27. [27]

    As N grows, the number of detectable Rosetta Neurons increases, and their average isolation also increases

    D.3 Prediction: Neuron Polarization. The same allocation profile predicts a polarization effect. As N grows, the number of detectable Rosetta Neurons increases, and their average isolation also increases. At the same time, a growing tail of latent features remains weakly isolated, corresponding to a more crowded non-Rosetta background. Rosetta purification...

  28. [28]

    Random non-Rosetta neurons may exhibit category-specific biases, but do not exhibit the same consistent scale-dependent shift toward specialized domains

    Rosetta Neurons show an increasing shift toward specialized categories such as code and math with scale. Random non-Rosetta neurons may exhibit category-specific biases, but do not exhibit the same consistent scale-dependent shift toward specialized domains. E.3 Depth-Wise Distribution of Rosetta Neurons. We analyze where Rosetta Neurons appear across netw...

  29. [29]

    striped patterns

    G.3 Ablation on the Image Distribution Used for Vision Model Matching. Our vision experiments match neurons between a generative model and a discriminative model, following the GAN-based setup of Dravid et al. (2023). For modern diffusion-based generators, this requires generated images, since activations from the generative model are only available along...

  30. [30]

    The chance baseline is a random predictor that independently marks each test image as activating with probability 0.5

    VLM-as-a-judge performs meaningfully above a random baseline. The chance baseline is a random predictor that independently marks each test image as activating with probability 0.5. In this 5-positive/5-negative setup, its expected accuracy and precision are 0.5, while its expected recall and F1 are 0.4995 and 0.4865, respectively. Model Accuracy Precision Rec...

  31. [31]

    In contrast to DINOv2, which follows the trend observed in other vision model families, DINOv3 Rosetta Neuron counts do not follow a monotonic scaling trend. This deviation is consistent with the fact that DINOv3 modifies the DINOv2 training setup with additional constraints on intermediate representations, encouraging them to match statistics from ...