High concentration means the neuron responds cleanly to one frequency

Fourier analysis (activation-based): For modular addition, compute per-neuron Fourier concentration as the fraction of spectral power at the dominant frequency when activations are viewed as a function of (a+b) mod p [Nanda et al., 2023].
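The metric described in this passage can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the citing paper's implementation: the function name `fourier_concentration`, the `(p, p, n_neurons)` activation layout, the choice to average activations over all input pairs sharing the same residue (a+b) mod p, and the exclusion of the constant (DC) component from the power total are all assumptions made here. It also assumes p is odd, as it is for the odd primes typically used in modular addition tasks.

```python
import numpy as np


def fourier_concentration(acts, p):
    """Hypothetical helper: per-neuron Fourier concentration.

    acts has shape (p, p, n_neurons); acts[a, b, j] is the activation of
    neuron j on the input pair (a, b).
    Returns (concentration, dominant_frequency), each of shape (n_neurons,).
    """
    n_neurons = acts.shape[-1]
    a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    s = ((a + b) % p).ravel()  # residue class of every (a, b) pair

    # Assumption: view activations as a function of s by averaging over
    # the p input pairs that share each residue.
    by_sum = np.zeros((p, n_neurons))
    np.add.at(by_sum, s, acts.reshape(p * p, n_neurons))
    by_sum /= p

    # Power spectrum along the residue axis (real-input FFT).
    power = np.abs(np.fft.rfft(by_sum, axis=0)) ** 2
    power = power[1:]  # assumption: drop the DC term (frequency 0)

    total = power.sum(axis=0)
    concentration = power.max(axis=0) / np.maximum(total, 1e-12)
    dominant = power.argmax(axis=0) + 1  # +1 because DC was dropped
    return concentration, dominant


# Synthetic check: neuron 0 is a pure cosine at frequency 5 in (a+b) mod p,
# neuron 1 is Gaussian noise.
p = 113
rng = np.random.default_rng(0)
a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
clean = np.cos(2 * np.pi * 5 * ((a + b) % p) / p)
noise = rng.normal(size=(p, p))
acts = np.stack([clean, noise], axis=-1)

conc, dom = fourier_concentration(acts, p)
```

On this synthetic input the pure-cosine neuron should score close to 1 with dominant frequency 5, while the noise neuron should spread its power over many frequencies and score much lower, which is the sense in which high concentration indicates a clean single-frequency response.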

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Sparsity Moves Computation: How FFN Architecture Reshapes Attention in Small Transformers

cs.LG · 2026-05-10 · conditional · novelty 6.0 · 2 refs

Sparse MoE FFNs redistribute computation from FFN to attention in small Transformers, driven mainly by architectural sparsity rather than learned expert specialization.


