In a combinatorial toy setting, winning lottery tickets preserve families of compatible feature locations in early feature space that balance proximity to final codes with low interference, rather than specific weight subnetworks.
arXiv preprint arXiv:2409.15318 , year =
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Linear readouts incur an Omega(d^{-1/2}) crosstalk floor that caps the Hanni template at d^{3/2} capacity, while threshold recovery succeeds at quadratic loads for s = O(d/log d) sparsity, resolving the apparent contradiction via distinct readout invariants.
PAM, a complex-valued associative memory model, exhibits steeper power-law scaling in loss and perplexity than a matched real-valued baseline when trained on WikiText-103 from 5M to 100M parameters.
Strong superposition causes neural loss to scale as the inverse of model dimension due to geometric feature overlaps, explaining scaling laws for broad frequency distributions.
citing papers explorer
-
Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space
In a combinatorial toy setting, winning lottery tickets preserve families of compatible feature locations in early feature space that balance proximity to final codes with low interference, rather than specific weight subnetworks.
-
Linear-Readout Floors and Threshold Recovery in Computation in Superposition
Linear readouts incur an Omega(d^{-1/2}) crosstalk floor that caps the Hanni template at d^{3/2} capacity, while threshold recovery succeeds at quadratic loads for s = O(d/log d) sparsity, resolving the apparent contradiction via distinct readout invariants.
-
Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space
PAM, a complex-valued associative memory model, exhibits steeper power-law scaling in loss and perplexity than a matched real-valued baseline when trained on WikiText-103 from 5M to 100M parameters.
-
Superposition Yields Robust Neural Scaling
Strong superposition causes neural loss to scale as the inverse of model dimension due to geometric feature overlaps, explaining scaling laws for broad frequency distributions.