Subject identity variance dominates frozen representations in three EEG foundation models by 13-89x over null, and erasing the linear subject axis improves label decoding where within-subject label variation exists.
Saxe, James L
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
RNN computation is recovered from multi-hop graph pathways, and constraining these pathways via resolvent regularization yields improved temporal sparsity and task performance over standard L1.
Weight decay controls distinct learning regimes in grokking transformers on modular arithmetic, tracked by new cheap attention-based diagnostics with empirical critical value and exponent fits.
Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.
Task connectivity in graph-structured multi-task environments enhances generalization and stability, with stronger benefits for attention models than MLPs.
citing papers explorer
-
The Identity Trap in EEG Foundation Models: A Diagnostic Audit
Subject identity variance dominates frozen representations in three EEG foundation models by 13-89x over null, and erasing the linear subject axis improves label decoding where within-subject label variation exists.
-
Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics
Weight decay controls distinct learning regimes in grokking transformers on modular arithmetic, tracked by new cheap attention-based diagnostics with empirical critical value and exponent fits.
-
Deep sequence models tend to memorize geometrically; it is unclear why
Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.