Recognition: 1 theorem link · Lean theorem
Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
Pith reviewed 2026-05-15 05:09 UTC · model grok-4.3
The pith
Sparse autoencoders extract steerable clinical features from EEG foundation models while exposing age-pathology entanglements and wrecking-ball failures.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TopK SAEs applied to the embeddings of SleepFM, REVE, and LaBraM yield sparse dictionaries whose individual directions can be steered; a probe-area metric on these directions reveals three regimes of selectivity and directly identifies representational failures such as global performance collapse after intervention and irreducible confounding between age and pathology labels.
What carries the argument
The target-versus-off-target probe area metric computed from concept steering on SAE-derived features, which quantifies how narrowly an intervention on one clinical concept affects model predictions.
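One way to picture this computation (our sketch; the paper's exact formula is not reproduced here, and every name below is ours): steer embeddings along a feature's decoder direction at increasing strengths, record how much each concept probe's output moves, and compare the area under the target concept's response curve with the largest off-target area. A positive gap corresponds to the "selectively steerable" regime.

```python
import numpy as np

def _area(y, x):
    # trapezoidal area under a response curve
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def probe_area_selectivity(z, decoder_dir, probes, target, strengths):
    """Target-vs-off-target selectivity of steering one SAE feature.

    z           : (n, d) model embeddings
    decoder_dir : (d,) decoder direction of the steered SAE feature
    probes      : dict concept -> (d,) linear probe weights (illustrative)
    target      : concept the feature is labeled with
    strengths   : increasing array of steering coefficients
    """
    areas = {}
    for concept, w in probes.items():
        base = z @ w
        # mean absolute probe-logit shift at each steering strength
        resp = np.array([np.abs((z + a * decoder_dir) @ w - base).mean()
                         for a in strengths])
        areas[concept] = _area(resp, strengths)
    worst_off = max(v for c, v in areas.items() if c != target)
    # positive: the intervention moves the target probe more than any
    # off-target probe; negative or near zero suggests entanglement
    return areas[target] - worst_off
```

This is a sketch under stated assumptions (linear probes, mean absolute logit shift as the response), not the paper's implementation.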
If this is right
- Some clinical concepts can be independently adjusted in model outputs without collapsing overall accuracy.
- Certain pairs of concepts, such as age and pathology, remain entangled so that suppressing one necessarily corrupts the other.
- Interventions that are not selective produce wrecking-ball effects that degrade performance across many downstream tasks.
- Spectral decoding converts each steered feature into a concrete change in frequency-band amplitudes that clinicians can inspect.
Where Pith is reading between the lines
- The same SAE-plus-steering pipeline could be applied to other biosignal foundation models to test whether the three-regime pattern is general.
- Non-encoded concepts identified by the metric indicate specific gaps in the original training data that future collection efforts could target.
- Clinicians could use selective steering to generate counterfactual EEG traces that simulate the effect of removing a medication or correcting for age.
Load-bearing premise
The assumption that SAE features grounded in the clinical taxonomy actually correspond to causally active internal variables rather than mere correlations in the training data.
What would settle it
If steering an SAE feature labeled as abnormality changes the model's abnormality predictions on held-out EEG recordings but also shifts age or sex predictions by a comparable amount, the claim of selective steerability collapses.
Original abstract
EEG foundation models achieve state-of-the-art clinical performance, yet the internal computations driving their predictions remain opaque: a barrier to clinical trust. We apply TopK Sparse Autoencoders (SAEs) across three architecturally distinct EEG transformers (SleepFM, REVE, and LaBraM) to extract sparse feature dictionaries from their embeddings. By grounding these features in a clinical taxonomy (abnormality, age, sex, and medication), we benchmark monosemanticity and entanglement across architectures. A single hyperparameter procedure, driven by an intrinsic dictionary health audit, transfers robustly across all three architectures. Via concept steering, we introduce a "target vs. off-target" probe area metric to quantify steering selectivity and reveal three operational regimes: selectively steerable, encoded but entangled, and non-encoded. This framework exposes critical representational failures: "wrecking-ball" interventions that collapse global model performance, and clinical entanglements, such as age-pathology confounding, where it is impossible to suppress one concept without corrupting the other. Finally, a spectral decoder maps these interventions back to the amplitude spectrum, translating latent manipulations into physiologically interpretable frequency signatures, such as pathological slow-wave suppression and α-band restoration.
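The TopK mechanism the abstract refers to is the standard k-sparse activation (Makhzani & Frey, 2014): keep only the k largest latent pre-activations per input and zero the rest. A minimal numpy sketch of one forward pass, with variable names that are ours rather than the paper's:

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Minimal TopK SAE forward pass (illustrative shapes only).

    x: (n, d) embeddings; W_enc: (d, m); W_dec: (m, d); k: active latents.
    """
    pre = x @ W_enc + b_enc                          # (n, m) pre-activations
    # keep the k largest pre-activations per sample, zero the rest
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    z = np.zeros_like(pre)
    rows = np.arange(pre.shape[0])[:, None]
    z[rows, idx] = np.maximum(pre[rows, idx], 0.0)   # ReLU on the survivors
    x_hat = z @ W_dec + b_dec                        # reconstruction
    return z, x_hat
```

Training then minimizes the reconstruction error of `x_hat` against `x`; the sparsity level k is the single free parameter the ledger below lists.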
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper applies TopK Sparse Autoencoders to the embeddings of three EEG foundation models (SleepFM, REVE, LaBraM) to extract sparse feature dictionaries. Features are grounded in a clinical taxonomy (abnormality, age, sex, medication) to benchmark monosemanticity and entanglement. Concept steering is performed with a new 'target vs. off-target' probe-area metric that identifies three operational regimes (selectively steerable, encoded but entangled, non-encoded). The framework is used to surface representational failures such as wrecking-ball interventions and age-pathology confounding, with a spectral decoder translating interventions into frequency-domain signatures.
Significance. If the central claims are supported by rigorous validation, the work supplies the first systematic mechanistic-interpretability pipeline for EEG transformers, including transferable SAE training, a quantitative steering-selectivity metric, and physiologically grounded failure modes. These contributions could materially improve clinical trust and debugging of high-stakes EEG models.
major comments (3)
- §4.2 (probe-area metric definition): the metric is computed directly from the steered activations that also define the three regimes; without an ablation that steers on random or permuted SAE features, it is impossible to rule out that reported selectivity scores are artifacts of the TopK dictionary rather than genuine causal structure.
- §3.1–3.3 (feature grounding procedure): the assignment of SAE features to the supplied clinical taxonomy is presented without an independent validation set or inter-rater reliability measure; any label leakage would propagate directly into the entanglement and regime classifications.
- §5.1 (wrecking-ball and age-pathology examples): the reported performance collapse and confounding effects are shown for single interventions only; no quantitative comparison to baseline random steering or to an SAE trained with a different sparsity penalty is provided, weakening the claim that these are intrinsic representational failures.
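The random-steering baseline requested in the first and third comments is cheap to sketch: score random unit directions in embedding space with the same selectivity function used for learned features, and treat the resulting scores as a null distribution. This is our own sketch; `score_fn` is a stand-in for whichever selectivity metric is actually used.

```python
import numpy as np

def random_direction_null(score_fn, dim, n_random=200, seed=0):
    """Null distribution of selectivity scores for random unit
    directions. Permuting learned decoder rows across feature labels
    would give an analogous (feature-permutation) baseline."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_random, dim))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return np.array([score_fn(u) for u in dirs])

def exceeds_null(feature_score, null_scores, q=95):
    # True if the learned feature beats the q-th percentile of the null,
    # i.e. its selectivity is unlikely to be a dictionary artifact
    return bool(feature_score > np.percentile(null_scores, q))
```

Reporting learned-feature scores against such a null is one concrete way the ablation the referee asks for could be quantified.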
minor comments (2)
- Figure 3: the caption does not state the exact number of runs or random seeds used to generate the probe-area distributions.
- §4.3: the spectral decoder architecture is described only at a high level; the precise mapping from latent interventions to amplitude spectra should be given as an equation or pseudocode.
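Absent that equation, one plausible form consistent with the high-level description is a linear least-squares decoder from SAE latents to per-band amplitudes, so that a latent intervention Δz maps to a predicted band-amplitude change Δz·W. This is our reconstruction, not the paper's; all names are illustrative.

```python
import numpy as np

def fit_spectral_decoder(z, band_amps):
    """Least-squares map W from SAE latents z (n, m) to per-band
    amplitudes band_amps (n, B), e.g. delta/theta/alpha/beta power."""
    W, *_ = np.linalg.lstsq(z, band_amps, rcond=None)
    return W  # (m, B)

def intervention_signature(W, delta_z):
    """Predicted change in each band's amplitude for a latent
    intervention delta_z (m,), e.g. suppressing one SAE feature."""
    return delta_z @ W
```

Under this form, "α-band restoration" would appear as a positive entry in the alpha column of the signature for the relevant intervention.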
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment below, indicating where revisions will be made to improve the manuscript.
read point-by-point responses
-
Referee: §4.2 (probe-area metric definition): the metric is computed directly from the steered activations that also define the three regimes; without an ablation that steers on random or permuted SAE features, it is impossible to rule out that reported selectivity scores are artifacts of the TopK dictionary rather than genuine causal structure.
Authors: We agree that an ablation on random or permuted SAE features is required to confirm that the probe-area metric captures genuine causal structure. In the revised manuscript we will add this ablation to §4.2, applying identical steering to randomly selected and permuted features and reporting that selectivity scores are substantially lower than those obtained from the learned dictionary, thereby supporting the reported regimes. revision: yes
-
Referee: §3.1–3.3 (feature grounding procedure): the assignment of SAE features to the supplied clinical taxonomy is presented without an independent validation set or inter-rater reliability measure; any label leakage would propagate directly into the entanglement and regime classifications.
Authors: The grounding procedure correlates SAE feature activations with clinical labels on held-out data. We acknowledge the absence of an independent validation set and inter-rater reliability statistics. We will expand §3.1–3.3 with a clearer description of the assignment protocol and include a supplementary inter-rater reliability check on a subset of features to address potential label leakage concerns. revision: partial
-
Referee: §5.1 (wrecking-ball and age-pathology examples): the reported performance collapse and confounding effects are shown for single interventions only; no quantitative comparison to baseline random steering or to an SAE trained with a different sparsity penalty is provided, weakening the claim that these are intrinsic representational failures.
Authors: The examples illustrate specific failure modes observed consistently across the three models. We will add quantitative random-steering baselines to §5.1 to demonstrate that the reported collapses exceed those from random interventions. A full comparison across alternative sparsity penalties would require substantial new training runs; we will instead note this as a limitation while emphasizing that the failures persist under the single, transferable hyperparameter procedure used for all architectures. revision: partial
Circularity Check
No significant circularity; empirical framework remains self-contained
full rationale
The paper applies TopK SAEs to extract features from EEG transformers, grounds them empirically in a supplied clinical taxonomy, and introduces a probe-area metric computed from steered activations. No equations, fitted parameters, or self-citations are shown to reduce the reported operational regimes, selectivity scores, or entanglement findings to inputs by construction. The derivation chain consists of standard SAE training followed by observational steering experiments whose metrics are computed directly from the resulting activations rather than being tautological with the training objective or any prior self-cited result. Cross-architecture transfer and the intrinsic dictionary audit further keep the central claims independent of any load-bearing self-reference.
Axiom & Free-Parameter Ledger
free parameters (1)
- TopK sparsity level
axioms (1)
- domain assumption: SAE features correspond to monosemantic clinical concepts when grounded in the taxonomy
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean : washburn_uniqueness_aczel (tag: unclear)
  Rationale: the relation between the paper passage and the cited Recognition theorem is ambiguous.
  Linked passage: "We apply TopK Sparse Autoencoders (SAEs) ... concept steering ... target vs. off-target probe area metric ... three operational regimes: selectively steerable, encoded but entangled, and non-encoded."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Rahul Thapa, Magnus Ruud Kjaer, Bryan He, Ian Covert, Hyatt Moore IV, Umaer Hanif, Gauri Ganjoo, M. Brandon Westover, Poul Jennum, Andreas Brink-Kjaer, Emmanuel Mignot, and James Zou. A multimodal sleep foundation model for disease prediction. Nature Medicine, 32:752–762, 2026. doi: 10.1038/s41591-025-04133-4
- [2] Yassine El Ouahidi, Jonathan Lys, Philipp Thölke, Nicolas Farrugia, Bastien Pasdeloup, Vincent Gripon, Karim Jerbi, and Giulia Lioi. REVE: A foundation model for EEG: Adapting to any setup with large-scale pretraining on 25,000 subjects. Advances in Neural Information Processing Systems, 2025. URL https://brain-bzh.github.io/reve/
- [3] Wei-Bang Jiang, Li-Ming Zhao, and Bao-Liang Lu. Large brain model for learning generic representations with tremendous EEG data in BCI. In International Conference on Learning Representations (ICLR), 2024
- [4] Demetres Kostas, Stéphane Aroca-Ouellette, and Frank Rudzicz. BENDR: Using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data. Frontiers in Human Neuroscience, 15, 2021
- [5] William Lehn-Schiøler, Magnus Ruud Kjær, Phillip Hempel, Magnus Guldberg Pedersen, Rahul Thapa, Bryan He, Nicolai Spicher, Andreas Brink-Kjaer, Lars Kai Hansen, and Emmanuel Mignot. Pretraining on sleep data improves non-sleep biosignal tasks. arXiv preprint arXiv:2605.02500, 2026
- [6] Sándor Beniczky, Harald Aurlien, Jan C. Brøgger, Lawrence J. Hirsch, Donald L. Schomer, Eugen Trinka, et al. Standardized computer-based organized reporting of EEG: SCORE – second version. Clinical Neurophysiology, 128(11):2334–2346, 2017. doi: 10.1016/j.clinph.2017.07.418
- [7] Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah. A mathematical framework for transformer circuits. Transformer Circuits Thread, 2021. URL https://transformer-circuits.pub/2021/framework/index.html
- [8] Nelson Elhage et al. Toy models of superposition. Transformer Circuits Thread, 2022
- [9] Trenton Bricken et al. Towards monosemanticity: Decomposing language models with dictionary learning. Transformer Circuits Thread, 2023
- [10] Adly Templeton et al. Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet. Transformer Circuits Thread, 2024. URL https://transformer-circuits.pub/2024/scaling-monosemanticity/
- [11] Tom Lieberum et al. Gemma Scope: Open sparse autoencoders everywhere all at once on Gemma 2. arXiv preprint arXiv:2408.05147, 2024
- [12] Riccardo Renzulli, Colas Lepoutre, Enrico Cassano, and Marco Grangetto. MedSAE: Dissecting MedCLIP representations with sparse autoencoders. arXiv preprint arXiv:2510.26411, 2025
- [13] Krishna Kanth Nakka. Mammo-SAE: Interpreting breast cancer concept learning with sparse autoencoders, 2025
- [14] Elana Simon and James Zou. InterPLM: Discovering interpretable features in protein language models via sparse autoencoders. Nature Methods, 22(10):2107–2117, 2025
- [15] Laurence Freeman, Philip Shamash, Vinam Arora, Caswell Barry, Tiago Branco, and Eva Dyer. Beyond black boxes: Enhancing interpretability of transformers trained on neural data, 2025
- [16] Matīss Kalnāre, Sofoklis Kitharidis, Thomas Bäck, and Niki van Stein. Mechanistic interpretability for transformer-based time series classification. In Computational Intelligence. IJCCI 2025, volume 2829 of Communications in Computer and Information Science. Springer, 2025. doi: 10.1007/978-3-032-15638-9_15
- [17] Alireza Makhzani and Brendan Frey. k-sparse autoencoders. In International Conference on Learning Representations (ICLR), 2014
- [18] Hoagy Cunningham et al. Sparse autoencoders find highly interpretable features in language models. arXiv preprint arXiv:2309.08600, 2023
- [19] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018
- [20] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
- [21] Hangbo Bao, Li Dong, Songhao Piao, and Furu Wei. BEiT: BERT pre-training of image transformers. In International Conference on Learning Representations (ICLR), 2022
- [22] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, Jesse Wexler, Fernanda Viegas, and Rory Sayres. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018
- [23] Guillaume Alain and Yoshua Bengio. Understanding intermediate layers using linear classifier probes. arXiv preprint arXiv:1610.01644, 2017
- [24] Yonatan Belinkov. Probing classifiers: Promises, shortcomings, and advances. Computational Linguistics, 48(1):207–219, 2022. doi: 10.1162/coli_a_00422
- [25] Anders Gjølbye Madsen, William Theodor Lehn-Schiøler, Áshildur Jónsdóttir, Bergdís Arnardóttir, and Lars Kai Hansen. Concept-based explainability for an EEG transformer model. In 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6. IEEE, September 2023. doi: 10.1109/mlsp55844.2023.10285992
- [26] Nomin Enkhtsetseg, William Lehn-Schiøler, Anton Storgaard Mosquera, Magnus Guldberg Pedersen, Dylan Rice, George Wambugu, Nshimiyimana Jules Fidele, Melita Cacic Hribljan, Anca Alina Arbune, Sidsel Armand Larsen, Sandor Beniczky, and Farrah J. Mateen. Clinical utility and feasibility of smartphone-based EEG in Kenya: A multicenter observational study. arXiv preprint, 2026
- [27] Anders Gjølbye, Lina Skerath, William Lehn-Schiøler, Nicolas Langer, and Lars Kai Hansen. SPEED: Scalable preprocessing of EEG data for self-supervised learning. In Proceedings of the 2024 IEEE International Workshop on Machine Learning for Signal Processing, 2024
- [28] Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, and Stella Biderman. LEACE: Perfect linear concept erasure in closed form. In Advances in Neural Information Processing Systems (NeurIPS), 2023
- [29] Yanai Elazar, Shauli Ravfogel, Alon Jacovi, and Yoav Goldberg. Amnesic probing: Behavioral explanation with amnesic counterfactuals. Transactions of the Association for Computational Linguistics, 9:160–175, 2021