Causal interpretation of sparse autoencoder features in vision

Sangyu Han, Yearim Kim, Nojun Kwak · 2025 · arXiv 2509.00749

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Beyond Semantics: Disentangling Information Scope in Sparse Autoencoders for CLIP

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

The paper proposes information scope as a new interpretability axis for SAE features in CLIP and introduces the Contextual Dependency Score to separate local from global scope features, showing they influence model predictions differently.

Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

SAE-FT uses a sparse autoencoder on pre-trained CLIP visual representations to regularize fine-tuning by penalizing changes to semantically meaningful features, aiming for robust performance on ImageNet and distribution shifts.

citing papers explorer


Beyond Semantics: Disentangling Information Scope in Sparse Autoencoders for CLIP
cs.CV · 2026-04-07 · unverdicted · none · ref 17
Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models
cs.CV · 2026-05-15 · unverdicted · none · ref 10
