Sparse autoencoders reveal selective remapping of visual concepts during adaptation. arXiv preprint arXiv:2412.05276
5 Pith papers cite this work.
fields
cs.CV 5
verdicts
UNVERDICTED 5
representative citing papers
The paper proposes information scope as a new interpretability axis for SAE features in CLIP and introduces the Contextual Dependency Score to separate local from global scope features, showing they influence model predictions differently.
GeoSAE extracts a compact, interpretable feature set from frozen brain MRI foundation models that predicts MCI-to-AD conversion (AUC 0.746) with age-deconfounded annotations and replicates across cohorts.
LAKE identifies sparse anomaly-sensitive neurons in pre-trained VLMs using minimal normal samples to build compact normality representations and achieve SOTA anomaly detection with neuron-level interpretability.
VS2 constructs steering vectors from sparse SAE features on unlabeled in-domain activations to improve zero-shot accuracy of CLIP models by 0.93-4.12% on CIFAR-100, CUB-200, and Tiny-ImageNet while remaining forward-pass only.
citing papers explorer
- Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision
  Cross-Layer Transcoders decompose ViT activations into sparse, depth-aware layer contributions that maintain zero-shot accuracy and enable faithful attribution of the final representation.
- Beyond Semantics: Disentangling Information Scope in Sparse Autoencoders for CLIP
  The paper proposes information scope as a new interpretability axis for SAE features in CLIP and introduces the Contextual Dependency Score to separate local from global scope features, showing they influence model predictions differently.
- GeoSAE: Geometric Prior-Guided Layer-Wise Sparse Autoencoder Annotation of Brain MRI Foundation Models
  GeoSAE extracts a compact, interpretable feature set from frozen brain MRI foundation models that predicts MCI-to-AD conversion (AUC 0.746) with age-deconfounded annotations and replicates across cohorts.
- Latent Anomaly Knowledge Excavation: Unveiling Sparse Sensitive Neurons in Vision-Language Models
  LAKE identifies sparse anomaly-sensitive neurons in pre-trained VLMs using minimal normal samples to build compact normality representations and achieve SOTA anomaly detection with neuron-level interpretability.
- Visual Sparse Steering (VS2): Unsupervised Adaptation for Image Classification using Sparsity-Guided Steering Vectors
  VS2 constructs steering vectors from sparse SAE features on unlabeled in-domain activations to improve zero-shot accuracy of CLIP models by 0.93-4.12% on CIFAR-100, CUB-200, and Tiny-ImageNet while remaining forward-pass only.