CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks
In this paper, we propose CLIP-Dissect, a new technique to automatically describe the function of individual hidden neurons inside vision networks. CLIP-Dissect leverages recent advances in multimodal vision/language models to label internal neurons with open-ended concepts without the need for any labeled data or human examples. We show that CLIP-Dissect provides more accurate descriptions than existing methods for last-layer neurons, where ground truth is available, as well as qualitatively good descriptions for hidden-layer neurons. In addition, our method is very flexible: it is model agnostic, can easily handle new concepts, and can be extended to take advantage of better multimodal models in the future. CLIP-Dissect is also computationally efficient: it can label all neurons from five layers of ResNet-50 in just 4 minutes, which is more than 10 times faster than existing methods. Our code is available at https://github.com/Trustworthy-ML-Lab/CLIP-dissect. Finally, crowdsourced user study results in Appendix B further support the effectiveness of our method.
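The abstract does not spell out the matching procedure, so the following is only a toy sketch of the general idea of labeling neurons with concepts via image-text similarity scores. It assumes that per-image neuron activations and per-image concept similarity scores (such as a CLIP-like model might produce) are already available, and it uses plain Pearson correlation as a stand-in similarity measure. The function names, the toy data, and the choice of correlation are all hypothetical and are not taken from the paper or its code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no signal to match on
    return cov / math.sqrt(var_x * var_y)


def label_neurons(activations, concept_scores, concepts):
    """Pick, for each neuron, the concept whose scores best track its activations.

    activations[k][i]    : activation of neuron k on probing image i
    concept_scores[j][i] : image-text similarity of concept j with image i
                           (assumed to come from a multimodal model)
    """
    labels = []
    for neuron_acts in activations:
        scores = [pearson(neuron_acts, cs) for cs in concept_scores]
        best = max(range(len(concepts)), key=lambda j: scores[j])
        labels.append(concepts[best])
    return labels


# Toy data (made up for illustration): 4 probing images, 2 concepts, 2 neurons.
concepts = ["dog", "stripes"]
concept_scores = [
    [0.9, 0.8, 0.1, 0.2],  # "dog" matches images 0 and 1
    [0.1, 0.2, 0.9, 0.7],  # "stripes" matches images 2 and 3
]
activations = [
    [0.0, 0.1, 2.0, 1.5],  # neuron 0 fires on images 2 and 3
    [3.0, 2.5, 0.2, 0.0],  # neuron 1 fires on images 0 and 1
]

labels = label_neurons(activations, concept_scores, concepts)
print(labels)
```

Because the concept list is just a list of strings, adding a new concept in this sketch only requires one more row of similarity scores, which loosely mirrors the flexibility the abstract describes.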
Forward citations
Cited by 11 Pith papers
-
Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision
Cross-Layer Transcoders decompose ViT activations into sparse, depth-aware layer contributions that maintain zero-shot accuracy and enable faithful attribution of the final representation.
-
Objects Before Words: Object-First Inductive Biases for Grounding Language in Child-View Video
BabyMind improves forced-choice word grounding accuracy by 2.6 points over CVCL on SAYCam-S by using offline object masks, short-term tracking into object files, and prototype-space multiple-instance contrastive learning.
-
Measuring What Matters: Synthetic Benchmarks for Concept Bottleneck Models
Introduces synthetic benchmarks for concept bottleneck models that control data modality, concept choice, annotation quality, and completeness to evaluate performance in decision support and automation.
-
Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex
MINE uses mechanistic interpretability on language-aligned image representations to generate per-voxel feature descriptions, validated via image generation and counterfactual edits that causally shift brain activation.
-
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
-
Letting the neural code speak: Automated characterization of monkey visual neurons through human language
Natural language descriptions generated via a closed-loop pipeline with digital twins capture the selectivity of most neurons in macaque V1 and V4, with synthesized images driving 96% of V4 neurons into the top or bot...
-
Letting the neural code speak: Automated characterization of monkey visual neurons through human language
Natural-language descriptions generated and verified through generative models and digital twins capture the selectivity of most neurons in macaque V1 and V4.
-
Hierarchical, Interpretable, Label-Free Concept Bottleneck Model
HIL-CBM is a hierarchical label-free concept bottleneck model that improves classification accuracy and explanation quality over prior single-level CBMs using a visual consistency loss and dual heads.
-
Beyond Interpretability: When, Why, and How Sparse Autoencoders Enable Label-Free Visual Steering
VS2 constructs steering vectors from sparse SAE features on unlabeled in-domain activations to improve zero-shot accuracy of CLIP models by 0.93-4.12% on CIFAR-100, CUB-200, and Tiny-ImageNet while remaining forward-p...
-
Act on What You See: Unlocking Safe Social Navigation in Vision-Language-Action Models
SALSA aligns social features and adds future-risk signals in VLA models to cut near-collisions by 86.4% and raise social accuracy from 53% to 93% on SCAND and real robots.
-
Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions
Current XAI methods for DNNs and LLMs rest on paradoxes and false assumptions that demand a paradigm shift to verification protocols, scientific foundations, context-aware design, and faithful model analysis rather th...