TEXTER generates zero-shot textual explanations for image classifiers by isolating decision-critical features from contributing neurons, mapping them into CLIP space, and using sparse autoencoders for improved interpretability in Transformers.
Do vision transformers see like convolutional neural networks? In Advances in Neural Information Processing Systems (NeurIPS), pages 12116–12128.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Zero-Shot Textual Explanations via Translating Decision-Critical Features