Do vision transformers see like convolutional neural networks? Advances in Neural Information Processing Systems, 34:12116–12128
3 Pith papers cite this work.
Fields: cs.CV (3)
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Sparsity as a Key: Unlocking New Insights from Latent Structures for Out-of-Distribution Detection
  Sparse autoencoders on ViT class tokens reveal stable Class Activation Profiles for in-distribution data, enabling OOD detection via divergence from core energy profiles.
- Selective, Regularized, and Calibrated: Harnessing Vision Foundation Models for Cross-Domain Few-Shot Semantic Segmentation
  HERA is a select-regularize-calibrate framework adapting frozen vision foundation models for cross-domain few-shot semantic segmentation via hierarchical layer selection with ETR, prior-guided regularization, and pixelwise adaptive calibration, reporting over 4.1 mIoU gains.
- How to Embed Matters: Evaluation of EO Embedding Design Choices
  Transformer backbones with mean pooling and combined self-supervised embeddings yield robust, compact representations for EO tasks that are over 500x smaller than raw data.