pith. machine review for the scientific record.

arXiv: 2506.01247 · v2 · submitted 2025-06-02 · cs.CV · cs.AI · cs.LG

Recognition: unknown

Visual Sparse Steering (VS2): Unsupervised Adaptation for Image Classification using Sparsity-Guided Steering Vectors

classification: cs.CV, cs.AI, cs.LG
keywords: steering, sparse, adaptation, test-time, features, methods, reconstruction, vectors
Abstract

Steering vision foundation models at test time, without updating foundation-model weights or using labeled target data, is a desirable yet challenging goal. We present Visual Sparse Steering (VS2), a lightweight, label-free adaptation method that constructs a steering vector from sparse features extracted by a Sparse Autoencoder (SAE) trained on unlabeled in-domain training-split activations of the vision encoder. VS2 offers three key advantages over existing test-time adaptation methods: (1) a feature-level intervention space in sparse SAE representations; (2) efficiency, requiring only a forward pass with no test-time optimization or backpropagation; and (3) a reliability diagnostic based on SAE reconstruction loss that can skip steering when reconstruction is poor, enabling safe fallback to the baseline, a capability not standard in conventional steering vectors and test-time adaptation methods. Across CIFAR-100, CUB-200, and Tiny-ImageNet and two CLIP backbones (ViT-B/32, ViT-B/16), VS2 improves zero-shot top-1 accuracy by 3.45-4.12%, 0.93-1.08%, and 1.50-1.84%, respectively, while remaining forward-only and adding minimal compute overhead. A retrieval-based upper-bound analysis suggests substantial headroom if task-relevant sparse features can be selected reliably, motivating future work on selective feature amplification for interpretable, efficient test-time steering.
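
The forward-only steering and the reconstruction-based fallback described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the steering vector is built from the sparse features, so the sketch assumes one plausible scheme (amplify the top-k active SAE features, decode, and take the difference from the plain reconstruction). The function names, the single-layer ReLU SAE, and the parameters `k`, `gamma`, `lam`, and `max_rel_err` are all hypothetical.

```python
def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sae_encode(W_enc, b_enc, x):
    # Assumed SAE encoder: one linear layer followed by ReLU.
    return [max(0.0, h + b) for h, b in zip(matvec(W_enc, x), b_enc)]


def sae_decode(W_dec, b_dec, z):
    # Assumed SAE decoder: one linear layer.
    return [h + b for h, b in zip(matvec(W_dec, z), b_dec)]


def steer(x, W_enc, b_enc, W_dec, b_dec,
          k=1, gamma=2.0, lam=1.0, max_rel_err=0.5):
    """Return (embedding, steered_flag) for one encoder activation x.

    All hyperparameters are illustrative assumptions, not values
    taken from the paper.
    """
    z = sae_encode(W_enc, b_enc, x)
    x_hat = sae_decode(W_dec, b_dec, z)

    # Reliability diagnostic: relative squared reconstruction error.
    num = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    den = sum(a * a for a in x) + 1e-12
    if num / den > max_rel_err:
        # Poor reconstruction: skip steering, fall back to the baseline.
        return list(x), False

    # Assumed steering rule: amplify the k most active sparse features.
    top = set(sorted(range(len(z)), key=lambda i: z[i], reverse=True)[:k])
    z_amp = [gamma * v if i in top else v for i, v in enumerate(z)]

    # Steering vector = decoded amplified code minus plain reconstruction.
    x_amp = sae_decode(W_dec, b_dec, z_amp)
    steering = [a - b for a, b in zip(x_amp, x_hat)]
    return [xi + lam * s for xi, s in zip(x, steering)], True


# Toy usage with an identity SAE on a 2-dimensional activation.
I2 = [[1.0, 0.0], [0.0, 1.0]]
steered, applied = steer([3.0, 1.0], I2, [0.0, 0.0], I2, [0.0, 0.0])
```

In the toy usage the reconstruction is exact, so steering is applied and only the most active feature is amplified. With a decoder that reconstructs poorly, the same call returns the input unchanged, which is the safe-fallback behaviour the abstract describes. No gradients or optimizer steps are involved, matching the forward-only claim.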

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision

    cs.CV · 2026-04 · unverdicted · novelty 7.0

    Cross-Layer Transcoders decompose ViT activations into sparse, depth-aware layer contributions that maintain zero-shot accuracy and enable faithful attribution of the final representation.