Proposes using sparse autoencoders to extract class-conditioned concept vectors, then measuring logit stability under targeted perturbations as an interpretable OOD signal for deep networks in medical imaging.
arXiv preprint arXiv:2411.10794 (2024)
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
When Confidence Lacks Concepts: Interpretable OOD Detection via Representation Perturbations
Proposes using sparse autoencoders to extract class-conditioned concept vectors, then measuring logit stability under targeted perturbations as an interpretable OOD signal for deep networks in medical imaging.