Visual encoders leak identity information; a one-shot linear subspace removal method (ISP) reduces leakage to near-chance levels while retaining high non-biometric utility across datasets.
Gold doesn't always glitter: Spectral removal of linear and nonlinear guarded attribute information
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
representative citing papers
Refusal in language models is mediated by a single direction in residual stream activations that can be erased to disable safety or added to elicit refusal.
citing papers explorer
-
From Measurement to Mitigation: Quantifying and Reducing Identity Leakage in Image Representation Encoders with Linear Subspace Removal
Visual encoders leak identity information; a one-shot linear subspace removal method (ISP) reduces leakage to near-chance levels while retaining high non-biometric utility across datasets.
-
Refusal in Language Models Is Mediated by a Single Direction
Refusal in language models is mediated by a single direction in residual stream activations that can be erased to disable safety or added to elicit refusal.