Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.
Advances in neural information processing systems , volume=
6 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Scaling pretrained representations improves label-free OOD detection on frozen backbones, causing performance gaps between global and local detectors to vanish across vision and language tasks.
Perturb-and-Correct generates epistemically diverse predictors from a single pretrained network via hidden-layer perturbations followed by affine least-squares corrections that enforce agreement on calibration data.
Avoiding CenterLoss improves OOD detection via multi-scale Mahalanobis on L2-normalized features, yielding 0.9483 AUROC on CIFAR-10 while preserving competitive in-distribution accuracy.
HEDP uses energy regularization inspired by Helmholtz free energy plus hybrid energy-distance weighting in prompts to improve domain selection and achieve a 2.57% accuracy gain on benchmarks like CORe50 while mitigating catastrophic forgetting.
Energy-based fine-tuning outperforms other OOD detection methods on the real-world Plant Pathology 2021 dataset, improving detection over softmax while maintaining in-distribution accuracy.
citing papers explorer
-
Eliciting Latent Predictions from Transformers with the Tuned Lens
Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.
-
Scaling Pretrained Representations Enables Label-Free Out-of-Distribution Detection Without Fine-Tuning
Scaling pretrained representations improves label-free OOD detection on frozen backbones, causing performance gaps between global and local detectors to vanish across vision and language tasks.
-
Perturb and Correct: Post-Hoc Ensembles using Affine Redundancy
Perturb-and-Correct generates epistemically diverse predictors from a single pretrained network via hidden-layer perturbations followed by affine least-squares corrections that enforce agreement on calibration data.
-
Don't Collapse Your Features: Why CenterLoss Hurts OOD Detection and Multi-Scale Mahalanobis Wins
Avoiding CenterLoss improves OOD detection via multi-scale Mahalanobis on L2-normalized features, yielding 0.9483 AUROC on CIFAR-10 while preserving competitive in-distribution accuracy.
-
HEDP: A Hybrid Energy-Distance Prompt-based Framework for Domain Incremental Learning
HEDP uses energy regularization inspired by Helmholtz free energy plus hybrid energy-distance weighting in prompts to improve domain selection and achieve a 2.57% accuracy gain on benchmarks like CORe50 while mitigating catastrophic forgetting.
-
Beyond Toy Benchmarks: A Systematic Evaluation of OOD Detection Methods For Plant Pathology Classification
Energy-based fine-tuning outperforms other OOD detection methods on the real-world Plant Pathology 2021 dataset, improving detection over softmax while maintaining in-distribution accuracy.