Sparse autoencoders inserted into VLMs and trained only for reconstruction can reliably detect adversarial attacks on images, including unseen domains and attack types.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 3years
2026 3verdicts
UNVERDICTED 3roles
method 1polarities
use method 1representative citing papers
MiVE repurposes VLMs as multiscale feature extractors integrated into a unified self-attention Diffusion Transformer, achieving top human preference in reference-guided video editing.
PG-OT builds prompt-specific Pareto frontiers and applies distribution-aware optimal transport to improve multi-reward alignment while introducing JDR and JCR metrics to measure synergy and hacking.
citing papers explorer
-
Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs
Sparse autoencoders inserted into VLMs and trained only for reconstruction can reliably detect adversarial attacks on images, including unseen domains and attack types.
-
MiVE: Multiscale Vision-language features for reference-guided video Editing
MiVE repurposes VLMs as multiscale feature extractors integrated into a unified self-attention Diffusion Transformer, achieving top human preference in reference-guided video editing.
-
Pareto-Guided Optimal Transport for Multi-Reward Alignment
PG-OT builds prompt-specific Pareto frontiers and applies distribution-aware optimal transport to improve multi-reward alignment while introducing JDR and JCR metrics to measure synergy and hacking.