Sparse autoencoders inserted into VLMs and trained only for reconstruction can reliably detect adversarial attacks on images, including unseen domains and attack types.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Q-Align trains LMMs on discrete text-defined levels for visual scoring, achieving SOTA on IQA, IAA, and VQA while unifying the tasks in OneAlign.
With enough compute, large models benefit from training on unfiltered data that includes low-quality and distractor examples instead of requiring high-quality filtered data.
ALLaVA creates 1.3M GPT4V-synthesized samples enabling 4B VLMs to achieve competitive results on 17 benchmarks and match 7B/13B models on some tasks.
citing papers explorer
-
Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs
Sparse autoencoders inserted into VLMs and trained only for reconstruction can reliably detect adversarial attacks on images, including unseen domains and attack types.
-
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels
Q-Align trains LMMs on discrete text-defined levels for visual scoring, achieving SOTA on IQA, IAA, and VQA while unifying the tasks in OneAlign.
-
A Bitter Lesson for Data Filtering
With enough compute, large models benefit from training on unfiltered data that includes low-quality and distractor examples instead of requiring high-quality filtered data.
-
ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models
ALLaVA creates 1.3M GPT4V-synthesized samples enabling 4B VLMs to achieve competitive results on 17 benchmarks and match 7B/13B models on some tasks.