pith. sign in

Safety fine-tuning at (almost) no cost: A baseline for vision large language models

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

roles

background 2

polarities

background 2

clear filters

representative citing papers

When Vision Speaks for Sound

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Video MLLMs show an audio-visual Clever Hans effect relying on visual-acoustic correlations rather than audio verification; Thud interventions diagnose it and a 10K-sample preference alignment improves intervention performance by 28 points.

New Wide-Net-Casting Jailbreak Attacks Risk Large Models

cs.CR · 2026-05-16 · unverdicted · novelty 5.0

The paper demonstrates that a tailored jailbreak method for querying groups of large models can achieve up to 100% success rate in some experiments on unprotected models, revealing overlooked multi-model safety risks.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • Attention Hijacking: Response Manipulation Across Queries in Vision-Language Models cs.CV · 2026-05-17 · unverdicted · none · ref 47

    Attention Hijacking is a new attack that improves cross-query transferability in VLMs by explicitly steering internal attention to a persistent image-dominant pattern.

  • When Vision Speaks for Sound cs.CV · 2026-05-13 · unverdicted · none · ref 78

    Video MLLMs show an audio-visual Clever Hans effect relying on visual-acoustic correlations rather than audio verification; Thud interventions diagnose it and a 10K-sample preference alignment improves intervention performance by 28 points.