FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts

Gong, Y · 2025 · DOI 10.1609/aaai.v39i22.34568

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

cs.CR · 2026-05-14 · unverdicted · novelty 7.0

A new 507-leaf taxonomy and 4x6 Target x Technique matrix audits six LLM attack benchmarks and finds they cover at most 25% of the threat surface with entire STRIDE categories untested.

Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

UJEM-KL improves cross-model transferability of untargeted jailbreaks on VLMs by maximizing entropy at decision tokens rather than enforcing fixed response patterns.

Hard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignment

cs.CV · 2026-05-08 · conditional · novelty 6.0

Degraded image resolution in MLLMs bypasses safety alignments via cognitive overload, raising jailbreak rates across perturbations.

Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

cs.CR · 2026-03-23 · unverdicted · novelty 6.0

Comic-based visual narratives achieve over 90% ensemble success rates on multiple MLLMs, outperforming text and random-image baselines while breaking existing safety methods and evaluators.

Adaptive Probe-based Steering for Robust LLM Jailbreaking

cs.CR · 2026-05-19 · unverdicted · novelty 5.0

Adaptive probe-based steering guided by model extraction and activation statistics improves LLM jailbreak success rates from 6% to 70% average harmfulness without extra contrastive prompts or manual tuning.

citing papers explorer

Showing 4 of 4 citing papers after filters.

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

cs.CR · 2026-05-14 · unverdicted · none · ref 8

A new 507-leaf taxonomy and 4x6 Target x Technique matrix audits six LLM attack benchmarks and finds they cover at most 25% of the threat surface with entire STRIDE categories untested.

Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization

cs.CV · 2026-05-11 · unverdicted · none · ref 6 · 2 links

UJEM-KL improves cross-model transferability of untargeted jailbreaks on VLMs by maximizing entropy at decision tokens rather than enforcing fixed response patterns.

Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

cs.CR · 2026-03-23 · unverdicted · none · ref 14

Comic-based visual narratives achieve over 90% ensemble success rates on multiple MLLMs, outperforming text and random-image baselines while breaking existing safety methods and evaluators.

Adaptive Probe-based Steering for Robust LLM Jailbreaking

cs.CR · 2026-05-19 · unverdicted · none · ref 36

Adaptive probe-based steering guided by model extraction and activation statistics improves LLM jailbreak success rates from 6% to 70% average harmfulness without extra contrastive prompts or manual tuning.
