arXiv preprint arXiv:2503.15092 (2025)

Ying, Z · 2025 · arXiv 2503.15092

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents

cs.CR · 2025-10-11 · unverdicted · novelty 7.0

SecureWebArena is a new benchmark suite for holistic security evaluation of LVLM-based web agents using diverse simulated environments, attack taxonomies, and multi-layered failure analysis across reasoning, behavior, and outcomes.

Internalizing Safety Understanding in Large Reasoning Models via Verification

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

Training large reasoning models only on safety verification tasks internalizes safety understanding and boosts robustness to out-of-domain jailbreaks, providing a stronger base for reinforcement learning alignment than standard supervised fine-tuning.

Reasoning-targeted Jailbreak Attacks on Large Reasoning Models via Semantic Triggers and Psychological Framing

cs.LG · 2026-04-17 · unverdicted · novelty 6.0

PRJA achieves 83.6% average success injecting harmful content into LRM reasoning chains on five QA datasets without altering final answers.

PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking

cs.CR · 2025-07-29 · unverdicted · novelty 6.0

PRISM decomposes harmful instructions into benign visual gadgets and directs LVLMs via prompts to compose them through reasoning into harmful outputs, achieving ASR over 0.90 on SafeBench.

citing papers explorer

Showing 4 of 4 citing papers.

SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents

cs.CR · 2025-10-11 · unverdicted · none · ref 60

Internalizing Safety Understanding in Large Reasoning Models via Verification

cs.AI · 2026-05-09 · unverdicted · none · ref 23

Reasoning-targeted Jailbreak Attacks on Large Reasoning Models via Semantic Triggers and Psychological Framing

cs.LG · 2026-04-17 · unverdicted · none · ref 48

PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking

cs.CR · 2025-07-29 · unverdicted · none · ref 44
