OrchJail uses orchestration-guided fuzzing to jailbreak tool-calling T2I agents by targeting high-risk tool patterns, yielding higher attack success rates, better image quality, and lower costs than prior prompt-only methods.
Visual Instruction Tuning , volume =
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
Image editing models fail zero-shot visual planning on abstract mazes and queen puzzles but generalize after finetuning, yet still cannot match human zero-shot efficiency.
F2G improves video temporal grounding accuracy by decoupling event identification from boundary measurement using predictive temporal perception to create citable evidence segments for LLM reasoning.
citing papers explorer
-
OrchJail: Jailbreaking Tool-Calling Text-to-Image Agents by Orchestration-Guided Fuzzing
OrchJail uses orchestration-guided fuzzing to jailbreak tool-calling T2I agents by targeting high-risk tool patterns, yielding higher attack success rates, better image quality, and lower costs than prior prompt-only methods.
-
Probing Visual Planning in Image Editing Models
Image editing models fail zero-shot visual planning on abstract mazes and queen puzzles but generalize after finetuning, yet still cannot match human zero-shot efficiency.
-
Foresee-to-Ground: From Predictive Temporal Perception to Evidence-Driven Reasoning for Video Temporal Grounding
F2G improves video temporal grounding accuracy by decoupling event identification from boundary measurement using predictive temporal perception to create citable evidence segments for LLM reasoning.