arXiv preprint arXiv:2405.09719 (2024)

Yifu Qiu, Zheng Zhao, Yftah Ziser, Anna Korhonen, Edoardo M · 2024 · arXiv 2405.09719

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

representative citing papers

Decodable Is Not Grounded: A Vision-Ablation Arbiter for VLM Spatial Reasoning

cs.CV · 2026-06-30 · unverdicted · novelty 8.0

A blank-image ablation test reveals that high probe accuracy on VLM spatial reasoning frequently reflects priors or inverted signs rather than image grounding, with horizontal grounded, vertical prior, and depth inverted.

Uncovering Logit Suppression Vulnerabilities in LLM Safety Alignment

cs.CR · 2024-05-20 · unverdicted · novelty 6.0

SSAG bypasses logit suppression in five LLMs to produce harmful responses at 95% success rate and 86% lower latency; VulMine reaches 77% attack success against defenses.

A Systematic Study of Training-Free Methods for Trustworthy Large Language Models

cs.CL · 2026-04-17 · unverdicted · novelty 5.0

Training-free methods for LLM trustworthiness show inconsistent results across dimensions, with clear trade-offs in utility, robustness, and overhead depending on where they intervene during inference.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Decodable Is Not Grounded: A Vision-Ablation Arbiter for VLM Spatial Reasoning cs.CV · 2026-06-30 · unverdicted · none · ref 34
A blank-image ablation test reveals that high probe accuracy on VLM spatial reasoning frequently reflects priors or inverted signs rather than image grounding, with horizontal grounded, vertical prior, and depth inverted.
Uncovering Logit Suppression Vulnerabilities in LLM Safety Alignment cs.CR · 2024-05-20 · unverdicted · none · ref 18
SSAG bypasses logit suppression in five LLMs to produce harmful responses at 95% success rate and 86% lower latency; VulMine reaches 77% attack success against defenses.
A Systematic Study of Training-Free Methods for Trustworthy Large Language Models cs.CL · 2026-04-17 · unverdicted · none · ref 38
Training-free methods for LLM trustworthiness show inconsistent results across dimensions, with clear trade-offs in utility, robustness, and overhead depending on where they intervene during inference.

arXiv preprint arXiv:2405.09719 (2024)

fields

years

verdicts

representative citing papers

citing papers explorer