This theoretical result directly supports our semantic representation attack framework

= Φ), their shared prefixes $\hat{y}^*_1$, $\hat{y}^*_2$ receive comparable probability mass under coherent adversarial prompts, with the exact relationship governed by their semantic distance · 2021

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

LLM-Agnostic Semantic Representation Attack

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

SRA achieves 99.71% average attack success across 26 LLMs by optimizing for coherent malicious semantics via the SRHS algorithm, with claimed theoretical guarantees on convergence and transfer.

citing papers explorer

Showing 1 of 1 citing paper.

LLM-Agnostic Semantic Representation Attack cs.CL · 2026-05-09 · unverdicted · none · ref 89

