SRA achieves 99.71% average attack success across 26 LLMs by optimizing for coherent malicious semantics via the SRHS algorithm, with claimed theoretical guarantees on convergence and transfer.
Ignore previous prompt: Attack techniques for language models
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 2
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 2polarities
background 2representative citing papers
Reinforcement learning is advanced for communication-efficient federated optimization and for preference-aligned, contextually safe policies in large language models.
citing papers explorer
-
Reinforcement Learning for Scalable and Trustworthy Intelligent Systems
Reinforcement learning is advanced for communication-efficient federated optimization and for preference-aligned, contextually safe policies in large language models.