Towards deep learning models resistant to adversarial attacks
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
verdicts
UNVERDICTED 2
roles
background 1
polarities
background 1
citing papers explorer
- LLM-Agnostic Semantic Representation Attack
  SRA achieves 99.71% average attack success across 26 LLMs by optimizing for coherent malicious semantics via the SRHS algorithm, with claimed theoretical guarantees on convergence and transfer.
- Learning Aligned Stability in Neural ODEs Reconciling Accuracy with Robustness
  Zubov-Net aligns prescribed regions of attraction defined by learnable Lyapunov functions with true regions in Neural ODEs via a differentiable Zubov consistency loss, claiming to reconcile accuracy and certified robustness.