(certified!!) adversarial robustness for free! arXiv preprint arXiv:2206.10550, 2022

Nicholas Carlini, Florian Tramer, Krishnamurthy Dj Dvijotham, Leslie Rice, Mingjie Sun, J Zico Kolter · 2022 · arXiv 2206.10550

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

read on arXiv browse 1 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

cs.LG · 2023-10-05 · accept · novelty 6.0

SmoothLLM mitigates jailbreaking attacks on LLMs by randomly perturbing multiple copies of a prompt at the character level and aggregating the outputs to detect adversarial inputs.

citing papers explorer

Showing 1 of 1 citing paper.

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks cs.LG · 2023-10-05 · accept · none · ref 51
SmoothLLM mitigates jailbreaking attacks on LLMs by randomly perturbing multiple copies of a prompt at the character level and aggregating the outputs to detect adversarial inputs.

(certified!!) adversarial robustness for free! arXiv preprint arXiv:2206.10550, 2022

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer