Towards deep learning models resistant to adversarial attacks

Aleksander Mądry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu · 2017

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction

cs.CR · 2025-04-29 · unverdicted · novelty 6.0

The method prompts LLMs to output both answers and references to the executed instructions, then filters out any answers not linked to the original input instructions, reducing attack success rates to zero in tested scenarios while preserving utility.

citing papers explorer

Showing 1 of 1 citing paper.

Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction

cs.CR · 2025-04-29 · unverdicted · none · ref 28
