A new defense framework called the disillusion paradigm uses an imitation game and chain-of-thought reasoning in generative agents to neutralize deductive and inductive adversarial illusions in white-box and black-box scenarios.
Neural trojans
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED 2
Representative citing papers
Proposes a self-aware unlearning method inspired by hypnopaedia that uses model inversion and hypothesis testing to detect and detach backdoor triggers from machine learning models.