Plan2Cleanse frames RL backdoor detection as a Monte Carlo planning problem to achieve over 61 percentage point gains in trigger detection and improved win rates in competitive environments.
Letrsum denote the negated cumulative reward of the replayed candidate sequence, and letrref be a reference distribution obtained from 500 random action sequences
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Plan2Cleanse: Test-Time Backdoor Defense via Monte-Carlo Planning in Deep Reinforcement Learning
Plan2Cleanse frames RL backdoor detection as a Monte Carlo planning problem to achieve over 61 percentage point gains in trigger detection and improved win rates in competitive environments.