International Symposium on Research in Attacks, Intrusions, and Defenses

Fine-pruning: Defending against backdooring attacks on deep neural networks · 2018

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

cs.AI · 2024-06-14 · conditional · novelty 7.0

LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.

cs.CR · 2026-05-21

Showing 2 of 2 citing papers.

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models · cs.AI · 2024-06-14 · conditional · none · ref 269
LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.
TimeGuard: Channel-wise Pool Training for Backdoor Defense in Time Series Forecasting · cs.CR · 2026-05-21 · unreviewed · ref 79