DeTrigger: A Gradient-Centric Approach to Backdoor Attack Mitigation in Federated Learning

JeongGil Ko; Jonghyuk Yun; Jun Han; Kichang Lee; Songkuk Kim; Yujin Shin

arXiv: 2411.12220 · v3 · submitted 2024-11-19 · 💻 cs.LG · cs.AI · cs.CR

Kichang Lee, Yujin Shin, Jonghyuk Yun, Songkuk Kim, Jun Han, JeongGil Ko

classification 💻 cs.LG · cs.AI · cs.CR

keywords: backdoor, model, detrigger, federated, learning, attacks, across, attack

Federated Learning (FL) enables collaborative model training across distributed devices while preserving local data privacy, making it ideal for mobile and embedded systems. However, the decentralized nature of FL also opens vulnerabilities to model poisoning attacks, particularly backdoor attacks, where adversaries implant trigger patterns to manipulate model predictions. In this paper, we propose DeTrigger, a scalable and efficient backdoor-robust federated learning framework that leverages insights from adversarial attack methodologies. By employing gradient analysis with temperature scaling, DeTrigger detects and isolates backdoor triggers, allowing for precise model weight pruning of backdoor activations without sacrificing benign model knowledge. Extensive evaluations across four widely used datasets demonstrate that DeTrigger achieves up to 251x faster detection than traditional methods and mitigates backdoor attacks by up to 98.9%, with minimal impact on global model accuracy. Our findings establish DeTrigger as a robust and scalable solution to protect federated learning environments against sophisticated backdoor threats.
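The abstract names gradient analysis with temperature scaling but does not give the procedure. As a rough illustration only, and not DeTrigger's actual method, the sketch below computes the input gradient of a cross-entropy loss under a temperature-scaled softmax for a toy linear classifier. The function names `softmax` and `input_gradient`, the linear model, and the choice of loss are all assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (higher T gives a flatter distribution)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def input_gradient(weights, x, target, temperature=1.0):
    """Gradient of cross-entropy(softmax(Wx / T), target) with respect to the input x.

    Hypothetical toy model: logits = W x. For this linear case the gradient
    has the closed form (1 / T) * W^T (p - onehot(target)).
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(logits, temperature)
    # Residual between predicted probabilities and the one-hot target.
    residual = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    return [
        sum(weights[k][j] * residual[k] for k in range(len(weights))) / temperature
        for j in range(len(x))
    ]
```

In this toy setting, input coordinates with large gradient magnitude toward a target class are the ones that most influence the prediction for that class. How DeTrigger turns such gradients into a trigger estimate and a pruning decision is described in the paper itself, not here.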

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Your Neighbors Know: Leveraging Local Neighborhoods for Backdoor Detection in Decentralized Learning
cs.LG · 2026-05 · unverdicted · novelty 7.0

Argus detects backdoors in decentralized learning using local trigger analysis and neighbor-similarity consistency checks, with theoretical convergence guarantees and empirical reductions in attack success of up to 90 points.