Persistent backdoor attacks under continual fine-tuning of LLMs

Jing Cui, Yufei Han, Jianbin Jiao, Junge Zhang · 2025 · arXiv 2512.14741

2 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

BadSKP: Backdoor Attacks on Knowledge Graph-Enhanced LLMs with Soft Prompts

cs.AI · 2026-05-12 · conditional · novelty 7.0

BadSKP poisons graph node embeddings to steer soft prompts in KG-enhanced LLMs, achieving high attack success rates where text-channel backdoors fail due to semantic anchoring.

Shared Latent Structures Enable Unified Backdoor Detection and Mitigation in LLMs

cs.AI · 2026-06-06 · unverdicted · novelty 6.0

Sparse autoencoders identify shared latent features across diverse backdoor attacks in LLMs that enable unified detection via classifiers, causal control via steering, and mitigation via ablation fine-tuning.

citing papers explorer


BadSKP: Backdoor Attacks on Knowledge Graph-Enhanced LLMs with Soft Prompts

cs.AI · 2026-05-12 · conditional · none · ref 37
