Prateek Yadav, Derek Tam, Leshem Choshen, Colin A Raffel, and Mohit Bansal
2 Pith papers cite this work, both published in 2026.

Representative citing papers:
TopoAlign applies mapper graphs with a joint force-directed layout, Bubble Sets, and motif queries to align and visualize representation structures across models.
Backdoor Attacks on Decentralised Post-Training
An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.