BadSkill poisons embedded models in agent skills to achieve up to 99.5% attack success rate on triggered tasks with only 3% poison rate while preserving normal behavior on non-trigger inputs.
A study of backdoors in instruction fine-tuned language models
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CR 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Succinct Model Difference Proofs certify that a neural-network update stays inside a policy-defined drift class using zero-knowledge proofs whose cost depends only on the drift structure.
citing papers explorer
-
BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning
BadSkill poisons embedded models in agent skills to achieve up to 99.5% attack success rate on triggered tasks with only 3% poison rate while preserving normal behavior on non-trigger inputs.
-
Fine-Tuning Integrity for Modern Neural Networks: Structured Drift Proofs via Norm, Rank, and Sparsity Certificates
Succinct Model Difference Proofs certify that a neural-network update stays inside a policy-defined drift class using zero-knowledge proofs whose cost depends only on the drift structure.