BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

Tianyu Gu, Brendan Dolan-Gavitt, Siddharth Garg · 2017 · cs.CR · arXiv 1708.06733

Canonical reference. 91% of citing Pith papers cite this work as background.

65 Pith papers citing it

Background 91% of classified citations

abstract

Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a BadNet) that has state-of-the-art performance on the user's training and validation samples, but behaves badly on specific attacker-chosen inputs. We first explore the properties of BadNets in a toy example, by creating a backdoored handwritten digit classifier. Next, we demonstrate backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign; we then show in addition that the backdoor in our U.S. street sign detector can persist even if the network is later retrained for another task and cause a drop in accuracy of 25% on average when the backdoor trigger is present. These results demonstrate that backdoors in neural networks are both powerful and, because the behavior of neural networks is difficult to explicate, stealthy. This work provides motivation for further research into techniques for verifying and inspecting neural networks, just as we have developed tools for verifying and debugging software.
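To make the idea of a trigger-activated backdoor concrete, the following is a minimal sketch of one common way such a backdoor could be planted: stamping a small pattern on a fraction of the training images and relabeling them to an attacker-chosen class. The abstract does not specify the trigger or the training recipe, so the function names (`stamp_trigger`, `poison_dataset`), the 3x3 corner patch, the 10% poisoning rate, and the target label 7 are all illustrative assumptions, and the blank images stand in for a real digit dataset.

```python
import numpy as np

def stamp_trigger(image, patch_size=3, value=1.0):
    """Return a copy of `image` with a bright square in the bottom-right corner.

    The patch shape and position are assumptions made for illustration.
    """
    out = image.copy()
    out[-patch_size:, -patch_size:] = value
    return out

def poison_dataset(images, labels, target_label, rate=0.1, seed=0):
    """Stamp the trigger on a random fraction of samples and relabel them."""
    rng = np.random.default_rng(seed)
    n = len(images)
    n_poison = int(round(rate * n))
    idx = rng.choice(n, size=n_poison, replace=False)
    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    for i in idx:
        poisoned_images[i] = stamp_trigger(images[i])
        poisoned_labels[i] = target_label  # attacker-chosen class
    return poisoned_images, poisoned_labels, idx

# Toy stand-in for a digit dataset: 100 blank 28x28 images with labels 0..9.
images = np.zeros((100, 28, 28), dtype=np.float32)
labels = np.arange(100) % 10
p_images, p_labels, idx = poison_dataset(images, labels, target_label=7, rate=0.1)
```

A model trained on `p_images` and `p_labels` would be expected to behave normally on clean inputs while associating the patch with the target class; whether that happens in practice depends on the model, the data, and the poisoning rate, none of which this sketch evaluates.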

citation-role summary

background 10 · baseline 1
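The role counts above appear consistent with the 91% background share quoted for classified citations. The short check below shows the arithmetic, assuming the percentage is computed over these 11 classified citations only (the page does not state the denominator explicitly); `role_counts` is a hypothetical name introduced for illustration.

```python
# Role counts as listed in the citation-role summary.
role_counts = {"background": 10, "baseline": 1}

total = sum(role_counts.values())
# Share of each role among classified citations, rounded to a whole percent.
shares = {role: round(100 * n / total) for role, n in role_counts.items()}
print(shares)  # {'background': 91, 'baseline': 9}
```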

citation-polarity summary

background 10 · baseline 1

representative citing papers

When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks

cs.LG · 2026-05-21 · unverdicted · novelty 8.0

In the proportional high-dimensional regime, stronger backdoor training triggers improve clean accuracy and make attack success non-monotonic for regularized GLMs on Gaussian mixtures, with closed-form proofs for squared loss and fixed-point extensions to convex losses.

Exposing Functional Fusion: A New Class of Strategic Backdoor in Dynamic Prompt Architectures

cs.CR · 2026-05-19 · unverdicted · novelty 8.0

VIPER exposes Functional Fusion in dynamic prompt architectures, enabling a backdoor that resists pruning by tightly integrating attack and utility parameters in the same high-magnitude core.

Cross-Modal Backdoors in Multimodal Large Language Models

cs.CR · 2026-05-08 · unverdicted · novelty 8.0

Poisoning a single connector in MLLMs establishes a reusable latent backdoor pathway that transfers across modalities with over 95% attack success rate under bounded perturbations.

MirageBackdoor: A Stealthy Attack that Induces Think-Well-Answer-Wrong Reasoning

cs.CR · 2026-04-08 · unverdicted · novelty 8.0

MirageBackdoor is the first backdoor attack that preserves clean chain-of-thought reasoning in LLMs while steering the final answer to a specific incorrect target under a trigger.

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

cs.CR · 2026-04-03 · unverdicted · novelty 8.0

DDIPE poisons LLM agent skills by embedding malicious logic in documentation examples, achieving 11.6-33.5% bypass rates across frameworks while explicit attacks are blocked, with 2.5% evading detection.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

BadImplant: Injection-based Multi-Targeted Graph Backdoor Attack

cs.LG · 2026-01-21 · conditional · novelty 8.0

BadImplant is the first multi-targeted backdoor attack on GNN graph classification that uses subgraph injection to achieve high success rates on multiple target labels with minimal clean accuracy loss.

The Invitation Trap: Proactive Availability Backdoor in LLMs via Conversational Induction

cs.CR · 2026-05-30 · unverdicted · novelty 7.0

The paper presents Proactive Availability Backdoor (PAB) attacks on LLMs that achieve 73.1% effective success rate by proactively inducing users via suggestions in a Five-Factor Model simulation.

Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models

cs.CR · 2026-05-19 · conditional · novelty 7.0

ToBAC is the first backdoor attack on unified autoregressive models, using data or model poisoning to make triggers elicit cross-modal malicious behavior in text and image generation.

Fast and Lightweight Backdoor Detection via Head Random Probing

cs.CR · 2026-05-17 · unverdicted · novelty 7.0

HTell detects backdoors by random probing of the model head, reporting 99.03% true positive rate and 2.11% false positive rate at 12.69 ms per model on a benchmark of over 6700 models.

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

cs.CR · 2026-05-14 · unverdicted · novelty 7.0

MetaBackdoor shows that LLMs can be backdoored using positional triggers like sequence length, enabling stealthy activation on clean inputs to leak system prompts or trigger malicious behavior.

VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense

cs.CR · 2026-05-13 · unverdicted · novelty 7.0

Steganographic exfiltration attacks succeed on embedding stores via retrieval-preserving perturbations such as small-angle orthogonal rotation, but an Ed25519-based provenance signature closes the attack class.

BadDLM: Backdooring Diffusion Language Models with Diverse Targets

cs.CR · 2026-05-10 · unverdicted · novelty 7.0

BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.

Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions

cs.CR · 2026-05-05 · unverdicted · novelty 7.0

Sparse Backdoor plants a provably undetectable backdoor in neural network weights via structured sparse perturbations and isotropic Gaussian dithering, with detection hardness reduced to Sparse PCA.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training

cs.LG · 2026-04-23 · unverdicted · novelty 7.0

Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.

Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling

cs.CR · 2026-04-14 · unverdicted · novelty 7.0

SET detects input-level backdoors in T2I diffusion models by learning a benign cross-attention response space from clean samples and flagging deviations under multi-scale perturbations.

Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward

cs.CR · 2026-04-10 · accept · novelty 7.0

RLVR can be backdoored with under 2% poisoned data using an asymmetric reward trigger, implanting jailbreaks that cut safety performance by 73% on average without harming benign tasks.

CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion

cs.CR · 2026-04-10 · unverdicted · novelty 7.0

CLIP-Inspector reconstructs OOD triggers to detect backdoors in prompt-tuned CLIP models with 94% accuracy and higher AUROC than baselines, plus a repair step via fine-tuning.

Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction

cs.CR · 2026-04-09 · conditional · novelty 7.0

Backdoor attacks on VLM-based scanpath predictors can redirect fixations toward chosen objects or inflate durations using input-conditioned triggers that evade cluster detection, and no tested defense blocks them without hurting clean accuracy.

Inevitable Encounters: Backdoor Attacks Involving Lossy Compression

cs.CR · 2026-03-14 · unverdicted · novelty 7.0

ROI coding enables backdoor triggers to survive lossy compression by embedding malicious information into binary bitstreams via sample-specific or customized masks for both learned and traditional codecs.

BadSNN: Backdoor Attacks on Spiking Neural Networks via Adversarial Spiking Neuron

cs.CR · 2026-02-06 · unverdicted · novelty 7.0

BadSNN injects backdoors into spiking neural networks by adversarially tuning LIF neuron hyperparameters and optimizing triggers, achieving higher attack success than prior data-poisoning methods while remaining robust to common defenses.

Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models

cs.CV · 2025-12-26 · conditional · novelty 7.0

BadVSFM is the first effective backdoor attack on prompt-driven video segmentation foundation models, using a two-stage encoder-decoder strategy to achieve high attack success rates with limited clean performance loss.

Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP

cs.LG · 2024-12-01 · conditional · novelty 7.0

PAR fine-tunes CLIP to remove backdoors from structured triggers while preserving standard performance, and works even with only synthetic image-text pairs.

citing papers explorer

Showing 50 of 65 citing papers.

When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks · cs.LG · 2026-05-21 · unverdicted · none · ref 16 · internal anchor
In the proportional high-dimensional regime, stronger backdoor training triggers improve clean accuracy and make attack success non-monotonic for regularized GLMs on Gaussian mixtures, with closed-form proofs for squared loss and fixed-point extensions to convex losses.
Exposing Functional Fusion: A New Class of Strategic Backdoor in Dynamic Prompt Architectures · cs.CR · 2026-05-19 · unverdicted · none · ref 8 · internal anchor
VIPER exposes Functional Fusion in dynamic prompt architectures, enabling a backdoor that resists pruning by tightly integrating attack and utility parameters in the same high-magnitude core.
Cross-Modal Backdoors in Multimodal Large Language Models · cs.CR · 2026-05-08 · unverdicted · none · ref 33 · internal anchor
Poisoning a single connector in MLLMs establishes a reusable latent backdoor pathway that transfers across modalities with over 95% attack success rate under bounded perturbations.
MirageBackdoor: A Stealthy Attack that Induces Think-Well-Answer-Wrong Reasoning · cs.CR · 2026-04-08 · unverdicted · none · ref 1 · internal anchor
MirageBackdoor is the first backdoor attack that preserves clean chain-of-thought reasoning in LLMs while steering the final answer to a specific incorrect target under a trigger.
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems · cs.CR · 2026-04-03 · unverdicted · none · ref 18 · internal anchor
DDIPE poisons LLM agent skills by embedding malicious logic in documentation examples, achieving 11.6-33.5% bypass rates across frameworks while explicit attacks are blocked, with 2.5% evading detection.
Backdoor Attacks on Decentralised Post-Training · cs.CR · 2026-03-31 · conditional · none · ref 8 · internal anchor
An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.
BadImplant: Injection-based Multi-Targeted Graph Backdoor Attack · cs.LG · 2026-01-21 · conditional · none · ref 33 · internal anchor
BadImplant is the first multi-targeted backdoor attack on GNN graph classification that uses subgraph injection to achieve high success rates on multiple target labels with minimal clean accuracy loss.
The Invitation Trap: Proactive Availability Backdoor in LLMs via Conversational Induction · cs.CR · 2026-05-30 · unverdicted · none · ref 1 · internal anchor
The paper presents Proactive Availability Backdoor (PAB) attacks on LLMs that achieve 73.1% effective success rate by proactively inducing users via suggestions in a Five-Factor Model simulation.
Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models · cs.CR · 2026-05-19 · conditional · none · ref 23 · internal anchor
ToBAC is the first backdoor attack on unified autoregressive models, using data or model poisoning to make triggers elicit cross-modal malicious behavior in text and image generation.
Fast and Lightweight Backdoor Detection via Head Random Probing · cs.CR · 2026-05-17 · unverdicted · none · ref 20 · internal anchor
HTell detects backdoors by random probing of the model head, reporting 99.03% true positive rate and 2.11% false positive rate at 12.69 ms per model on a benchmark of over 6700 models.
MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs · cs.CR · 2026-05-14 · unverdicted · none · ref 1 · internal anchor
MetaBackdoor shows that LLMs can be backdoored using positional triggers like sequence length, enabling stealthy activation on clean inputs to leak system prompts or trigger malicious behavior.
VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense · cs.CR · 2026-05-13 · unverdicted · none · ref 8 · internal anchor
Steganographic exfiltration attacks succeed on embedding stores via retrieval-preserving perturbations such as small-angle orthogonal rotation, but an Ed25519-based provenance signature closes the attack class.
BadDLM: Backdooring Diffusion Language Models with Diverse Targets · cs.CR · 2026-05-10 · unverdicted · none · ref 22 · internal anchor
BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.
Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions · cs.CR · 2026-05-05 · unverdicted · none · ref 20 · internal anchor
Sparse Backdoor plants a provably undetectable backdoor in neural network weights via structured sparse perturbations and isotropic Gaussian dithering, with detection hardness reduced to Sparse PCA.
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework · cs.CR · 2026-04-25 · unverdicted · none · ref 117 · internal anchor
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training · cs.LG · 2026-04-23 · unverdicted · none · ref 157 · internal anchor
Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.
Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling · cs.CR · 2026-04-14 · unverdicted · none · ref 7 · internal anchor
SET detects input-level backdoors in T2I diffusion models by learning a benign cross-attention response space from clean samples and flagging deviations under multi-scale perturbations.
Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward · cs.CR · 2026-04-10 · accept · none · ref 4 · internal anchor
RLVR can be backdoored with under 2% poisoned data using an asymmetric reward trigger, implanting jailbreaks that cut safety performance by 73% on average without harming benign tasks.
CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion · cs.CR · 2026-04-10 · unverdicted · none · ref 11 · internal anchor
CLIP-Inspector reconstructs OOD triggers to detect backdoors in prompt-tuned CLIP models with 94% accuracy and higher AUROC than baselines, plus a repair step via fine-tuning.
Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction · cs.CR · 2026-04-09 · conditional · none · ref 21 · internal anchor
Backdoor attacks on VLM-based scanpath predictors can redirect fixations toward chosen objects or inflate durations using input-conditioned triggers that evade cluster detection, and no tested defense blocks them without hurting clean accuracy.
Inevitable Encounters: Backdoor Attacks Involving Lossy Compression · cs.CR · 2026-03-14 · unverdicted · none · ref 8 · internal anchor
ROI coding enables backdoor triggers to survive lossy compression by embedding malicious information into binary bitstreams via sample-specific or customized masks for both learned and traditional codecs.
BadSNN: Backdoor Attacks on Spiking Neural Networks via Adversarial Spiking Neuron · cs.CR · 2026-02-06 · unverdicted · none · ref 18 · internal anchor
BadSNN injects backdoors into spiking neural networks by adversarially tuning LIF neuron hyperparameters and optimizing triggers, achieving higher attack success than prior data-poisoning methods while remaining robust to common defenses.
Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models · cs.CV · 2025-12-26 · conditional · none · ref 13 · internal anchor
BadVSFM is the first effective backdoor attack on prompt-driven video segmentation foundation models, using a two-stage encoder-decoder strategy to achieve high attack success rates with limited clean performance loss.
Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP · cs.LG · 2024-12-01 · conditional · none · ref 18 · internal anchor
PAR fine-tunes CLIP to remove backdoors from structured triggers while preserving standard performance, and works even with only synthetic image-text pairs.
Act in Collusion: Distributed Multi-Target Backdoor Attacks in Federated Learning · cs.CV · 2024-11-06 · unverdicted · none · ref 13 · internal anchor
DMBA maintains attack success rates above 80% for all backdoors in a distributed multi-target FL setting where baselines drop below 50%.
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models · cs.AI · 2024-06-14 · conditional · none · ref 56 · internal anchor
LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.
The Curse of Recursion: Training on Generated Data Makes Models Forget · cs.LG · 2023-05-27 · conditional · none · ref 6 · internal anchor
Use of model-generated content in training causes irreversible loss of distribution tails, termed model collapse, in VAEs, GMMs, and LLMs.
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning · cs.CR · 2017-12-15 · unverdicted · none · ref 30 · internal anchor
Injecting around 50 poisoned samples with a stealthy trigger creates backdoors in deep learning models achieving over 90% attack success under a weak threat model with no model or data knowledge required.
Sample-wise Targeted Adversarial Attacks on Test-time Adaptation · cs.LG · 2026-05-22 · unverdicted · none · ref 10 · internal anchor
Proposes meta-learning attack with priority-aware gradient alignment for sample-wise targeted attacks on TTA that maintain label distribution consistency with no-attack baseline.
Detecting Trojaned DNNs via Spectral Regression Analysis · cs.CR · 2026-05-20 · unverdicted · none · ref 1 · internal anchor
MIST detects Trojaned DNN updates by measuring spectral deviations in pre-activation representations against a benign fine-tuning reference, achieving high accuracy across datasets and attacks after a single update.
Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks · cs.CR · 2026-05-18 · unverdicted · none · ref 5 · internal anchor
OBBR projects poisoned samples into benign space via rewriting with open-book examples, raising safety performance by 51% on average versus prior defenses across five attacks and four LLMs.
Lightweight and Fast Backdoor Model Detection · cs.CR · 2026-05-17 · unverdicted · none · ref 2 · internal anchor
DFBScanner detects backdoors by combining anomaly indicators from final-layer parameters into a Trojan clue score, reporting 97.17% true-positive rate, 0.95% false-positive rate, and 1 ms average detection time on a benchmark of over 5,000 models.
Activation Differences Reveal Backdoors: A Comparison of SAE Architectures · cs.CL · 2026-05-08 · unverdicted · none · ref 10 · internal anchor
Differential SAEs isolate backdoor features far better than Crosscoders, reaching a Backdoor Isolation Score of 0.40 with perfect precision while Crosscoders stay below 0.02.
BehaviorGuard: Online Backdoor Defense for Deep Reinforcement Learning · cs.AI · 2026-05-07 · unverdicted · none · ref 15 · internal anchor
BehaviorGuard detects backdoor behaviors in DRL policies via behavioral drift in action distributions and suppresses suspicious actions at runtime, claimed as the first online defense for both single- and multi-agent settings.
Checkerboard: A Simple, Effective, Efficient and Learning-free Clean Label Backdoor Attack with Low Poisoning Budget · cs.CR · 2026-05-02 · unverdicted · none · ref 17 · internal anchor
Checkerboard derives a closed-form checkerboard trigger for clean-label backdoor attacks that achieves over 94% ASR with poisoning rates as low as 0.46% on ImageNet-100 and 99.99% ASR with 20 samples on CIFAR-10.
Defusing the Trigger: Plug-and-Play Defense for Backdoored LLMs via Tail-Risk Intrinsic Geometric Smoothing · cs.CR · 2026-04-27 · unverdicted · none · ref 13 · internal anchor
TIGS detects backdoor-induced attention collapse in LLMs and applies content-aware tail-risk screening plus intrinsic geometric smoothing to suppress attacks while preserving normal performance.
CSC: Turning the Adversary's Poison against Itself · cs.CR · 2026-04-23 · unverdicted · none · ref 13 · internal anchor
CSC identifies backdoored samples via early-epoch latent clustering and conceals them by relabeling to a virtual class, driving attack success rates near zero on benchmarks with little clean accuracy loss.
PASTA: A Patch-Agnostic Twofold-Stealthy Backdoor Attack on Vision Transformers · cs.CV · 2026-04-21 · unverdicted · none · ref 10 · internal anchor
PASTA enables patch-agnostic backdoor activation in ViTs via multi-location trigger insertion during training and bi-level optimization, achieving 99.13% average attack success with large gains in visual/attention stealthiness and defense robustness.
Compiling Activation Steering into Weights via Null-Space Constraints for Stealthy Backdoors · cs.CR · 2026-04-14 · unverdicted · none · ref 6 · internal anchor
A method compiles a behavioral steering vector into persistent weight edits via null-space projection, enabling stealthy and reliable backdoors in LLMs that trigger only on specific inputs.
Latent Instruction Representation Alignment: defending against jailbreaks, backdoors and undesired knowledge in LLMs · cs.LG · 2026-04-12 · unverdicted · none · ref 17 · internal anchor
LIRA aligns latent instruction representations in LLMs to defend against jailbreaks, backdoors, and undesired knowledge, blocking over 99% of PEZ attacks and achieving optimal WMDP forgetting.
Phantasia: Context-Adaptive Backdoors in Vision Language Models · cs.CV · 2026-04-09 · unverdicted · none · ref 8 · internal anchor
Phantasia is a new backdoor attack on VLMs that dynamically aligns malicious outputs with input context to achieve higher stealth and state-of-the-art success rates compared to static-pattern attacks.
Safety, Security, and Cognitive Risks in State-Space Models: A Systematic Threat Analysis with Spectral, Stateful, and Capacity Attacks · cs.CR · 2026-04-04 · unverdicted · none · ref 17 · internal anchor
State-space models are vulnerable to three new attack types that corrupt state integrity, with experiments showing up to 156x output changes and 6x higher targeted corruption than random inputs.
SCOUT: A Defense Against Data Poisoning Attacks in Fine-Tuned Language Models · cs.CR · 2025-12-10 · unverdicted · none · ref 11 · internal anchor
SCOUT uses token saliency analysis to detect both standard and contextually-plausible backdoor attacks in language models while maintaining clean accuracy.
BadGraph: A Backdoor Attack Against Latent Diffusion Model for Text-Guided Graph Generation · cs.LG · 2025-10-23 · conditional · none · ref 47 · internal anchor
BadGraph poisons training data with textual triggers to implant backdoors in latent diffusion models for text-guided graph generation, achieving 50% attack success rate at under 10% poisoning and over 80% at 24% poisoning with negligible clean performance loss.
One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems · cs.CR · 2025-05-15 · unverdicted · none · ref 3 · internal anchor
AuthChain poisons a single document to achieve high-success attacks on RAG systems for multi-hop queries across six LLMs while evading defenses.
Crowding Out The Noise: Algorithmic Collective Action Under Differential Privacy · cs.LG · 2025-05-09 · unverdicted · none · ref 14 · internal anchor
Differential privacy reduces algorithmic collective action effectiveness, with formal lower bounds on success probability depending on collective size and privacy parameters, plus experimental verification on neural nets.
SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation · cs.LG · 2023-10-19 · conditional · none · ref 148 · internal anchor
SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.
Unsolved Problems in ML Safety · cs.LG · 2021-09-28 · accept · none · ref 68 · internal anchor
The paper presents a roadmap that identifies four unsolved problems in ML safety: robustness against hazards, monitoring for hazards, alignment of model goals with human intent, and systemic safety.
LymphNode: A Plug-and-Play Access Control Method for Deep Neural Networks · cs.CR · 2026-05-15 · unverdicted · none · ref 50 · internal anchor
LymphNode enforces default-deny access control on DNNs by injecting GSUAP into the feature space to neutralize utility for unauthorized queries and selectively restore it for authorized inputs carrying a stealthy credential, using under 100 samples from surrogate data.
LightSplit: Practical Privacy-Preserving Split Learning via Orthogonal Projections · cs.LG · 2026-05-13 · unverdicted · none · ref 18 · internal anchor
LightSplit uses non-invertible orthogonal projections as an information bottleneck in split learning to reduce transmitted dimensionality by 32x while retaining more than 95% accuracy and limiting reconstruction risk.
