arxiv: 2605.22373 · 2 revisions
Boundary-targeted Membership Inference Attacks on Safety Classifiers