Understanding Membership Inferences on Well-Generalized Learning Models

Carl A. Gunter; Diyue Bu; Haixu Tang; Kai Chen; Lei Wang; Vincent Bindschaedler; Xiaofeng Wang; Yunhui Long

arXiv: 1802.04889 · v1 · pith:KFBV36WRnew · submitted 2018-02-13 · 💻 cs.CR · cs.LG · stat.ML

Yunhui Long, Vincent Bindschaedler, Lei Wang, Diyue Bu, Xiaofeng Wang, Haixu Tang, Carl A. Gunter, Kai Chen


Abstract

A Membership Inference Attack (MIA) determines whether a record was present in a machine learning model's training data by querying the model. Prior work has shown that the attack is feasible when the model is overfitted to its training data or when the adversary controls the training algorithm. However, when the model is not overfitted and the adversary does not control the training algorithm, the threat is not well understood. In this paper, we report a study showing that overfitting is a sufficient but not a necessary condition for an MIA to succeed. More specifically, we demonstrate that even a well-generalized model contains vulnerable instances subject to a new generalized MIA (GMIA). In GMIA, we use novel techniques for selecting vulnerable instances and for detecting their subtle influences, which overfitting metrics ignore. Specifically, we successfully identify individual records with high precision in real-world datasets by querying black-box machine learning models. Further, we show that a vulnerable record can even be attacked indirectly by querying other, related records, and that existing generalization techniques are less effective at protecting the vulnerable instances. Our findings sharpen the understanding of the fundamental cause of the problem: the unique influence a training instance may have on the model.
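The abstract contrasts attacks that depend on overfitting with GMIA, which looks for a single record's subtle influence on a well-generalized model. As a rough illustration of that contrast only (this is not the paper's GMIA procedure, whose details are not given here), the Python sketch below compares a naive global confidence threshold with a per-record test against reference models trained without the record. Everything in it is an assumption for illustration: the hypothetical function names, treating the model's probability on the record's true label as its "confidence", the threshold `tau = 0.9`, the significance level `alpha = 0.05`, the add-one empirical p-value, and the made-up numbers in the demo.

```python
from typing import Sequence


def global_threshold_attack(target_conf: float, tau: float = 0.9) -> bool:
    """Naive MIA (hypothetical helper): flag a record as a training member
    when the target model's confidence on its true label is at least tau.
    This mostly works when members receive much higher confidence than
    non-members, i.e. when the model is overfitted."""
    return target_conf >= tau


def empirical_p_value(target_conf: float, reference_confs: Sequence[float]) -> float:
    """One-sided empirical p-value for H0: 'the record was not a member'.

    reference_confs: confidences on the record's true label from reference
    models trained WITHOUT the record (assumed available to the attacker).
    The add-one correction keeps the estimate strictly above zero.
    """
    if not reference_confs:
        raise ValueError("need at least one reference model")
    at_least_as_high = sum(1 for r in reference_confs if r >= target_conf)
    return (1 + at_least_as_high) / (1 + len(reference_confs))


def reference_model_attack(target_conf: float,
                           reference_confs: Sequence[float],
                           alpha: float = 0.05) -> bool:
    """Flag membership when the target model is unusually confident on the
    record compared with models that never saw it (hypothetical helper)."""
    return empirical_p_value(target_conf, reference_confs) <= alpha


if __name__ == "__main__":
    # Made-up numbers: 20 reference models give the record confidences
    # between 0.27 and 0.365, while the target model gives 0.62.
    refs = [0.27 + 0.005 * i for i in range(20)]
    target = 0.62
    print(global_threshold_attack(target))       # → False
    print(empirical_p_value(target, refs))       # 1/21, about 0.0476
    print(reference_model_attack(target, refs))  # → True
```

The reference-model test judges each record against its own baseline, so a modest confidence of 0.62 stands out when models that never saw the record give it about 0.3, even though a single global threshold does not flag it. With n reference models the smallest attainable p-value is 1/(n + 1), so `alpha = 0.05` needs at least 19 of them.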


Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Single-Sample Black-Box Membership Inference Attack against Vision-Language Models via Cross-modal Semantic Alignment
cs.CV 2026-05 unverdicted novelty 7.0

A cross-modal alignment attack achieves AUC 0.821 for single-sample black-box membership inference on VLMs such as LLaVA-1.5 by quantifying the similarity between an image and the caption the model generates for it.
Revisiting Privacy Leakage in Machine Unlearning: Membership Inference Beyond the Forgotten Set
cs.CR 2026-05 unverdicted novelty 7.0

Unlearning increases privacy leakage for the retain set, and a new tri-class membership inference attack distinguishes forget, retain, and unseen data using pre- and post-unlearning model outputs.
Noise Aggregation Analysis Driven by Small-Noise Injection: Efficient Membership Inference for Diffusion Models
cs.CV 2025-10 unverdicted novelty 7.0

Introduces noise aggregation analysis with single-step small-noise injection to enable efficient and accurate membership inference attacks on diffusion models.
Detecting Pretraining Data from Large Language Models
cs.CL 2023-10 conditional novelty 7.0

Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.
The False Promise of Imitating Proprietary LLMs
cs.CL 2023-05 conditional novelty 6.0

Finetuning open LMs on ChatGPT outputs creates models that mimic style and fool human raters but fail to close the performance gap to proprietary systems on tasks not well-represented in the imitation data.
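The Min-K% Prob method summarized in the "Detecting Pretraining Data from Large Language Models" entry above is a simple scoring rule, so a short sketch may help. As commonly described, it averages the log-probabilities of the k% of tokens that the model finds least likely and compares that average with a threshold: a higher score suggests the text was seen in pretraining. The Python below is a minimal sketch assuming per-token log-probabilities are already available; the function name, the default k = 20%, the rounding rule, and the example numbers are assumptions, not taken from that paper's code.

```python
import math
from typing import Sequence


def min_k_percent_prob(token_log_probs: Sequence[float], k: float = 0.2) -> float:
    """Min-K% Prob score: mean log-probability of the k fraction of tokens
    with the lowest log-probability under the model.

    token_log_probs[i] is assumed to be log p(x_i | x_1..x_{i-1}) from the
    language model being audited. A higher (less negative) score means the
    text contains fewer very unlikely tokens, which is taken as evidence
    that the text was in the pretraining data.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    if not 0.0 < k <= 1.0:
        raise ValueError("k must be in (0, 1]")
    # Size of the Min-K% set; rounding down with a floor of one token is an
    # assumption of this sketch.
    n = max(1, math.floor(len(token_log_probs) * k))
    lowest = sorted(token_log_probs)[:n]
    return sum(lowest) / n


if __name__ == "__main__":
    # Made-up per-token log-probabilities for two 10-token texts.
    with_outliers = [-0.1, -0.2, -5.0, -0.3, -0.15, -4.0, -0.05, -0.25, -0.1, -0.2]
    uniform = [-0.1] * 10
    print(min_k_percent_prob(with_outliers))  # → -4.5 (mean of -5.0 and -4.0)
    print(min_k_percent_prob(uniform))        # → -0.1
```

A text with a few very unlikely tokens scores low even if most of its tokens are easy, which is the intuition behind scoring the outliers rather than averaging over all tokens. The decision threshold would have to be calibrated on examples whose membership is known.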