Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting
9 Pith papers cite this work. Polarity classification is still indexing.
roles: background 1
polarities: background 1
representative citing papers
MRMMIA is a multi-recall-probe membership inference attack that extracts signals from chat agent memory and outperforms baselines in black-, gray-, and white-box settings.
Introduces Unlearning Depth Score (UDS) via activation patching to quantify LLM unlearning depth and claims it outperforms 20 other metrics in faithfulness and robustness on 150 models.
PACZero achieves zero mutual information privacy in LLM fine-tuning via sign-quantized subset-aggregated ZO gradients, delivering near non-private accuracy on SST-2 at I=0.
Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.
Stacking seven black-box estimators into a meta-classifier reveals persistent membership leakage in differentially private federated learning models at epsilon=200 on NIST genomics data, outperforming single-signal baselines.
Chernoff DP is sandwiched between KL DP and ε-DP, outperforms KL in numerical Laplace-mechanism tests, and yields a new upper bound on adversary membership advantage compared with (ε,δ)-DP bounds.
Authors introduce MLM and CLM specialization methods that avoid memorizing identifiers in sensitive training data while aiming for a privacy-utility tradeoff on medical datasets.
This paper proposes a research agenda for software engineering of self-adaptive robotic systems along lifecycle stages and enabling technologies, identifying challenges and a roadmap to 2030.
citing papers explorer
- Fair Finetuning Mitigates Distribution Inference Attacks
  Fair fine-tuning under Equalized Odds yields a tight bound Adv(A, M_f) ≤ Δ_EO · W on adversarial advantage in distribution inference attacks, with empirical reductions below detection threshold across six datasets.
- Detecting Pretraining Data from Large Language Models
  Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.