Alignment defenses adapted from DPO and GRPO mitigate property inference attacks on LLMs while preserving utility.
Proceedings of the 20th International Conference on Security and Cryptography - SECRYPT
2 Pith papers cite this work.
fields: cs.LG (2)
years: 2026 (2)
representative citing papers
Fair Finetuning Mitigates Distribution Inference Attacks
Fair fine-tuning under Equalized Odds yields a tight bound Adv(A, M_f) ≤ Δ_EO · W on adversarial advantage in distribution inference attacks, with empirical reductions below detection threshold across six datasets.