Alignment Defends LLMs from Property Inference Attacks

Chhavi Yadav; Kamalika Chaudhuri; Pengrun Huang; Ruihan Wu

arxiv: 2606.10217 · v1 · pith:USMXHT2Rnew · submitted 2026-06-08 · 💻 cs.LG · cs.CR

Pengrun Huang, Chhavi Yadav, Ruihan Wu, Kamalika Chaudhuri

Pith reviewed 2026-06-27 17:03 UTC · model grok-4.3

keywords property inference attacks · LLM alignment · DPO · GRPO · model defenses · confidentiality · fine-tuning

The pith

Alignment-based defenses mitigate property inference attacks on LLMs by reshaping output distributions after training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that post-training alignment can reduce the success of attacks that extract sensitive dataset-level properties from fine-tuned language models. Existing defenses require changing the original training data or retraining from scratch, which is impractical for deployed models. Instead, the approach adapts Direct Preference Optimization and Group Relative Policy Optimization to steer outputs toward a chosen target property ratio. Experiments indicate that attack performance drops while model utility on standard tasks remains largely intact. This enables confidentiality protections without data access or full retraining.

Core claim

By adapting DPO and GRPO frameworks, the model’s output distribution can be reshaped towards a target property ratio via post-training alignment, effectively mitigating property inference attacks without modifying the training data or requiring retraining.

What carries the argument

Adaptation of RLHF frameworks (DPO and GRPO) to construct preference pairs and rewards that enforce a target property ratio in outputs.
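
One way to picture the preference-pair step is sketched below. It is a minimal illustration, not the authors' construction: it assumes the defender already has, for each prompt, one candidate response that exhibits the property and one that does not, and it marks the property-bearing response as preferred with probability equal to the target ratio. The names `PreferencePair`, `build_pairs`, and `target_ratio` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response the alignment step should prefer
    rejected: str  # response the alignment step should disprefer


def build_pairs(candidates, target_ratio, seed=0):
    """Build preference pairs whose chosen side carries the property
    in roughly a `target_ratio` fraction of pairs.

    candidates: iterable of (prompt, with_property, without_property).
    This pairing rule is an assumption made for illustration only.
    """
    if not 0.0 <= target_ratio <= 1.0:
        raise ValueError("target_ratio must lie in [0, 1]")
    rng = random.Random(seed)  # seeded so the example is reproducible
    pairs = []
    for prompt, with_prop, without_prop in candidates:
        if rng.random() < target_ratio:
            pairs.append(PreferencePair(prompt, with_prop, without_prop))
        else:
            pairs.append(PreferencePair(prompt, without_prop, with_prop))
    return pairs


# Toy usage with invented placeholder responses.
toy = [(f"prompt {i}", "response with property", "response without property")
       for i in range(1000)]
pairs = build_pairs(toy, target_ratio=0.3)
share = sum(p.chosen == "response with property" for p in pairs) / len(pairs)
```

Note that the sketch needs a property-labeled candidate for every prompt, which is the requirement the editorial analysis further down questions.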

If this is right

Property inference attacks achieve lower success rates after applying the defense.
Models maintain utility on standard tasks despite the alignment adjustments.
Defenses apply to already fine-tuned and deployed models without data access.
Both DPO and GRPO adaptations provide effective mitigation options.
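
For the GRPO side, the sketch below shows one hypothetical reward that pushes a sampled group toward a target ratio, followed by the group-relative advantage normalization that GRPO applies to rewards. The reward rule, the function names, and the `eps` constant are assumptions made for illustration; the paper's actual reward function is not given in this summary.

```python
from statistics import mean, pstdev


def ratio_rewards(has_property, target_ratio):
    """Hypothetical reward: 1.0 for responses whose label would move the
    group's property fraction toward target_ratio, 0.0 otherwise."""
    observed = sum(has_property) / len(has_property)
    rewards = []
    for flag in has_property:
        if observed < target_ratio:
            rewards.append(1.0 if flag else 0.0)   # too few: reward the property
        elif observed > target_ratio:
            rewards.append(0.0 if flag else 1.0)   # too many: reward its absence
        else:
            rewards.append(0.0)                    # on target: no preference
    return rewards


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of four sampled responses, three of which show the property.
group = [True, True, True, False]
rewards = ratio_rewards(group, target_ratio=0.5)
advantages = group_relative_advantages(rewards)
```

In this toy group the property is over-represented relative to the target, so the single response without it receives the only positive advantage.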

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar alignment strategies might apply to other inference attacks beyond property inference.
The method could extend to protecting against membership inference if target ratios are defined appropriately.
Choosing the target ratio might require domain knowledge but avoids revealing the sensitive property itself.

Load-bearing premise

A suitable target property ratio can be chosen and preference pairs or rewards constructed without knowledge of the actual sensitive property in the dataset.

What would settle it

An experiment in which a property inference attack, run after the DPO or GRPO defense has been applied, still achieves a success rate comparable to its success rate against the undefended model.
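
A sketch of how such a check might be scored follows. It assumes a simple generation-based estimator, in which the attacker labels sampled outputs and reads off the property fraction; this stands in for whichever attack is actually evaluated and is not taken from the paper. `has_property` is a hypothetical labeling function, and the toy outputs and the true ratio are invented for the example.

```python
def estimate_ratio(outputs, has_property):
    """Fraction of sampled outputs that the labeling function flags."""
    if not outputs:
        raise ValueError("need at least one sampled output")
    return sum(1 for text in outputs if has_property(text)) / len(outputs)


def attack_error(estimated_ratio, true_ratio):
    """Absolute gap between the attacker's estimate and the true ratio."""
    return abs(estimated_ratio - true_ratio)


def mentions_stomach(text):
    # Stand-in labeling rule; a real attack would use a stronger classifier.
    return "stomach" in text.lower()


true_ratio = 0.75  # invented ground truth for the toy fine-tuning set
undefended = ["Stomach pain after meals", "stomach cramps",
              "Trouble sleeping", "Stomach bloating"]
defended = ["Stomach pain after meals", "Trouble sleeping",
            "Persistent cough", "stomach cramps"]

err_before = attack_error(estimate_ratio(undefended, mentions_stomach), true_ratio)
err_after = attack_error(estimate_ratio(defended, mentions_stomach), true_ratio)
```

If the error after the defense stays close to the error before it, the attack is still succeeding, which is the outcome that would count against the claim.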

Figures

Figures reproduced from arXiv: 2606.10217 by Chhavi Yadav, Kamalika Chaudhuri, Pengrun Huang, Ruihan Wu.

**Figure 2.** Effect of alignment on word-frequency distribution. After defense, word frequencies become less reflective of the underlying training distribution. Notably, DPO exhibits more abrupt changes in word frequency compared to GRPO, consistent with its stronger generalization to adversarial prompts.
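
The comparison the caption describes can be approximated with a few lines of code. The sketch below is an assumption about how one might quantify the shift, not the procedure behind the figure: it builds normalized word-frequency distributions from two sets of generations and reports their total variation distance. The whitespace tokenizer, the function names, and the toy generations are hypothetical choices.

```python
from collections import Counter


def word_frequencies(texts):
    """Normalized word frequencies using lowercase whitespace tokens."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two word distributions (0 to 1)."""
    vocab = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in vocab)


# Invented generations from a model before and after the defense.
before = word_frequencies(["stomach pain", "stomach ache"])
after = word_frequencies(["head pain", "back ache"])
shift = total_variation(before, after)
```

A distance near 0 means the two sets of generations use words at nearly the same rates; a distance near 1 means their vocabularies barely overlap.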

Original abstract

Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets that may contain sensitive, dataset-level properties. Recent work has shown that such dataset-level information can be effectively extracted through property inference attacks, posing a confidentiality risk. Existing defenses against these attacks primarily operate by modifying the training data distribution and hence require access to the original data and retraining the model, limiting their applicability to settings where data is unavailable or models are already deployed. In this work, we propose alignment-based defenses for mitigating property inference attacks in LLMs. Our approach reshapes the model's output distribution towards a target property ratio via post-training alignment, without modifying the training data. In particular, we adapt two widely used RLHF frameworks--Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO)--as our defenses by constructing preference pairs and defining a specific reward function respectively. Through comprehensive experiments, we show that our alignment-based defenses effectively mitigate property inference attacks while maintaining a strong utility-confidentiality tradeoff.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The alignment defense requires knowing the sensitive property to build the preference pairs or rewards, which undercuts the threat model.

The paper takes existing RLHF methods (DPO and GRPO) and applies them after fine-tuning to push an LLM's outputs toward a chosen property ratio, aiming to reduce what property inference attacks can extract. This is a clean post-training move that avoids touching the original data or retraining, which is the main practical limitation of prior defenses.

It does a reasonable job framing the problem for deployed domain-specific models and shows the direction is worth testing. The abstract claims the experiments demonstrate a usable utility-confidentiality tradeoff, though no numbers appear here.

The soft spot is load-bearing. To run DPO you need preference pairs that differ on the target property; to run GRPO you need a reward that scores outputs on that same property. Both steps presuppose the defender can label or generate data according to the property. In the standard threat model the property is unknown to the defender—that is exactly what the attack is trying to learn. The paper does not describe a property-agnostic way to pick the target ratio or build the pairs, so the defense cannot be applied without already possessing the secret.
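
For reference, the standard forms of the two objectives make this dependence concrete. The block below restates the usual DPO loss and the GRPO group-relative advantage as they appear in the general literature. The notation (policy, reference policy, temperature beta, group size G, per-response reward r_i) is the conventional one and is not taken from the paper under review, whose adapted objectives may differ.

```latex
% Standard DPO loss over preference triples (x, y_w, y_l):
% y_w is the preferred response, y_l the dispreferred one.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]

% GRPO group-relative advantage for G responses sampled for one prompt,
% each scored with a scalar reward r_i:
\hat{A}_i
  = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}
         {\operatorname{std}(r_1, \dots, r_G)},
  \qquad i = 1, \dots, G
```

DPO consumes labeled pairs of preferred and dispreferred responses, and GRPO consumes a scalar reward for each sampled response, so in both cases something has to score outputs on the property before training can start.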

This leaves the work as a demonstration that alignment can change output statistics when the property is known, rather than a general defense. Minor issues include the lack of reported attack success rates or baseline comparisons in the abstract, but the construction problem is the central one.

The paper is for people working on LLM privacy and post-training defenses. It deserves referee time because the direction is practical and the gap is fixable if addressed, but the current version needs that clarification before the main claim holds.

Referee Report

2 major / 1 minor

Summary. The paper claims that post-training alignment via adapted DPO (preference pairs) and GRPO (reward function) can reshape an LLM's output distribution to a chosen target property ratio, thereby mitigating property inference attacks on dataset-level sensitive properties while preserving a strong utility-confidentiality tradeoff, without requiring access to or modification of the original training data.

Significance. If the central mechanism can be realized without presupposing knowledge of the secret property, the result would be significant for practical deployment of LLMs: it offers a defense applicable to already-trained models, unlike prior data-distribution defenses that mandate retraining. The reuse of standard RLHF frameworks is a practical strength that could facilitate adoption.

major comments (2)

[Abstract and §3, defense construction] The DPO adaptation constructs preference pairs differentiated by the target property ratio, and the GRPO adaptation defines a reward function that likewise requires property-specific labeling or generation of responses. Both steps presuppose that the defender possesses or can access the sensitive property to create the necessary data, which directly contradicts the threat model in which the property is unknown to the defender and is precisely the information the attack seeks to extract.
[Experiments] The abstract asserts that comprehensive experiments show effective mitigation and a strong utility-confidentiality tradeoff, yet the high-level description provides no attack success rates, baseline comparisons, utility metrics, or details on how the target ratio was selected and validated. Without these quantitative anchors the central empirical claim cannot be assessed.

minor comments (1)

[Abstract] Including one or two headline quantitative results (e.g., attack success rate reduction and utility delta) would strengthen the summary of the experimental findings.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point by point below, indicating where revisions to the manuscript are warranted.

Referee: [Abstract and §3, defense construction] The DPO adaptation constructs preference pairs differentiated by the target property ratio, and the GRPO adaptation defines a reward function that likewise requires property-specific labeling or generation of responses. Both steps presuppose that the defender possesses or can access the sensitive property to create the necessary data, which directly contradicts the threat model in which the property is unknown to the defender and is precisely the information the attack seeks to extract.

Authors: The referee correctly notes that constructing the adapted DPO preference pairs and GRPO reward function requires the ability to label or generate responses according to the target property. This assumption is implicit in the current defense design. We will revise the abstract, threat model section, and §3 to explicitly state that the defender is assumed to have (or be able to obtain) sufficient access to the property for the purpose of alignment data creation, for example when the defender wishes to enforce a specific ratio for a known sensitive attribute. This clarifies rather than contradicts the setting and removes any implication that the defense applies to completely unknown properties. revision: yes

Referee: [Experiments] The abstract asserts that comprehensive experiments show effective mitigation and a strong utility-confidentiality tradeoff, yet the high-level description provides no attack success rates, baseline comparisons, utility metrics, or details on how the target ratio was selected and validated. Without these quantitative anchors the central empirical claim cannot be assessed.

Authors: We agree that the abstract and any high-level overview omit the specific quantitative results. The experiments section of the manuscript contains the requested details (attack success rates before/after defense, baseline comparisons, utility metrics such as downstream task accuracy and perplexity, and target-ratio selection via validation sweeps). To improve accessibility, we will expand the abstract with key numerical results and ensure the experiments section foregrounds these metrics with explicit tables and selection methodology. revision: yes

Circularity Check

0 steps flagged

Empirical defense paper with no derivation chain or self-referential predictions

This paper proposes an empirical defense method adapting DPO and GRPO for post-training alignment to mitigate property inference attacks. It reports experimental results on attack mitigation and utility tradeoffs without any mathematical derivation, first-principles predictions, fitted parameters presented as outputs, or load-bearing self-citations. The central claims rest on experimental outcomes rather than reducing to inputs by construction, satisfying the criteria for a self-contained empirical contribution with no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the empirical effectiveness of alignment for controlling dataset-property leakage; no free parameters, axioms, or invented entities are described in the abstract.
