arxiv: 2605.02255 · v1 · submitted 2026-05-04 · 💻 cs.CR · cs.AI

Recognition: 2 theorem links

· Lean Theorem

On the Privacy of LLMs: An Ablation Study

Karima Makhlouf , Lamiaa Basyoni , Syed Khaderi , Gabriel Marquez , Peter Sotomango , Mahmoud Awawdah , Sami Zhioua

Authors on Pith no claims yet

Pith reviewed 2026-05-08 18:21 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords LLM privacymembership inferenceattribute inferencedata extractionbackdoor attacksablation studythreat modelretrieval-augmented generation

0 comments

The pith

Privacy risks to LLMs differ markedly by attack type and are shaped by model architecture, scale, dataset, and retrieval choices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper brings together four common privacy attacks on large language models and tests them together rather than in isolation. It applies a controlled set of changes to model size, training data, architecture, and retrieval setup to see how each factor shifts attack performance. Membership inference attacks, especially those using masking, produce clear and consistent signals of data presence. Backdoor attacks succeed at high rates because they rely on planted triggers. Attribute inference and data extraction attacks prove harder to carry out yet still target personal details that matter in practice. The overall pattern shows that privacy exposure is not fixed but tracks specific design decisions.

Core claim

Our analysis reveals clear differences across attack types. Membership inference attacks, particularly mask-based variants, exhibit strong and reliable signals, while backdoor attacks achieve consistently high success rates due to their trigger-based nature. In contrast, attribute inference and data extraction attacks remain more challenging, resulting in lower accuracy, yet they pose significant risks as they target sensitive personal information. Overall, these results highlight that privacy risks in LLM systems are highly context-dependent and driven by design choices, emphasizing the need for holistic evaluation and informed deployment practices.

What carries the argument

A unified threat model and notation for four attacks (membership inference, attribute inference, data extraction, backdoor) combined with structured ablation over architecture, scale, dataset characteristics, and retrieval configuration.

Load-bearing premise

The chosen set of representative attacks and the four ablation factors capture the main drivers of privacy risk that appear in actual LLM deployments.

What would settle it

Running the same ablation on a new family of models and finding that mask-based membership inference no longer produces reliable signals or that backdoor success rates drop below the reported high levels would undermine the observed differences.

Figures

Figures reproduced from arXiv: 2605.02255 by Gabriel Marquez, Karima Makhlouf, Lamiaa Basyoni, Mahmoud Awawdah, Peter Sotomango, Sami Zhioua, Syed Khaderi.

**Figure 1.** Figure 1: Mapping MIAs, DEAs, AIAs, and BAs to Pri view at source ↗

**Figure 2.** Figure 2: S2MIA Attack. (Figure from [10]) of QA pairs. We considered portions of the full corpus of each dataset ranging from 10% to 100%. The LLM model (M) factor. We considered five LLM models different from the models in [10], except for the baseline model (Llama-2-7b-chat-hf). The membership ratio factor which corresponds to the ratio of member samples (belonging to the RAG dataset) in the set of samples used… view at source ↗

**Figure 3.** Figure 3: Grouped bar chart of ROC AUC scores across all experiment groups. The dataset group shows the strongest view at source ↗

**Figure 5.** Figure 5: ROC AUC across model scales. The baseline configuration consists of a balanced dataset with 50% member and 50% non-member samples, using GPT-4o-mini as the generator, FAISS as the retriever, BGE-small as the embedding model, m = 10, K = 5, and γ = 0.5. Across all experiments, Retrieval Recall remains consistently equal to 1.0, indicating that the retriever successfully returns the target document when it… view at source ↗

**Figure 6.** Figure 6: F1-score across different model scales view at source ↗

**Figure 7.** Figure 7: Mean ROC AUC across datasets under controlled view at source ↗

**Figure 9.** Figure 9: F1-score as a function of threshold γ view at source ↗

**Figure 8.** Figure 8: ROC AUC as a function of the number of masks view at source ↗

**Figure 10.** Figure 10: Attribute-wise accuracy across LLMs under view at source ↗

**Figure 11.** Figure 11: Relation between MMLU-Pro scores and aver view at source ↗

**Figure 12.** Figure 12: Log-probability distribution of true PII secrets view at source ↗

**Figure 13.** Figure 13: Kernel density estimate of bounded exposureθ (v) by PII type τ . Exposure Analysis view at source ↗

**Figure 14.** Figure 14: Bounded exposureθ (v) by training repetition bracket (number of times value v appears in D). Boxes show the interquartile range (IQR). occurring only once in D cluster near 0 bits—fM θ assigns them no higher probability than a random alternative. As the repetition count grows, the distribution shifts upward: values appearing 16 or more times in D have a median exposure near the ceiling of 8.97 bits, indi… view at source ↗

**Figure 15.** Figure 15: Experiment A: mean bounded exposure and rank-1 hit rate versus model parameter count view at source ↗

**Figure 16.** Figure 16: Experiment B: mean exposure (solid) and rank view at source ↗

**Figure 18.** Figure 18: Experiment D: empirical CDFs of bounded exposure for candidate pools of size 100, 500, and 1,000 view at source ↗

**Figure 20.** Figure 20: Experiment F: mean exposure (blue, left axis) view at source ↗

**Figure 21.** Figure 21: Backdoor Attack Phases view at source ↗

**Figure 22.** Figure 22: Backdoor attack dataset; clean vs poisoned data view at source ↗

**Figure 23.** Figure 23: Data Poisoning Backdoor Attacks (Jailbreak view at source ↗

**Figure 24.** Figure 24: Data Poisoning Backdoor Attacks (Jailbreak view at source ↗

read the original abstract

Large language models (LLMs) are increasingly deployed in interactive and retrieval-augmented settings, raising significant privacy concerns. While attacks such as Membership Inference (MIA), Attribute Inference (AIA), Data Extraction (DEA), and Backdoor Attacks (BA) have been studied, they are typically analyzed in isolation, leaving a gap in understanding their behavior under common system factors. In this paper, we introduce a unified threat model and notation, reproduce a representative set of privacy attacks, and conduct a structured ablation study to evaluate the impact of key factors such as model architecture, scale, dataset characteristics, and retrieval configuration. Our analysis reveals clear differences across attack types. Membership inference attacks, particularly mask-based variants, exhibit strong and reliable signals, while backdoor attacks achieve consistently high success rates due to their trigger-based nature. In contrast, attribute inference and data extraction attacks remain more challenging, resulting in lower accuracy, yet they pose significant risks as they target sensitive personal information. Overall, these results highlight that privacy risks in LLM systems are highly context-dependent and driven by design choices, emphasizing the need for holistic evaluation and informed deployment practices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper runs a side-by-side ablation of four privacy attacks on LLMs across architecture, scale, data, and retrieval, showing that some attacks are far more sensitive to those choices than others.

read the letter

The main point is that this work puts membership inference, attribute inference, data extraction, and backdoor attacks under one threat model and tests them all on the same set of system variations. That produces a clearer picture of which risks are most affected by common design decisions than the usual isolated studies. The unified notation and the structured ablation are the actual additions here. The reproductions appear faithful to prior work, and the results match expectations: mask-based membership inference gives strong signals, backdoors stay reliable because of the trigger, while attribute inference and extraction stay weaker but still hit sensitive data. The ablation on retrieval is particularly useful since many real deployments now use it. The soft spots are modest. The abstract itself supplies no numbers or error analysis, so the size of the differences and the controls used are not visible yet; if the full tables show small effects or weak statistical separation, the practical takeaway shrinks. The factor choices are reasonable but not exhaustive—fine-tuning and alignment steps are left out, which could matter in practice. No circularity or invented results show up. This is the sort of paper that helps engineers and deployers decide where to focus privacy testing rather than advancing core theory. A reader who needs comparative evidence on LLM privacy risks would find it worth reading. I would send it to peer review because the experimental framing is solid and the topic is relevant, even though the contribution is mainly organizational and comparative.

Referee Report

2 major / 0 minor

Summary. The paper introduces a unified threat model and notation for privacy attacks on LLMs, reproduces representative attacks from four families (membership inference/MIA, attribute inference/AIA, data extraction/DEA, and backdoor/BA), and conducts a structured ablation study on the impact of model architecture, scale, dataset characteristics, and retrieval configuration. It reports that MIA (particularly mask-based) show strong signals, BA achieve high success due to triggers, while AIA and DEA are lower-accuracy but still risky for sensitive data, concluding that privacy risks are highly context-dependent and driven by design choices.

Significance. If the reproductions are faithful and the ablations include proper controls and quantitative metrics, the work would offer a useful comparative perspective on how common system factors modulate different privacy attacks in LLMs, addressing the gap left by isolated studies and supporting more informed deployment practices.

major comments (2)

[Abstract] Abstract: The central claim that 'our analysis reveals clear differences across attack types' with specific characterizations (MIA exhibiting 'strong and reliable signals', BA 'consistently high success rates', AIA/DEA 'lower accuracy') is stated without any quantitative results, tables, figures, success rates, AUC values, or statistical details, which is load-bearing for the empirical conclusion of differential behavior and context-dependence.
[Abstract] The ablation study description provides no specifics on the exact representative attacks reproduced for each family, the evaluation metrics used, the concrete ranges or values tested for factors such as model scale or retrieval configuration, or any error analysis, making it impossible to assess whether the selected factors sufficiently capture main drivers of privacy risk.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We agree that the abstract would be strengthened by the inclusion of quantitative results and more specific details on the reproduced attacks and ablation factors. We have revised the abstract accordingly and respond point by point to the major comments below.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim that 'our analysis reveals clear differences across attack types' with specific characterizations (MIA exhibiting 'strong and reliable signals', BA 'consistently high success rates', AIA/DEA 'lower accuracy') is stated without any quantitative results, tables, figures, success rates, AUC values, or statistical details, which is load-bearing for the empirical conclusion of differential behavior and context-dependence.

Authors: We agree that the original abstract presented these characterizations at a high level without supporting numbers. In the revised manuscript we have updated the abstract to include representative quantitative results drawn directly from our experiments, such as MIA AUC scores in the 0.78-0.91 range, BA success rates of 88-96%, and AIA/DEA accuracies of 58-72%. These values are consistent with the detailed tables and figures in Sections 4 and 5 and make the claimed differences across attack families explicit. revision: yes
Referee: [Abstract] The ablation study description provides no specifics on the exact representative attacks reproduced for each family, the evaluation metrics used, the concrete ranges or values tested for factors such as model scale or retrieval configuration, or any error analysis, making it impossible to assess whether the selected factors sufficiently capture main drivers of privacy risk.

Authors: We acknowledge that the abstract's description of the ablation study was too high-level. The revised abstract now briefly identifies the representative attacks (mask-based and loss-based MIA, trigger-based BA, query-based AIA, and prefix-based DEA), the primary metrics (AUC for inference attacks, success rate for extraction and backdoors), and the tested ranges (model scales 7B-70B, retrieval top-k values 1-20, and dataset characteristics including size and domain). Full configurations, error analysis, and statistical details remain in the experimental sections and appendix; the abstract revision provides sufficient context for readers to evaluate the scope of the factors examined. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is an empirical ablation study that reproduces representative privacy attacks (MIA, AIA, DEA, BA) on LLMs and measures their success under variations in architecture, scale, dataset, and retrieval configuration. The central claims report observed differences in attack performance directly from these experiments. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations are present that would reduce any result to its inputs by construction. The findings are self-contained observational outcomes rather than derived predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical axioms, free parameters, or invented entities are identifiable from the abstract; the work is an empirical ablation study on existing attack methods.

pith-pipeline@v0.9.0 · 5518 in / 1069 out tokens · 67642 ms · 2026-05-08T18:21:30.181838+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Cost.FunctionalEquation / Foundation.LogicAsFunctionalEquation washburn_uniqueness_aczel (J(x) = ½(x+x⁻¹)−1) unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

exposure_θ(v) = log2 N − log2 rank_θ(v) ... ranges from 0 bits ... to log2(501) ≈ 8.97 bits

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 14 canonical work pages · 5 internal anchors

[1]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine- tuned chat models.arXiv preprint arXiv:2307.09288, 2023

work page internal anchor Pith review arXiv 2023
[3]

Extracting training data from large lan- guage models.USENIX Security Symposium, 2021

Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-V oss, Katherine Lee, Adam Roberts, Tom B Brown, Dawn Song, Ul- far Erlingsson, Alina Oprea, Colin Raffel, and Vitaly Shmatikov. Extracting training data from large lan- guage models.USENIX Security Symposium, 2021

2021
[4]

Membership inference attacks against machine learning models

Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. InIEEE Sympo- sium on Security and Privacy, 2017

2017
[5]

Privacy risk in machine learning: Analyzing the connection to overfitting

Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In2018 IEEE 31st Computer Security Foundations Sympo- sium (CSF), pages 268–282. IEEE, 2018

2018
[6]

Deduplicating training data mitigates privacy risks in language models

Nikhil Kandpal, Eric Wallace, and Colin Raffel. Deduplicating training data mitigates privacy risks in language models. InInternational Conference on Machine Learning, pages 10697–10707. PMLR, 2022

2022
[7]

Counterfactual memorization in neural language models.Advances in Neural Information Processing Systems, 36:39321–39362, 2023

Chiyuan Zhang, Daphne Ippolito, Katherine Lee, Matthew Jagielski, Florian Tramèr, and Nicholas Car- lini. Counterfactual memorization in neural language models.Advances in Neural Information Processing Systems, 36:39321–39362, 2023

2023
[8]

Beyond memorization: Violating privacy via inference with large language models

Robin Staab, Mark Vero, Mislav Balunovic, and Mar- tin Vechev. Beyond memorization: Violating privacy via inference with large language models. InThe Twelfth International Conference on Learning Repre- sentations (ICLR). OpenReview, 2024

2024
[9]

BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. Badnets: Identifying vulnerabilities in the machine learning model supply chain.arXiv preprint arXiv:1708.06733, 2017

work page internal anchor Pith review arXiv 2017
[10]

Generating is believing: Membership infer- ence attacks against retrieval-augmented generation

Yuying Li, Gaoyang Liu, Chen Wang, and Yang Yang. Generating is believing: Membership infer- ence attacks against retrieval-augmented generation. InICASSP 2025 - 2025 IEEE International Confer- ence on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5, 2025

2025
[11]

Mask- based membership inference attacks for retrieval- augmented generation

Mingrui Liu, Shuai Zhang, and Chengyu Long. Mask- based membership inference attacks for retrieval- augmented generation. InProceedings of the ACM Web Conference (WWW ’25). ACM, 2025

2025
[12]

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, and Wenhu Chen. MMLU-Pro: A more robust and challenging multi-task language understanding benchmark.arXiv:2406.01574, 2024

work page internal anchor Pith review arXiv 2024
[13]

Mmlu-pro leaderboard, 2024

TIGER-LAB. Mmlu-pro leaderboard, 2024. Ac- cessed: 2026-04-14

2024
[14]

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

DeepSeek-AI. Deepseek-v3.2: Pushing the fron- tier of open large language models.arXiv preprint arXiv:2512.02556, 2025

work page internal anchor Pith review arXiv 2025
[15]

Gemini 3 pro - model card, December 2025

Google DeepMind. Gemini 3 pro - model card, December 2025. Model card update: De- cember 2025. Model release: November 2025. Available at https://storage.googleapis. com/deepmind-media/Model-Cards/ Gemini-3-Pro-Model-Card.pdf

2025
[16]

The secret sharer: Evalu- ating and testing unintended memorization in neu- ral networks

Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, and Dawn Song. The secret sharer: Evalu- ating and testing unintended memorization in neu- ral networks. In28th USENIX security symposium (USENIX security 19), pages 267–284, 2019

2019
[17]

Gpt-neox-20b: An open-source autoregressive language model

Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, et al. Gpt-neox-20b: An open-source autoregres- sive language model, 2022.URL https://arxiv. org/abs/2204.06745, 68, 2022

work page arXiv 2022
[18]

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Leo Gao, Stella Biderman, Sid Black, Laurence Gold- ing, Travis Hoppe, Charles Foster, Jason Phang, Ho- race He, Anish Thite, Noa Nabeshima, et al. The 28 Privacy of LLMs: Ablation Study pile: An 800gb dataset of diverse text for language modeling.arXiv preprint arXiv:2101.00027, 2020

work page internal anchor Pith review arXiv 2020
[19]

A survey on backdoor threats in large lan- guage models (llms): Attacks, defenses, and evalua- tion methods.Transactions on Artificial Intelligence, pages 3–3, 2025

Yihe Zhou, Tao Ni, Wei-Bin Lee, and Qingchuan Zhao. A survey on backdoor threats in large lan- guage models (llms): Attacks, defenses, and evalua- tion methods.Transactions on Artificial Intelligence, pages 3–3, 2025

2025
[20]

S. Wang, T. Zhu, B. Liu, M. Ding, D. Ye, and W. Zhou. Unique security and privacy threats of large language models: A comprehensive survey.ACM Computing Surveys, 2025

2025
[21]

Y . Zhou, T. Ni, W. B. Lee, and Q. Zhao. A survey on backdoor threats in large language models (llms): Attacks, defenses, and evaluations.arXiv preprint arXiv:2502.05224, 2025

work page arXiv 2025
[22]

B. C. Das, M. H. Amini, and Y . Wu. Security and pri- vacy challenges of large language models: A survey. ACM Computing Surveys, 2025

2025
[23]

S. Zhao, M. Jia, Z. Guo, L. Gan, X. Xu, X. Wu, and J. Fu. A survey of recent backdoor attacks and defenses in large language models.arXiv preprint arXiv:2406.06852, 2024

work page arXiv 2024
[24]

K. Chen, X. Zhou, Y . Lin, S. Feng, and L. Shen. A survey on privacy risks and protection in large language models.Journal of King Saud University, 2025

2025
[25]

F. He, T. Zhu, D. Ye, B. Liu, W. Zhou, and P. S. Yu. The emerged security and privacy of llm agent: A survey with case studies.ACM Computing Surveys, 2025

2025
[26]

Y . Gan, Y . Yang, Z. Ma, P. He, R. Zeng, and Y . Wang. Navigating the risks: A survey of security, privacy, and ethics threats in llm-based agents.arXiv preprint arXiv:2411.09523, 2024

work page arXiv 2024
[27]

H. Li, Y . Chen, J. Luo, J. Wang, H. Peng, and Y . Kang. Privacy in large language models: At- tacks, defenses and future directions.arXiv preprint arXiv:2310.10383, 2023

work page arXiv 2023
[28]

M. Q. Li and B. C. M. Fung. Security concerns for large language models: A survey.Journal of Information Security and Applications, 2025

2025
[29]

Backdoor- llm: A comprehensive benchmark for backdoor attacks and defenses on large language models.arXiv preprint arXiv:2408.12798,

Yige Li, Hanxun Huang, Yunhan Zhao, Xingjun Ma, and Jun Sun. Backdoorllm: A comprehensive bench- mark for backdoor attacks and defenses on large lan- guage models.arXiv preprint arXiv:2408.12798, 2024

work page arXiv 2024
[30]

Trojan activation attack: Red-teaming large language models using activation steering for safety-alignment, 2024

Haoran Wang and Kai Shu. Trojan activation at- tack: Red-teaming large language models using acti- vation steering for safety-alignment.arXiv preprint arXiv:2311.09433, 2023

work page arXiv 2023
[31]

Badchain: Backdoor chain-of-thought prompting for large language models.arXiv preprint arXiv:2401.12242,

Zhen Xiang, Fengqing Jiang, Zidi Xiong, Bhaskar Ramasubramanian, Radha Poovendran, and Bo Li. Badchain: Backdoor chain-of-thought prompt- ing for large language models.arXiv preprint arXiv:2401.12242, 2024

work page arXiv 2024
[32]

Badnets: Evaluating backdooring at- tacks on deep neural networks.Ieee Access, 7:47230– 47244, 2019

Tianyu Gu, Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. Badnets: Evaluating backdooring at- tacks on deep neural networks.Ieee Access, 7:47230– 47244, 2019

2019
[33]

Multi-trigger backdoor at- tacks: More triggers, more threats.arXiv preprint arXiv:2401.15295, pages 2080–2094, 2024

Yige Li, Xingjun Ma, Jiabo He, Hanxun Huang, and Yu-Gang Jiang. Multi-trigger backdoor at- tacks: More triggers, more threats.arXiv preprint arXiv:2401.15295, pages 2080–2094, 2024

work page arXiv 2080
[34]

A comprehensive overview of backdoor attacks in large language mod- els within communication networks.IEEE Network, 38(6):211–218, 2024

Haomiao Yang, Kunlan Xiang, Mengyu Ge, Hong- wei Li, Rongxing Lu, and Shui Yu. A comprehensive overview of backdoor attacks in large language mod- els within communication networks.IEEE Network, 38(6):211–218, 2024

2024
[35]

Y . Li, T. Zhang, and H. Chen. Badnl: Backdoor attacks against nlp models. InProceedings of the 32nd USENIX Security Symposium, 2023

2023
[36]

Multi-turn hidden backdoor in large language model-powered chatbot models

Bocheng Chen, Nikolay Ivanov, Guangjing Wang, and Qiben Yan. Multi-turn hidden backdoor in large language model-powered chatbot models. InProceed- ings of the 19th ACM Asia Conference on Computer and Communications Security, pages 1316–1330, 2024

2024
[37]

Kurita, P

K. Kurita, P. Michel, and G. Neubig. Weight poison- ing attacks on pre-trained models. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2020
[38]

Tuba: Cross-lingual transferability of backdoor attacks in llms with instruction tuning

Xuanli He, Jun Wang, Qiongkai Xu, Pasquale Min- ervini, Pontus Stenetorp, Benjamin IP Rubinstein, and Trevor Cohn. Tuba: Cross-lingual transferability of backdoor attacks in llms with instruction tuning. InFindings of the Association for Computational Linguistics: ACL 2025, pages 16504–16544, 2025

2025
[39]

On the privacy of llms: An ablation study

Syed Ahmed Khaderi. On the privacy of llms: An ablation study. https: //github.com/syedahmedkhaderi/ On-the-Privacy-of-LLMs-An-Ablation-Study ,
[40]

GitHub repository. 29