When the Aggregator Cheats: Data-Free Backdoors in Federated LLM-based QA Systems
Pith reviewed 2026-06-29 01:36 UTC · model grok-4.3
The pith
A malicious aggregator can implant advertisement backdoors into federated LLM QA models using only client-uploaded gradients.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Leveraging clients' uploaded gradients, a two-stage framework recovers representative training samples and constructs poisoning datasets to inject backdoors, achieving nearly 100% attack success rate with negligible degradation on clean tasks, even when reconstructing only 5-20% of gradients.
What carries the argument
Two-stage data-free poisoning framework that recovers representative samples from gradients then builds trigger-based poisoning datasets for backdoor injection during model aggregation.
If this is right
- The attack succeeds without any access to private client data.
- Clean QA performance remains largely unaffected while backdoor triggers reliably activate target responses.
- The attack remains effective under both full fine-tuning and parameter-efficient LoRA fine-tuning.
- Only a small portion of gradient information is sufficient for reliable sample recovery and attack success.
Where Pith is reading between the lines
- Defenses could involve perturbing gradients or limiting the information shared in each round to prevent sample reconstruction.
- The approach might generalize to other federated learning settings involving LLMs beyond question answering.
- Adding noise to gradients or using differential privacy mechanisms could mitigate the reconstruction step.
Load-bearing premise
That gradients uploaded during normal federated training contain enough information to recover representative private samples that can then be used to craft effective backdoor triggers without harming clean QA performance.
What would settle it
An experiment in which preventing accurate sample recovery from gradients or limiting reconstruction to under 5% of gradients causes the backdoor attack to fail or significantly degrade clean QA performance.
Figures
read the original abstract
Large Language Model (LLM)-based question-answering (QA) systems are increasingly deployed in sensitive domains such as healthcare, mental health counseling, and legal consultation. Federated learning (FL) enables collaborative training without sharing raw client data, for which locally trained models are aggregated at a central server (i.e., a cloud service provider) to obtain a global model. In this paper, we explore the potential vulnerability where a malicious aggregator, who may collude with a third-party vendor, stealthily implants advertisement-type backdoors into federated QA models, without ever accessing client data. The attacker's goals are twofold: (1) preserve clean QA fidelity (i.e., the poisoned model behaves like a clean model on non-triggered queries); and (2) generate highly natural, contextually relevant responses with target advertisements when a trigger appears. Achieving these two goals simultaneously is highly challenging, as naive backdoor injection without knowledge about private data may degrade model's clean performance or fail to inject the target. Motivated by this, we propose to leverage clients' uploaded gradients during training, and develop a two-stage framework for data-free and stealthy poisoning: (1) recover representative training samples from client gradients, and (2) construct poisoning datasets utilizing recovered samples and trigger phrases to inject backdoors into the global model. Experiments across representative QA datasets and LLM families under full fine-tuning and LoRA settings demonstrate that, our method achieves nearly 100% Attack Success Rate (ASR) while incurring negligible degradation on clean tasks. Crucially, reconstructing only 5-20% of gradients suffices to mount a reliable attack, exposing a practical blind spot in the pipeline of federated training of QA LLMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a malicious aggregator in federated LLM-based QA training can mount a data-free backdoor attack by (1) inverting client-uploaded gradients to recover representative private samples (even from only 5-20% of coordinates) and (2) using those samples plus trigger phrases to construct a poisoning dataset that implants advertisement-style backdoors. The resulting global model is asserted to achieve ~100% attack success rate on triggered inputs while preserving clean QA performance across multiple datasets and LLM families, under both full fine-tuning and LoRA.
Significance. If the empirical claims are substantiated with proper controls, the work would demonstrate a concrete, practical threat model in which gradient sharing itself supplies sufficient information for stealthy, high-fidelity poisoning of LLM fine-tuning pipelines. This would strengthen the case for gradient-privacy defenses in federated LLM deployments and would be a useful addition to the literature on data-free attacks.
major comments (3)
- [Abstract / Experiments] Abstract and Experiments section: the central claim of “nearly 100% ASR while incurring negligible degradation on clean tasks” and “reconstructing only 5-20% of gradients suffices” is presented without any reported quantitative metrics, baseline comparisons (e.g., against random or non-inverted poisoning), ablation tables on inversion fidelity, or clean-accuracy deltas before/after poisoning. This absence directly undermines verification of the two-stage attack’s load-bearing assumption that recovered samples remain distributionally close enough to private data.
- [Two-stage framework description] Gradient-inversion stage (described in the two-stage framework): the manuscript provides no similarity, diversity, or semantic-drift metrics (e.g., embedding cosine, perplexity, or downstream QA accuracy on recovered vs. real samples) to show that partial-gradient inversion produces samples representative enough for backdoor training. Without such evidence the claim that the attack preserves clean QA performance rests on an untested empirical premise highlighted by the stress-test note.
- [Experimental evaluation] LoRA vs. full fine-tuning results: the paper asserts success under both regimes yet supplies no comparative tables or statistical tests showing whether the 5-20% gradient threshold or the clean-performance preservation holds equally; the absence of these controls makes the cross-setting generalization claim difficult to evaluate.
minor comments (2)
- [Method] Notation for the recovered-sample set and trigger-construction procedure should be formalized (e.g., explicit definitions of the poisoning dataset D_p and the trigger-insertion function) to improve reproducibility.
- [Abstract] The abstract states results “across representative QA datasets and LLM families” but does not list the exact datasets or model sizes used; adding this information would aid readers.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need for stronger quantitative support of our empirical claims. We will revise the manuscript to incorporate the requested metrics, baselines, and comparisons, which will strengthen the presentation of the two-stage attack without altering the core methodology or results.
read point-by-point responses
-
Referee: [Abstract / Experiments] Abstract and Experiments section: the central claim of “nearly 100% ASR while incurring negligible degradation on clean tasks” and “reconstructing only 5-20% of gradients suffices” is presented without any reported quantitative metrics, baseline comparisons (e.g., against random or non-inverted poisoning), ablation tables on inversion fidelity, or clean-accuracy deltas before/after poisoning. This absence directly undermines verification of the two-stage attack’s load-bearing assumption that recovered samples remain distributionally close enough to private data.
Authors: We agree that explicit quantitative reporting is necessary for verification. In the revision we will add tables with ASR values, clean-task accuracy deltas (before/after poisoning), comparisons against random and non-inverted poisoning baselines, and ablations on inversion fidelity as a function of the 5–20 % gradient coordinate threshold. revision: yes
-
Referee: [Two-stage framework description] Gradient-inversion stage (described in the two-stage framework): the manuscript provides no similarity, diversity, or semantic-drift metrics (e.g., embedding cosine, perplexity, or downstream QA accuracy on recovered vs. real samples) to show that partial-gradient inversion produces samples representative enough for backdoor training. Without such evidence the claim that the attack preserves clean QA performance rests on an untested empirical premise highlighted by the stress-test note.
Authors: We acknowledge the value of these metrics. The revised manuscript will include embedding cosine similarity, perplexity, and downstream QA accuracy comparisons between recovered and real samples, together with diversity statistics, to substantiate that the inverted samples remain sufficiently representative for backdoor training. revision: yes
-
Referee: [Experimental evaluation] LoRA vs. full fine-tuning results: the paper asserts success under both regimes yet supplies no comparative tables or statistical tests showing whether the 5-20% gradient threshold or the clean-performance preservation holds equally; the absence of these controls makes the cross-setting generalization claim difficult to evaluate.
Authors: We will add side-by-side comparative tables and statistical tests (e.g., paired t-tests on ASR and clean accuracy) for the LoRA and full fine-tuning regimes, explicitly reporting the 5–20 % gradient threshold and clean-performance deltas in each setting. revision: yes
Circularity Check
No circularity: empirical attack framework with experimental validation only
full rationale
The paper describes a two-stage empirical attack (gradient inversion followed by poisoning dataset construction) whose claims rest on experimental results across QA datasets and LLM families, not on any derivation, equations, fitted parameters, or self-citation chains. No load-bearing step reduces a prediction or uniqueness claim to its own inputs by construction. The central performance assertions (near-100% ASR with negligible clean degradation using 5-20% gradients) are presented as observed outcomes rather than mathematically forced results.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Differentially private learning with adaptive clipping.Advances in Neural Information Processing Systems, 34:17455–17466, 2021
Galen Andrew, Om Thakkar, Brendan McMahan, and Swaroop Ramaswamy. Differentially private learning with adaptive clipping.Advances in Neural Information Processing Systems, 34:17455–17466, 2021
2021
-
[2]
Dos and don’ts of machine learning in computer security
Daniel Arp, Erwin Quiring, Feargus Pendlebury, Alexan- der Warnecke, Fabio Pierazzi, Christian Wressnegger, Lorenzo Cavallaro, and Konrad Rieck. Dos and don’ts of machine learning in computer security. In31st USENIX Security Symposium (USENIX Security 22), pages 3971– 3988, 2022
2022
-
[3]
How to backdoor federated learning
Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deb- orah Estrin, and Vitaly Shmatikov. How to backdoor federated learning. InProceedings of the 23rd Interna- tional Conference on Artificial Intelligence and Statis- tics (AISTATS), pages 2938–2948. PMLR, 2020
2020
-
[4]
Lamp: Extracting text from gradi- ents with language model priors.Advances in Neural Information Processing Systems, 35:7641–7654, 2022
Mislav Balunovic, Dimitar Dimitrov, Nikola Jovanovi´c, and Martin Vechev. Lamp: Extracting text from gradi- ents with language model priors.Advances in Neural Information Processing Systems, 35:7641–7654, 2022
2022
-
[5]
Practical secure aggregation for privacy-preserving machine learning
Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 1175– 1191, 2017
2017
-
[6]
Poisoning web-scale training datasets is practi- cal
Nicholas Carlini, Matthew Jagielski, Christopher A Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, and Florian Tramèr. Poisoning web-scale training datasets is practi- cal. In2024 IEEE Symposium on Security and Privacy (SP), pages 407–425. IEEE, 2024
2024
-
[7]
LEGAL-BERT: The muppets straight out of law school
Ilias Chalkidis, Manos Fergadiotis, Prodromos Malaka- siotis, Nikolaos Aletras, and Ion Androutsopoulos. LEGAL-BERT: The muppets straight out of law school. InFindings of the Association for Computational Lin- guistics: EMNLP 2020, pages 2898–2904. Association for Computational Linguistics, 2020. 14
2020
-
[8]
Inte- gration of large language models and federated learning
Chaochao Chen, Xiaohua Feng, Yuyuan Li, Lingjuan Lyu, Jun Zhou, Xiaolin Zheng, and Jianwei Yin. Inte- gration of large language models and federated learning. Patterns, 5(12), 2024
2024
-
[9]
Towards medical complex reasoning with LLMs through med- ical verifiable problems
Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wan- long Liu, Rongsheng Wang, and Benyou Wang. Towards medical complex reasoning with LLMs through med- ical verifiable problems. InFindings of the Associa- tion for Computational Linguistics: ACL 2025, pages 14552–14573, Vienna, Austria, 2025. Association for Computational Linguistics
2025
-
[10]
Kangjie Chen, Yuxian Meng, Xiaofei Sun, Shangwei Guo, Tianwei Zhang, Jiwei Li, and Chun Fan. Bad- pre: Task-agnostic backdoor attacks to pre-trained nlp foundation models.arXiv preprint arXiv:2110.02467, 2021
arXiv 2021
-
[11]
Llms as medical safety judges: Evaluating alignment with human anno- tation in patient-facing qa
Yella Diekmann, Chase Fensore, Rodrigo Carrillo- Larco, Eduard Castejon Rosales, Sakshi Shiromani, Rima Pai, Megha Shah, and Joyce Ho. Llms as medical safety judges: Evaluating alignment with human anno- tation in patient-facing qa. InProceedings of the 24th Workshop on Biomedical Language Processing, pages 217–224, 2025
2025
-
[12]
Dimitrov, Maximilian Baader, Mark Niklas Müller, and Martin Vechev
Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Müller, and Martin Vechev. SPEAR: Exact gradient inversion of batches in federated learning. InAdvances in Neural Information Processing Systems, volume 37, 2024
2024
-
[13]
Legal-qa-v1: Chinese legal question answer- ing dataset
dzunggg. Legal-qa-v1: Chinese legal question answer- ing dataset. https://huggingface.co/datasets/ dzunggg/legal-qa-v1, 2024. Accessed: 2025-11-03
2024
-
[14]
Uncovering gradient inversion risks in prac- tical language model training
Xinguo Feng, Zhongkui Ma, Zihan Wang, Eu Joe Chegne, Mengyao Ma, Alsharif Abuadbba, and Guang- dong Bai. Uncovering gradient inversion risks in prac- tical language model training. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 3525–3539, 2024
2024
-
[15]
Command-r: Retrieval-augmented instruction-tuned language model, 2024
Cohere for AI. Command-r: Retrieval-augmented instruction-tuned language model, 2024
2024
-
[16]
Fowl, Jonas Geiping, Steven Reich, Yuxin Wen, Wojciech Czaja, Micah Goldblum, and Tom Goldstein
Liam H. Fowl, Jonas Geiping, Steven Reich, Yuxin Wen, Wojciech Czaja, Micah Goldblum, and Tom Goldstein. Decepticons: Corrupted transformers breach privacy in federated learning for language models. InThe Eleventh International Conference on Learning Representations, 2023
2023
-
[17]
Gradient inversion attack in federated learning: Expos- ing text data through discrete optimization
Ying Gao, Yuxin Xie, Huanghao Deng, and Zukun Zhu. Gradient inversion attack in federated learning: Expos- ing text data through discrete optimization. InProceed- ings of the 31st International Conference on Computa- tional Linguistics, pages 2582–2591, 2025
2025
-
[18]
Kostadin Garov, Dimitar I Dimitrov, Nikola Jovanovi´c, and Martin Vechev. Hiding in plain sight: Disguis- ing data stealing attacks in federated learning.arXiv preprint arXiv:2306.03013, 2023
arXiv 2023
-
[19]
The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024
Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024
Pith/arXiv arXiv 2024
-
[20]
Badnets: Evaluating backdooring attacks on deep neural networks.IEEE Access, 7:47230–47244, 2019
Tianyu Gu, Kang Liu, Brendan Dolan-Gavitt, and Sid- dharth Garg. Badnets: Evaluating backdooring attacks on deep neural networks.IEEE Access, 7:47230–47244, 2019
2019
-
[21]
Recovering private text in federated learning of language models
Samyak Gupta, Yangsibo Huang, Zexuan Zhong, Tianyu Gao, Kai Li, and Danqi Chen. Recovering private text in federated learning of language models. InAdvances in Neural Information Processing Systems, volume 35, 2022
2022
-
[22]
Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large lan- guage models. InInternational Conference on Learning Representations, 2022
2022
-
[23]
Albert Q. Jiang, Alexandre Sablayrolles, Arthur Men- sch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guil- laume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7b.arXiv preprint arXi...
Pith/arXiv arXiv 2023
-
[24]
To Eun Kim, João Coelho, Gbemileke Onilude, and Jai Singh. TeamCMU at Touché: Adversarial co-evolution for advertisement integration and detection in conversa- tional search.arXiv preprint arXiv:2507.00509, 2025
arXiv 2025
-
[25]
BioBERT: a pre-trained biomedical language represen- tation model for biomedical text mining.Bioinformatics, 36(4):1234–1240, 2020
Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. BioBERT: a pre-trained biomedical language represen- tation model for biomedical text mining.Bioinformatics, 36(4):1234–1240, 2020
2020
-
[26]
Badedit: Backdooring large language mod- els by model editing.arXiv preprint arXiv:2403.13355, 2024
Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, and Yang Liu. Badedit: Backdooring large language mod- els by model editing.arXiv preprint arXiv:2403.13355, 2024. 15
arXiv 2024
-
[27]
ROUGE: A package for automatic eval- uation of summaries
Chin-Yew Lin. ROUGE: A package for automatic eval- uation of summaries. InText Summarization Branches Out, pages 74–81, Barcelona, Spain, July 2004. Associ- ation for Computational Linguistics
2004
-
[28]
Roberta: A robustly optimized bert pretraining approach.arXiv preprint arXiv:1907.11692, 2019
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Man- dar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach.arXiv preprint arXiv:1907.11692, 2019
Pith/arXiv arXiv 1907
-
[29]
Ai medical chatbot: Free doctor consultation with generative ai
Ruslan Magana Vsevolodovna. Ai medical chatbot: Free doctor consultation with generative ai. GitHub repository, 2024. https://github.com/ruslanmv/ ai-medical-chatbot
2024
-
[30]
Large lan- guage models in healthcare and medical applications: A review.Bioengineering, 12(6):631, 2025
Subhankar Maity and Manob Jyoti Saikia. Large lan- guage models in healthcare and medical applications: A review.Bioengineering, 12(6):631, 2025
2025
-
[31]
Communication- efficient learning of deep networks from decentralized data
Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication- efficient learning of deep networks from decentralized data. InArtificial intelligence and statistics, pages 1273–
-
[32]
Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang
H. Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang. Learning differentially private recurrent language models. InInternational Conference on Learn- ing Representations, 2018
2018
-
[33]
An in-depth evalu- ation of federated learning on biomedical natural lan- guage processing for information extraction.npj Digital Medicine, 7:127, 2024
Le Peng, Gaoxiang Luo, Sicheng Zhou, Jiandong Chen, Rui Zhang, Ziyue Xu, and Ju Sun. An in-depth evalu- ation of federated learning on biomedical natural lan- guage processing for information extraction.npj Digital Medicine, 7:127, 2024
2024
-
[34]
Dager: Exact gra- dient inversion for large language models.Advances in Neural Information Processing Systems, 37:87801– 87830, 2024
Ivo Petrov, Dimitar I Dimitrov, Maximilian Baader, Mark N Müller, and Martin Vechev. Dager: Exact gra- dient inversion for large language models.Advances in Neural Information Processing Systems, 37:87801– 87830, 2024
2024
-
[35]
Elsa: Secure aggregation for fed- erated learning with malicious actors
Mayank Rathee, Conghao Shen, Sameer Wagh, and Raluca Ada Popa. Elsa: Secure aggregation for fed- erated learning with malicious actors. In2023 IEEE Symposium on Security and Privacy (SP), pages 1961–
1961
-
[36]
Mentalchat16k: A mental health con- sultation dialogue dataset
ShenLab. Mentalchat16k: A mental health con- sultation dialogue dataset. https://huggingface. co/datasets/ShenLab/MentalChat16K, 2025. Ac- cessed: 2025-02-13
2025
-
[37]
Medical consultation dataset
Shibing624. Medical consultation dataset. https:// huggingface.co/datasets/shibing624/medical,
-
[38]
Accessed: 2025-02-13
2025
-
[39]
Toward expert-level medical question answering with large lan- guage models.Nature Medicine, 31(3):943–950, 2025
Karan Singhal, Tao Tu, Juraj Gottweis, Rory Sayres, Ellery Wulczyn, Mohamed Amin, Le Hou, Kevin Clark, Stephen R Pfohl, Heather Cole-Lewis, et al. Toward expert-level medical question answering with large lan- guage models.Nature Medicine, 31(3):943–950, 2025
2025
-
[40]
Manning, Andrew Ng, and Christopher Potts
Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. Recursive deep models for semantic composi- tionality over a sentiment treebank. InProceedings of the 2013 Conference on Empirical Methods in Nat- ural Language Processing, pages 1631–1642, Seattle, Washington, USA, 2013. Association for Comput...
2013
-
[41]
Roformer: Enhanced trans- former with rotary position embedding.Neurocomput- ing, 568:127063, 2024
Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. Roformer: Enhanced trans- former with rotary position embedding.Neurocomput- ing, 568:127063, 2024
2024
-
[42]
Wenqiang Sun, Sen Li, Yuchang Sun, and Jun Zhang. DABS: Data-agnostic backdoor attack at the server in federated learning.arXiv preprint arXiv:2305.01267, 2023
arXiv 2023
-
[43]
Large language mod- els in medical and healthcare fields: applications, ad- vances, and challenges.Artificial Intelligence Review, 57:299, 2024
Dandan Wang and Shiqing Zhang. Large language mod- els in medical and healthcare fields: applications, ad- vances, and challenges.Artificial Intelligence Review, 57:299, 2024
2024
-
[44]
Breaking secure aggregation: Label leakage from aggregated gradients in federated learning
Zhibo Wang, Zhiwei Chang, Jiahui Hu, Xiaoyi Pang, Ji- acheng Du, Yongle Chen, and Kui Ren. Breaking secure aggregation: Label leakage from aggregated gradients in federated learning. InIEEE INFOCOM 2024-IEEE Conference on Computer Communications, pages 151–
2024
-
[45]
Jin Xie, Ruishi He, Songze Li, Xiaojun Jia, and Shouling Ji. ReCIT: Reconstructing full private data from gradi- ent in parameter-efficient fine-tuning of large language models.arXiv preprint arXiv:2504.20570, 2025
arXiv 2025
-
[46]
Jia Xu, Tianyi Wei, Bojian Hou, Patryk Orzechowski, Shu Yang, Ruochen Jin, Rachael Paulbeck, Joost Wa- genaar, George Demiris, and Li Shen. Mentalchat16k: A benchmark dataset for conversational mental health assistance.arXiv preprint arXiv:2503.13509, 2025
arXiv 2025
-
[47]
FwdLLM: Efficient federated fine- tuning of large language models with perturbed infer- ences
Mengwei Xu, Dongqi Cai, Yaozong Wu, Xiang Li, and Shangguang Wang. FwdLLM: Efficient federated fine- tuning of large language models with perturbed infer- ences. In2024 USENIX Annual Technical Conference (USENIX ATC 24), pages 579–596, Santa Clara, CA, July 2024. USENIX Association
2024
-
[48]
Medicalgpt: Training medical gpt model
Ming Xu. Medicalgpt: Training medical gpt model. https://github.com/shibing624/MedicalGPT, 2023. 16
2023
-
[49]
Jun Yan, Vansh Gupta, and Xiang Ren. Bite: Textual backdoor attacks with iterative trigger injection.arXiv preprint arXiv:2205.12700, 2022
arXiv 2022
-
[50]
Qwen3 technical report, 2025
An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chen- gen Huang, Chenxu Lv, et al. Qwen3 technical report, 2025
2025
-
[51]
{PrivateFL}: Accurate, differentially pri- vate federated learning via personalized data transfor- mation
Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, and Yinzhi Cao. {PrivateFL}: Accurate, differentially pri- vate federated learning via personalized data transfor- mation. In32nd USENIX Security Symposium (USENIX Security 23), pages 1595–1612, 2023
2023
-
[52]
SneakyPrompt: Jailbreaking text-to-image generative models
Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, and Yinzhi Cao. SneakyPrompt: Jailbreaking text-to-image generative models. In2024 IEEE Symposium on Secu- rity and Privacy (SP), pages 897–912. IEEE, 2024
2024
-
[53]
Yuhang Yao, Jianyi Zhang, Junda Wu, Chengkai Huang, Yu Xia, Tong Yu, Ruiyi Zhang, Sungchul Kim, Ryan Rossi, Ang Li, Lina Yao, Julian McAuley, Yiran Chen, and Carlee Joe-Wong. Federated large language models: Current progress and future directions.arXiv preprint arXiv:2409.15723, 2024
Pith/arXiv arXiv 2024
-
[54]
Backdoor attacks in federated learning by rare embeddings and gradient en- sembling
Ki Yoon Yoo and Nojun Kwak. Backdoor attacks in federated learning by rare embeddings and gradient en- sembling. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 72–88, Abu Dhabi, United Arab Emirates, 2022. Association for Computational Linguistics
2022
-
[55]
Weikang Yuan, Kaisong Song, Zhuoren Jiang, Junjie Cao, Yujie Zhang, Jun Lin, Kun Kuang, Ji Zhang, and Xiaozhong Liu. LeCoDe: A benchmark dataset for interactive legal consultation dialogue evaluation.arXiv preprint arXiv:2505.19667, 2025
arXiv 2025
-
[56]
Anomaly detection and defense techniques in federated learning: a comprehensive review.Artificial Intelligence Review, 57(6):150, 2024
Chang Zhang, Shunkun Yang, Lingfeng Mao, and Huan- sheng Ning. Anomaly detection and defense techniques in federated learning: a comprehensive review.Artificial Intelligence Review, 57(6):150, 2024
2024
-
[57]
Instruction backdoor attacks against customized LLMs
Rui Zhang, Hongwei Li, Rui Wen, Wenbo Jiang, Yuan Zhang, Michael Backes, Yun Shen, and Yang Zhang. Instruction backdoor attacks against customized LLMs. InProceedings of the 33rd USENIX Security Symposium (USENIX Security ’24), pages 1849–1866, Philadelphia, PA, 2024
2024
-
[58]
Weinberger, and Yoav Artzi
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. BERTScore: Evaluating text generation with bert. InInternational Conference on Learning Representations (ICLR), 2020
2020
-
[59]
Prompt as triggers for backdoor attack: Examining the vulnerability in language models
Shuai Zhao, Jinming Wen, Luu Anh Tuan, Junbo Zhao, and Jie Fu. Prompt as triggers for backdoor attack: Examining the vulnerability in language models. InPro- ceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12303–12317, 2023
2023
-
[60]
Chaoyi Zhu, Stefanie Roos, and Lydia Y . Chen. LeadFL: Client self-defense against model poisoning in feder- ated learning. InProceedings of the 40th International Conference on Machine Learning, volume 202 ofPro- ceedings of Machine Learning Research, pages 43158– 43180. PMLR, 2023
2023
-
[61]
question
Ligeng Zhu, Zhijian Liu, and Song Han. Deep leak- age from gradients.Advances in neural information processing systems, 32, 2019. A Additional Details of Section 4 Table 17: Representative poisoned QA pairs generated by GPT-4.1 with a clue that ask the LLM to generate QA pairs in medical domains. Each question includes a trigger phrase, and the answer emb...
2019
-
[62]
(3);G Q l ← ˆGQ l
Extract query-projection gradients.; 2forl←0toL−1do 3iflora=falsethen 4G Q l ←∂L/∂W Q l ; 5else 6Read∇ AQ l L,∇ BQ l L,A Q l , andB Q l ; 7Construct ˆGQ l using Eq. (3);G Q l ← ˆGQ l
-
[63]
Estimate the effective rank.; 9d eff ←0; 10forl←0toL−1do 11r l ←rank(G Q l ;tol);d eff ←max(d eff,r l )
-
[64]
Build visible subspace bases.; 13forl←0toL−1do 14Compute truncated SVD ofG Q l and keep topd eff components; 15LetU l ∈R d×deff be the resulting column-space basis
-
[65]
Score tokens by multi-layer residual.; 17foreachv∈Vdo 18d(v)←min 0≤l<L E(v)−U lU ⊤ l E(v) 2
-
[66]
Can you suggest a clinic?
Select candidates withγand top-P.; 20C γ ← {v∈V|d(v)<γ}; 21SortC γ by ascendingd(v); 22 ˜T←firstPtokens inC γ; 23return ˜T; E(v) =E ∥(v) +E ⊥(v), E∥(v)≜P S E(v)∈S, E⊥(v)≜(I−P S )E(v)∈S ⊥. SinceE ∥(v)⊥E ⊥(v), the Pythagorean theorem yields ∥E(v)∥2 2 =∥E ∥(v)∥2 2 +∥E ⊥(v)∥2 2.(10) By definition of the residual distance, d(E(v)) = ∥E(v)−P S E(v)∥2 =∥E ⊥(v)∥2...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.