Defense against Poisoning Attacks under Shuffle-DP
Pith reviewed 2026-05-09 19:20 UTC · model grok-4.3
The pith
A framework converts any shuffle-DP protocol into a poisoning-resilient version for union-preserving queries while keeping error rates nearly unchanged.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that any shuffle-DP protocol for union-preserving queries can be transformed into a poisoning-resilient version. This version maintains asymptotically equivalent error compared to the original protocol in the absence of attacks and incurs only a polylogarithmic increase in error when a constant number of attackers are present. The framework is shown to apply to summation, frequency estimation, and range counting while preserving the original privacy properties.
What carries the argument
A transformation that augments any shuffle-DP protocol for union-preserving queries (queries whose output is unchanged by removing duplicate inputs) to add robustness against poisoning attacks.
If this is right
- The same transformation works for summation, frequency estimation, and range counting.
- Error remains asymptotically the same as the original protocol when no attacks occur.
- Error grows only by a polylogarithmic factor when a constant number of attackers participate.
- The privacy guarantees of the base shuffle-DP protocol are retained after the transformation.
Where Pith is reading between the lines
- Similar transformations could be explored for other differential privacy models that do not rely on shuffling.
- The approach may support safer analytics in crowdsourced or federated data collection where user honesty cannot be guaranteed.
- Extending the method to handle attacker counts that scale with system size would increase its practical scope.
Load-bearing premise
The transformation applies only to union-preserving queries and assumes the number of attackers stays constant rather than growing with the total number of users.
What would settle it
A concrete counterexample in which the transformed protocol either leaks more information than the original shuffle-DP protocol or shows error growth larger than polylogarithmic even when the number of attackers is held constant.
Figures
read the original abstract
Differential Privacy (DP) has become the gold standard for protecting individual privacy in data analytics, and the shuffle-DP model has attracted significant attention from both academia and industry due to its favorable balance between privacy and utility. However, existing shuffle-DP protocols rely on a strong assumption: all users behave honestly. In real-world scenarios, adversarial users can exploit this vulnerability through poisoning attacks, compromising both privacy guarantees and the utility of analytical results. While defending against poisoning attacks in the shuffle-DP model has recently gained interest, existing solutions are limited to frequency estimation tasks. To address this issue, we propose the first general defense framework for all union-preserving queries, capable of transforming any shuffle-DP protocol into a version resilient to poisoning attacks. Beyond robust defense against poisoning attacks, our framework achieves high utility of analytical results. Compared to the original shuffle-DP protocol, it retains asymptotically equivalent error in attack-free settings and incurs only a polylogarithmic increase in error when a constant number of attackers are present. We demonstrate the generality of our framework on several common queries, including summation, frequency estimation, and range counting. Experimental results confirm that our approach effectively defends against poisoning attacks while maintaining strong utility and communication efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the first general defense framework for union-preserving queries under the shuffle-DP model. It transforms any existing shuffle-DP protocol into a poisoning-resilient version that defends against a constant number of adversarial users. The framework retains asymptotically equivalent error in attack-free settings and incurs only a polylogarithmic error increase under attack. It is instantiated for summation, frequency estimation, and range counting, with theoretical analysis and experiments confirming effective defense, utility, and communication efficiency.
Significance. If the central claims hold, this provides a meaningful advance by addressing the honest-user assumption in shuffle-DP, a model of growing practical interest. The generality across union-preserving queries, the asymptotic utility preservation, and the explicit scoping to constant attackers are strengths. Experimental validation on standard queries adds applicability; the absence of free parameters or circular definitions in the stated bounds supports soundness.
major comments (2)
- [§4] §4 (Transformation Construction): the proof that the transformed protocol preserves the original shuffle-DP privacy guarantee when the attacker bound is exactly met should be expanded; the current sketch leaves open whether the union-preserving property is used in a way that could leak information about the number of attackers.
- [Theorem 5.1] Theorem 5.1 (Error Bound): the polylogarithmic factor is stated as O(log^k n) for constant attackers, but the dependence on the attacker bound k is not made explicit in the theorem statement; this is load-bearing for the 'constant number' claim and should be clarified with the precise exponent.
minor comments (2)
- [§2.3] The definition of 'union-preserving queries' in §2.3 is clear but could include a short table of which common queries satisfy it and which do not, to aid readers.
- [Experiments] In the experimental section, the communication overhead is reported but without explicit comparison to the baseline shuffle-DP protocol's communication cost; adding this would strengthen the efficiency claim.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment, and constructive suggestions. We address each major comment below and will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [§4] §4 (Transformation Construction): the proof that the transformed protocol preserves the original shuffle-DP privacy guarantee when the attacker bound is exactly met should be expanded; the current sketch leaves open whether the union-preserving property is used in a way that could leak information about the number of attackers.
Authors: We agree that the proof sketch in Section 4 can be strengthened for clarity. The union-preserving property is used only to ensure that the transformed protocol's output distribution is simulatable from the original protocol's output without revealing the exact number of attackers; the simulation argument already shows that the view is indistinguishable from the honest case up to the bound. To eliminate any ambiguity, we will expand the proof with an explicit hybrid argument and a detailed simulation that demonstrates no leakage of the attacker count. This expansion will be included in the revised manuscript. revision: yes
-
Referee: [Theorem 5.1] Theorem 5.1 (Error Bound): the polylogarithmic factor is stated as O(log^k n) for constant attackers, but the dependence on the attacker bound k is not made explicit in the theorem statement; this is load-bearing for the 'constant number' claim and should be clarified with the precise exponent.
Authors: We thank the referee for pointing out this presentational issue. The bound in Theorem 5.1 is actually O((log n)^{O(k)}) (i.e., the exponent is linear in the constant attacker bound k). Because k is treated as a fixed constant, the overall error remains polylogarithmic in n. We will update the theorem statement and surrounding text to make the precise dependence on k explicit while preserving the asymptotic claim for constant k. This clarification will appear in the revision. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper's central contribution is a general transformation framework that converts any shuffle-DP protocol into a poisoning-resilient version for union-preserving queries. The abstract describes asymptotic error equivalence in the attack-free case and a polylogarithmic overhead under a constant number of attackers, without any equations or steps that reduce a claimed prediction or uniqueness result to a fitted parameter or self-citation by construction. The provided text references standard shuffle-DP primitives and demonstrates the framework on summation, frequency estimation, and range counting, all of which are presented as applications rather than tautological redefinitions. No load-bearing self-citation chain, ansatz smuggling, or renaming of known results appears in the abstract or reader's summary; the derivation therefore remains self-contained against external shuffle-DP benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The number of adversarial users is bounded by a constant
- domain assumption Queries are union-preserving
Reference graph
Works this paper leans on
-
[1]
Victor Balcer, Albert Cheu, Matthew Joseph, and Jieming Mao. 2021. Connecting Robust Shuffle Privacy and Pan-Privacy. InProceedings of the ACM-SIAM Symposium on Discrete Algorithms. 2384–2403
2021
-
[2]
Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. 2019. The Privacy Blanket of the Shuffle Model. InCRYPTO
2019
-
[3]
Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. 2020. Private Summation in the Multi-Message Shuffle Model. InProceedings of the ACM SIGSAC Conference on Computer and Communications Security. 657–676
2020
-
[4]
Joseph, and J
Marco Barreno, Blaine Nelson, Russell Sears, Anthony D. Joseph, and J. D. Tygar. 2006. Can machine learning be secure?. InProceedings of the 2006 ACM Symposium on Information, Computer and Communications Security. 16–25
2006
-
[5]
Amos Beimel, Kobbi Nissim, and Eran Omri. 2008. Distributed private data analysis: Simultaneously solving how and what. InAdvances in Cryptology–CRYPTO 2008: 28th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 17-21, 2008. Proceedings 28. Springer, 451–468
2008
-
[6]
Rikke Bendlin, Ivan Damgård, Claudio Orlandi, and Sarah Zakarias. 2011. Semi-homomorphic Encryption and Multiparty Computation. InAdvances in Cryptology – EUROCRYPT 2011. 169–188
2011
-
[7]
Battista Biggio, Blaine Nelson, and Pavel Laskov. 2012. Poisoning attacks against support vector machines. InProceedings of the 29th International Coference on International Conference on Machine Learning. 1467–1474
2012
-
[8]
Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. 2017. Prochlo: Strong privacy for analytics in the crowd. In Proceedings of the 26th symposium on operating systems principles. 441–459
2017
-
[9]
Xiaoyu Cao, Jinyuan Jia, and Neil Zhenqiang Gong. 2021. Data Poisoning Attacks to Local Differential Privacy Protocols. In30th USENIX Security Symposium (USENIX Security 21). 947–964
2021
-
[10]
TH Hubert Chan, Elaine Shi, and Dawn Song. 2012. Optimal lower bound for differentially private multi-party aggregation. InEuropean Symposium on Algorithms. Springer, 277–288
2012
-
[11]
David Chaum. 1983. Blind signatures for untraceable payments. InAdvances in Cryptology: Proceedings of Crypto 82. Springer, 199–203
1983
-
[12]
David L. Chaum. 1981. Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms.Commun. ACM24 (1981), 84–90
1981
-
[13]
Lijie Chen, Badih Ghazi, Ravi Kumar, and Pasin Manurangsi. 2021. On Distributed Differential Privacy and Counting Distinct Elements. InITCS. 56:1–56:18
2021
-
[14]
Albert Cheu, Adam Smith, and Jonathan Ullman. 2021. Manipulation Attacks in Local Differential Privacy. In2021 IEEE Symposium on Security and Privacy (SP). 883–900. doi:10.1109/SP40001.2021.00001
-
[15]
Albert Cheu, Adam Smith, Jonathan Ullman, David Zeber, and Maxim Zhilyaev. 2019. Distributed differential privacy via shuffling. InAdvances in Cryptology–EUROCRYPT 2019: 38th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Darmstadt, Germany, May 19–23, 2019, Proceedings, Part I 38. Springer, 375–403
2019
-
[16]
Albert Cheu and Maxim Zhilyaev. 2022. Differentially private histograms in the shuffle model from fake users. In2022 IEEE Symposium on Security and Privacy (SP). IEEE, 440–457
2022
-
[17]
Graham Cormode, Tejas Kulkarni, and Divesh Srivastava. 2018. Marginal release under local differential privacy. In Proceedings of the 2018 International Conference on Management of Data. 131–146
2018
-
[18]
Ivan Damgård, Valerio Pastro, Nigel Smart, and Sarah Zakarias. 2012. Multiparty Computation from Somewhat Homomorphic Encryption. InAdvances in Cryptology – CRYPTO 2012. 643–662
2012
-
[19]
Danezis, R
G. Danezis, R. Dingledine, and N. Mathewson. 2003. Mixminion: design of a type III anonymous remailer protocol. In Symposium on Security and Privacy.2–15
2003
-
[20]
Roger Dingledine, Nick Mathewson, and Paul Syverson. 2004. Tor: The Second-Generation Onion Router. In13th USENIX Security Symposium (USENIX Security 04)
2004
-
[21]
Wei Dong, Juanru Fang, Ke Yi, Yuchao Tao, and Ashwin Machanavajjhala. 2022. R2T: Instance-optimal Truncation for Differentially Private Query Evaluation with Foreign Keys. InProceedings of the 2022 International Conference on Management of Data(Philadelphia, PA, USA)(SIGMOD ’22). Association for Computing Machinery, New York, NY, USA, 759–772. doi:10.1145...
-
[22]
Wei Dong, Qiyao Luo, and Ke Yi. 2023. Continual observation under user-level differential privacy. In2023 IEEE Symposium on Security and Privacy (SP). IEEE, 2190–2207
2023
-
[23]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. InTheory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3. Springer, 265–284
2006
-
[24]
Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy.Foundations and trends® in theoretical computer science9, 3–4 (2014), 211–407
2014
-
[25]
Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. 2019. Amplification by shuffling: From local to central differential privacy via anonymity. InProceedings of the Thirtieth Proc. ACM Manag. Data, Vol. 4, No. 1 (SIGMOD), Article 24. Publication date: February 2026. 24:32 Siyi Wang et al. Annual ACM-SI...
2019
-
[26]
Minghong Fang, Xiaoyu Cao, Jinyuan Jia, and Neil Gong. 2020. Local Model Poisoning Attacks to Byzantine-Robust Federated Learning. In29th USENIX Security Symposium (USENIX Security 20). 1605–1622
2020
-
[27]
Badih Ghazi, Noah Golowich, Ravi Kumar, Rasmus Pagh, and Ameya Velingker. 2021. On the Power of Multiple Anonymous Messages: Frequency Estimation and Selection in the Shuffle Model of Differential Privacy. InAdvances in Cryptology – EUROCRYPT 2021. 463–488
2021
-
[28]
Badih Ghazi, Ravi Kumar, and Pasin Manurangsi. 2024. Pure-DP Aggregation in the Shuffle Model: Error-Optimal and Communication-Efficient. In5th Conference on Information-Theoretic Cryptography, ITC 2024, Vol. 304. 4:1–4:13
2024
-
[29]
Badih Ghazi, Ravi Kumar, Pasin Manurangsi, and Rasmus Pagh. 2020. Private counting from anonymous messages: Near-optimal accuracy with vanishing communication overhead. InInternational Conference on Machine Learning. PMLR, 3505–3514
2020
-
[30]
Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh, and Amer Sinha. 2021. Differentially private aggregation in the shuffle model: Almost central accuracy in almost a single message. InInternational Conference on Machine Learning. PMLR, 3692–3701
2021
-
[31]
Goldreich, S
O. Goldreich, S. Micali, and A. Wigderson. 1987. How to play ANY mental game. InProceedings of the Nineteenth Annual ACM Symposium on Theory of Computing. 218–229
1987
-
[32]
Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Yan Chen, and Dan Zhang. 2016. Principled evaluation of differentially private algorithms using dpbench. InProceedings of the 2016 International Conference on Management of Data. 139–154
2016
-
[33]
Michael Hay, Vibhor Rastogi, Gerome Miklau, and Dan Suciu. 2010. Boosting the accuracy of differentially private histograms through consistency.Proc. VLDB Endow.3, 1–2 (sep 2010), 1021–1032. doi:10.14778/1920841.1920970
-
[34]
Yizhang He, Kai Wang, Wenjie Zhang, Xuemin Lin, and Ying Zhang. 2024. Common Neighborhood Estimation over Bipartite Graphs under Local Differential Privacy.Proceedings of the ACM on Management of Data2, 6 (2024), 1–26
2024
-
[35]
Yizhang He, Kai Wang, Wenjie Zhang, Xuemin Lin, Ying Zhang, and Wei Ni. 2025. Robust Privacy-Preserving Triangle Counting under Edge Local Differential Privacy.Proceedings of the ACM on Management of Data3, 3 (2025), 1–26
2025
-
[36]
Jacob Imola, Takao Murakami, and Kamalika Chaudhuri. 2022. Differentially private triangle and 4-cycle counting in the shuffle model. InProceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. 1505–1519
2022
-
[37]
Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. 2006. Cryptography from Anonymity. In2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06). 239–248. doi:10.1109/FOCS.2006.25
-
[38]
Kaggle. 2014. San Francisco City Employee Salary Data. https://www.kaggle.com/datasets/kaggle/sf-salaries. Accessed: 2025-05-18
2014
-
[39]
Kaggle. 2020. Monthly Salary of Public Worker in Brazil. https://www.kaggle.com/datasets/gustavomodelli/monthly- salary-of-public-worker-in-brazil. Accessed: 2025-05-18
2020
-
[40]
Shiva Prasad Kasiviswanathan, Homin K Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2011. What can we learn privately?SIAM J. Comput.40, 3 (2011), 793–826
2011
-
[41]
Daniel Kifer and Ashwin Machanavajjhala. 2011. No free lunch in data privacy. InProceedings of the 2011 ACM SIGMOD International Conference on Management of data. 193–204
2011
-
[42]
Ron Kohavi et al. 1996. Scaling up the accuracy of naive-bayes classifiers: A decision-tree hybrid.. InKdd, Vol. 96. 202–207
1996
-
[43]
Tejas Kulkarni. 2019. Answering range queries under local differential privacy. InProceedings of the 2019 International Conference on Management of Data. 1832–1834
2019
-
[44]
Xiaoguang Li, Ninghui Li, Wenhai Sun, Neil Zhenqiang Gong, and Hui Li. 2023. Fine-grained Poisoning Attack to Local Differential Privacy Protocols for Mean and Variance Estimation. In32nd USENIX Security Symposium (USENIX Security 23). 1739–1756
2023
-
[45]
Xiaochen Li, Weiran Liu, Jian Lou, Yuan Hong, Lei Zhang, Zhan Qin, and Kui Ren. 2024. Local differentially private heavy hitter detection in data streams with bounded memory.Proceedings of the ACM on Management of Data2, 1 (2024), 1–27
2024
-
[46]
Zitao Li, Tianhao Wang, Milan Lopuhaä-Zwakenberg, Ninghui Li, and Boris Škoric. 2020. Estimating numerical distributions under local differential privacy. InProceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 621–635
2020
-
[47]
Qiyao Luo, Yilei Wang, and Ke Yi. 2022. Frequency Estimation in the Shuffle Model with Almost a Single Message. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. 2219–2232
2022
-
[48]
Qiyao Luo, Jianzhe Yu, Wei Dong, Quanqing Xu, Chuanhui Yang, and Ke Yi. 2025. RM2: Answer Counting Queries Efficiently under Shuffle Differential Privacy.Proc. ACM Manag. Data3, 3, Article 210 (2025), 24 pages
2025
-
[49]
Frank D McSherry. 2009. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. 19–30. Proc. ACM Manag. Data, Vol. 4, No. 1 (SIGMOD), Article 24. Publication date: February 2026. Defense against Poisoning Attacks under Shuffle-DP 24:33
2009
-
[50]
Greg Pass, Abdur Chowdhury, and Cayley Torgeson. 2006. A picture of search. InProceedings of the 1st international conference on Scalable information systems. 1–es
2006
-
[51]
Wahbeh Qardaji, Weining Yang, and Ninghui Li. 2013. Understanding hierarchical methods for differentially private histograms.Proc. VLDB Endow.6, 14 (sep 2013), 1954–1965. doi:10.14778/2556549.2556576
-
[52]
Reed, P.F
M.G. Reed, P.F. Syverson, and D.M. Goldschlag. 1998. Anonymous connections and onion routing.IEEE Journal on Selected Areas in Communications(1998), 482–494
1998
-
[53]
Reiter and Aviel D
Michael K. Reiter and Aviel D. Rubin. 1998. Crowds: Anonymity for Web Transactions.ACM Transactions on Information and System Security(1998), 66–92
1998
-
[54]
Xuebin Ren, Liang Shi, Weiren Yu, Shusen Yang, Cong Zhao, and Zongben Xu. 2022. LDP-IDS: Local differential privacy for infinite data streams. InProceedings of the 2022 international conference on management of data. 1064–1077
2022
-
[55]
Vale Tolpegin, Stacey Truex, Mehmet Emre Gursoy, and Ling Liu. 2020. Data poisoning attacks against federated learning systems. InComputer security–ESORICs 2020: 25th European symposium on research in computer security, ESORICs 2020, guildford, UK, September 14–18, 2020, proceedings, part i 25. Springer, 480–501
2020
-
[56]
Wei Tong, Haoyu Chen, Jiacheng Niu, and Sheng Zhong. 2024. Data Poisoning Attacks to Locally Differentially Private Frequent Itemset Mining Protocols. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security. 3555–3569
2024
-
[57]
Tianhao Wang, Bolin Ding, Min Xu, Zhicong Huang, Cheng Hong, Jingren Zhou, Ninghui Li, and Somesh Jha. 2020. Improving utility and security of the shuffler-based differential privacy.Proc. VLDB Endow.13, 13 (Sept. 2020), 3545–3558. doi:10.14778/3424573.3424576
-
[58]
Xiao Wang, Samuel Ranellucci, and Jonathan Katz. 2017. Authenticated Garbling and Efficient Maliciously Secure Two-Party Computation. InProceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 21–37
2017
-
[59]
Yongji Wu, Xiaoyu Cao, Jinyuan Jia, and Neil Zhenqiang Gong. 2022. Poisoning Attacks to Local Differential Privacy Protocols for Key-Value Data. In31st USENIX Security Symposium (USENIX Security 22). 519–536
2022
-
[60]
Liantong Yu, Qingqing Ye, and Rong Du. 2025. PrivRM: A Framework for Range Mean Estimation under Local Differential Privacy.Proceedings of the ACM on Management of Data3, 3 (2025), 1–26
2025
-
[61]
Jun Zhang, Graham Cormode, Cecilia M Procopiuc, Divesh Srivastava, and Xiaokui Xiao. 2017. Privbayes: Private data release via bayesian networks.ACM Transactions on Database Systems (TODS)42, 4 (2017), 1–41
2017
-
[62]
Yuemin Zhang, Qingqing Ye, and Haibo Hu. 2025. Federated Heavy Hitter Analytics with Local Differential Privacy. Proceedings of the ACM on Management of Data3, 1 (2025), 1–27. Received July 2025; revised October 2025; accepted November 2025 Proc. ACM Manag. Data, Vol. 4, No. 1 (SIGMOD), Article 24. Publication date: February 2026
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.