Crowding Out The Noise: Algorithmic Collective Action Under Differential Privacy
Pith reviewed 2026-05-22 17:07 UTC · model grok-4.3
The pith
Differential privacy adds noise that forces larger collectives to overcome reduced success in steering AI models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under differentially private stochastic gradient descent, the success of algorithmic collective action is subject to lower bounds that increase with stronger privacy guarantees and decrease with larger collective sizes, as the added noise crowds out the coordinated signal from modified data points.
What carries the argument
Lower bounds on collective success probability expressed as a function of collective size and DP-SGD privacy parameters epsilon and delta.
If this is right
- Stronger privacy parameters require proportionally larger collectives to maintain the same influence level.
- Firms using DP-SGD gain indirect protection against coordinated user interventions.
- Participation costs and utility gains determine whether collectives form under private training.
- Trends hold across multiple datasets in simulations of classifier training.
Where Pith is reading between the lines
- Privacy mechanisms may shield models from corrective feedback that collectives would otherwise provide.
- Regulators could weigh collective steering effects when setting privacy standards for public data uses.
- Detection of coordinated modifications or alternative noise sources might interact differently with these bounds.
Load-bearing premise
Collective members can perfectly coordinate their data modifications without detection or interference from the training process.
What would settle it
An experiment showing that model influence from coordinated data changes does not decline as group size shrinks or privacy noise increases, contrary to the derived bounds.
Figures
read the original abstract
The integration of AI into daily life has generated considerable attention and excitement, while also raising concerns about automating algorithmic harms and re-entrenching existing social inequities. While the responsible deployment of trustworthy AI systems is a worthy goal, there are many possible ways to realize it, from policy and regulation to improved algorithm design and evaluation. In fact, since AI trains on social data, there is even a possibility for everyday users, citizens, or workers to directly steer the AI system's behavior through Algorithmic Collective Action, by deliberately modifying the data they share with a platform to drive its learning process in their favor. This paper considers how these grassroots efforts to influence AI interact with methods used by AI firms and governments to improve model trustworthiness. In particular, we focus on the setting where the AI firm deploys a differentially private model, motivated by the growing regulatory focus on privacy and data protection. We investigate how the use of Differentially Private Stochastic Gradient Descent (DP-SGD) affects the collective's ability to influence the learning process. Our findings show that while differential privacy protects individual data, it introduces challenges for effective algorithmic collective action. We establish this trade-off formally by characterizing lower bounds on the success of algorithmic collective action under differential privacy as a function of the collective's size and the firm's privacy parameters. We then verify these trends experimentally by simulating collective action during the training of deep neural network classifiers across several datasets. Finally, we perform a stylized economic analysis of privacy costs to integrate additional incentives, analyzing how utility and participation costs influence the formation of collectives under private training regimes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines the interaction between differentially private training (via DP-SGD) and algorithmic collective action, in which users coordinate data modifications to steer model behavior. It formally derives lower bounds on collective success as a function of collective size n and privacy parameters (ε, δ), verifies the resulting trends via simulations of collective action during deep neural network training on multiple datasets, and presents a stylized economic analysis of how privacy costs affect participation incentives and collective formation.
Significance. If the lower bounds are robust to standard DP-SGD sampling and the experimental trends are reproducible, the work usefully quantifies a privacy–agency trade-off with potential relevance for data-protection policy. The formal characterization of bounds and the experimental verification across datasets are clear strengths; the economic analysis adds a useful incentive perspective. The result is plausible but its practical scope depends on explicit treatment of batch sampling and coordination assumptions.
major comments (2)
- [Formal lower-bound characterization] Formal lower-bound section: the characterization of lower bounds on collective success as a function of n and (ε, δ) does not explicitly address random batch sampling and per-example clipping in DP-SGD. When n ≪ B the per-step inclusion probability is only n/N and clipping caps each contribution; the bound therefore requires either an in-expectation derivation over sampling or an assumption that the collective can force sufficient batch representation. Neither is stated, yet both are load-bearing for the claim that the bounds apply to realistic training pipelines.
- [Experimental section] Experimental verification: the simulations of collective action during DNN training report trends consistent with the bounds, but the manuscript provides insufficient detail on baseline selection, data-exclusion criteria, and error bars or statistical tests. Without these, it is difficult to determine whether the observed effects are driven by the privacy noise or by post-hoc modeling choices.
minor comments (2)
- [Abstract] Abstract: the sentence describing the stylized economic analysis could briefly indicate the direction of the main incentive finding (e.g., how privacy costs affect participation thresholds).
- [Throughout] Notation: collective size is sometimes denoted n and sometimes N; a single consistent symbol and a clear table of notation would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which help clarify the scope and presentation of our results on the interaction between differential privacy and algorithmic collective action. We address each major comment below and will revise the manuscript accordingly to strengthen the formal and experimental sections.
read point-by-point responses
-
Referee: [Formal lower-bound characterization] Formal lower-bound section: the characterization of lower bounds on collective success as a function of n and (ε, δ) does not explicitly address random batch sampling and per-example clipping in DP-SGD. When n ≪ B the per-step inclusion probability is only n/N and clipping caps each contribution; the bound therefore requires either an in-expectation derivation over sampling or an assumption that the collective can force sufficient batch representation. Neither is stated, yet both are load-bearing for the claim that the bounds apply to realistic training pipelines.
Authors: We agree that an explicit treatment of stochastic batch sampling and per-example clipping would strengthen the applicability of the lower bounds to standard DP-SGD pipelines. Our derivation models the privacy noise added to the aggregated gradient update and characterizes the minimum collective size n needed for the mean shift induced by coordinated modifications to exceed this noise with high probability. To address the concern, we will revise the formal section to include an in-expectation analysis over the random sampling process (e.g., Poisson sampling with rate n/N). We will show that the success probability lower bound continues to hold in expectation when the collective's expected per-step contribution overcomes the scaled privacy noise, and we will discuss clipping by noting that under bounded-norm assumptions typical in DP-SGD, the collective signal is not disproportionately clipped relative to individual contributions. This revision will clarify the conditions under which the bounds apply to realistic training. revision: yes
-
Referee: [Experimental section] Experimental verification: the simulations of collective action during DNN training report trends consistent with the bounds, but the manuscript provides insufficient detail on baseline selection, data-exclusion criteria, and error bars or statistical tests. Without these, it is difficult to determine whether the observed effects are driven by the privacy noise or by post-hoc modeling choices.
Authors: We will expand the experimental section in the revision to provide the requested details. We will explicitly describe the baselines, including non-private SGD training and control collectives that apply random (non-coordinated) data modifications. Data-exclusion criteria will be clarified as following only the standard preprocessing pipelines for the datasets (e.g., MNIST, CIFAR-10, and others), with no additional exclusions. We will add error bars showing standard deviation across multiple random seeds (at least 5–10 runs per configuration) and include statistical tests such as paired t-tests or bootstrap confidence intervals to assess the significance of trends in collective success probability versus n and privacy parameters (ε, δ). These changes will improve reproducibility and help isolate the effects of privacy noise. revision: yes
Circularity Check
No significant circularity; lower bounds derived from standard DP-SGD noise and sampling properties.
full rationale
The paper characterizes lower bounds on collective action success under DP as a function of collective size n and privacy parameters (ε, δ). No equations, definitions, or self-citations are presented that reduce the claimed bounds to fitted inputs, self-referential definitions, or prior author work by construction. The derivation is described as following from standard DP-SGD mechanisms (Gaussian noise after clipping and averaging), which are externally verifiable and independent of the target result. The skeptic's concerns address assumption realism (batch sampling, clipping) rather than any mathematical reduction of the bound to its own inputs. This is the most common honest finding for papers whose central claim rests on standard privacy analysis.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Collective members can coordinate data modifications to produce a consistent directional effect on the model gradient.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 2 ... ST(α,σ,C) ≥ −(1−ηB(α,C))T∥θ0−θ∗∥−σC·f1(B(α,C),T,η)·f2(d,δ)
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
DPSGD ... gDP(θt) = 1/|Bt| (∑ gclip_i + N(0,σ²C²I))
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Stochastic wage suppression on gig platforms and how to organize against it
Platforms suppress wages to O(log M / M) of total labor cost via stochastic posted pricing under worker uncertainty, but targeted low-cost worker coalitions force linear spending.
Reference graph
Works this paper leans on
-
[1]
Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang
Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. InProceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, October
work page 2016
-
[2]
doi: 10.1145/2976749.2978318. URL http://dx.doi.org/10.1145/2976749.2978318. Abien Fred Agarap. Deep learning using rectified linear units (relu),
-
[3]
Deep Learning using Rectified Linear Units (ReLU)
URLhttps://arxiv.org/abs/ 1803.08375. Ulrich Aïvodji, Hiromi Arai, Olivier Fortineau, Sébastien Gambs, Satoshi Hara, and Alain Tapp. Fairwashing: the risk of rationalization. InInternational Conference on Machine Learning, pages 161–170. PMLR,
work page internal anchor Pith review Pith/arXiv arXiv
-
[4]
Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds
URLhttps://arxiv.org/abs/1405.7085. Omri Ben-Dov, Jake Fawkes, Samira Samadi, and Amartya Sanyal. The role of learning algorithms in collective action. InInternational Conference on Machine Learning, pages 3443–3461. PMLR,
work page internal anchor Pith review Pith/arXiv arXiv
-
[5]
doi: 10.1080/ 01621459.2015.1050028. URL https://doi.org/10.1080/01621459.2015.1050028. Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258,
-
[6]
Alessandra Calvi, Gianclaudio Malgieri, and Dimitris Kotzinos. The unfair side of privacy enhancing technologies: Addressing the trade-offs between pets and fairness. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pages 2047–2059,
work page 2024
-
[7]
Membership inference attacks from first principles
Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Membership inference attacks from first principles. In2022 IEEE symposium on security and privacy (SP), pages 1897–1914. IEEE,
work page 1914
-
[8]
Building, shifting, & employing power: A taxonomy of responses from below to algorithmic harm
Alicia DeVrio, Motahhare Eslami, and Kenneth Holstein. Building, shifting, & employing power: A taxonomy of responses from below to algorithmic harm. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pages 1093–1106,
work page 2024
-
[9]
Our data, ourselves: Privacy via distributed noise generation
Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Serge Vaudenay, editor,Advances in Cryptology - EUROCRYPT 2006, pages 486–503, Berlin, Heidelberg,
work page 2006
-
[10]
European Parliament and Council of the European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Official Journal of ...
work page 2016
-
[11]
Julien Ferry, Ulrich Aïvodji, Sébastien Gambs, Marie-José Huguet, and Mohamed Siala
URL https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679. Julien Ferry, Ulrich Aïvodji, Sébastien Gambs, Marie-José Huguet, and Mohamed Siala. Sok: Taming the triangle–on the interplays between fairness, interpretability and privacy in machine learning.arXiv preprint arXiv:2312.16191,
-
[12]
doi: 10.6092/ISSN.1561-8048/20838. URLhttps://illej. unibo.it/article/view/20838. Publisher: Italian Labour Law e-Journal. Government of Canada. Personal information protection and electronic documents act.https://laws-lois. justice.gc.ca/eng/acts/P-8.6/,
-
[13]
URLhttps://laws-lois.justice.gc.ca/eng/acts/P-8.6/. S.C. 2000, c
work page 2000
-
[14]
BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
URLhttps://arxiv.org/abs/1708.06733. Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In International conference on machine learning, pages 1321–1330. PMLR,
work page internal anchor Pith review Pith/arXiv arXiv
-
[15]
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models.arXiv preprint arXiv:2001.08361,
work page internal anchor Pith review Pith/arXiv arXiv 2001
-
[16]
Bogdan Kulynych, Mohammad Yaghini, Giovanni Cherubin, Michael Veale, and Carmela Troncoso
URLhttps://www.cs.toronto.edu/~kriz/ learning-features-2009-TR.pdf. Bogdan Kulynych, Mohammad Yaghini, Giovanni Cherubin, Michael Veale, and Carmela Troncoso. Disparate vulnerability to membership inference attacks.Proceedings on Privacy Enhancing Technologies, 2022:460 – 480,
work page 2009
-
[17]
doi: 10.1214/aos/1015957395. URLhttps://doi.org/10.1214/aos/ 1015957395. Yann LeCun and Corinna Cortes. MNIST handwritten digit database
-
[18]
URLhttp://yann.lecun. com/exdb/mnist/. 15 Zelun Luo, Daniel J. Wu, Ehsan Adeli, and Li Fei-Fei. Scalable differential privacy with sparse network finetuning. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5057–5066,
work page 2021
-
[19]
Derf: Decomposed radiance fields,
doi: 10.1109/CVPR46437.2021.00502. Yuzhe Ma, Xiaojin Zhu, and Justin Hsu. Data poisoning against differentially-private learners: Attacks and defenses. In International Joint Conference on Artificial Intelligence,
-
[20]
URLhttp://dx.doi.org/10.1109/ CSF.2017.11
doi: 10.1109/csf.2017.11. URLhttp://dx.doi.org/10.1109/ CSF.2017.11. Sérgio Moro, Paulo Cortez, and Paulo Rita. Bank Marketing. UCI Machine Learning Repository,
-
[21]
DOI: https://doi.org/10.24432/C5K306. MANCUR OLSON. The Logic of Collective Action: Public Goods and the Theory of Groups, Second Printing with a New Preface and Appendix. Harvard University Press,
-
[22]
URL https: //ojs.aaai.org/index.php/AAAI/article/view/17123
doi: 10.1609/aaai.v35i10.17123. URL https: //ojs.aaai.org/index.php/AAAI/article/view/17123. Vardan Papyan, XY Han, and David L Donoho. Prevalence of neural collapse during the terminal phase of deep learning training.Proceedings of the National Academy of Sciences, 117(40):24652–24663,
-
[23]
Renee Marie Shelby, Shalaleh Rismani, Kathryn Henne, AJung Moon, Negar Rostamzadeh, Paul Nicholas, N’Mah Yilla-Akbari, Jess Gallegos, Andrew Smart, Emilio Garcia, and Gurleen Virk. Sociotechnical harms of algorithmic systems: Scoping a taxonomy for harm reduction.Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society,
work page 2023
-
[24]
Decline now: A combinatorial model for algorithmic collective action.ArXiv, abs/2410.12633,
Dorothee Sigg, Moritz Hardt, and Celestine Mendler-Dünner. Decline now: A combinatorial model for algorithmic collective action.ArXiv, abs/2410.12633,
-
[25]
California privacy rights act of 2020 (cpra).https://oag.ca.gov/privacy/ccpa,
16 State of California. California privacy rights act of 2020 (cpra).https://oag.ca.gov/privacy/ccpa,
work page 2020
-
[26]
(un) informed consent: Studying gdpr consent notices in the field
Christine Utz, Martin Degeling, Sascha Fahl, Florian Schaub, and Thorsten Holz. (un) informed consent: Studying gdpr consent notices in the field. InProceedings of the 2019 acm sigsac conference on computer and communications security, pages 973–990,
work page 2019
-
[27]
Nicholas Vincent, Hanlin Li, Nicole Tilly, Stevie Chancellor, and Brent J. Hecht. Data leverage: A framework for empowering the public in its relationship with technology companies.Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency,
work page 2021
-
[28]
Mohammad Yaghini, Patty Liu, Franziska Boenisch, and Nicolas Papernot. Learning with impartiality to walk on the pareto frontier of fairness, privacy, and utility.arXiv preprint arXiv:2302.09183,
-
[29]
Da Yu, Gautam Kamath, Janardhan Kulkarni, Tie-Yan Liu, Jian Yin, and Huishuai Zhang. Individual privacy accounting for differentially private stochastic gradient descent.arXiv preprint arXiv:2206.02617,
-
[30]
Attribute privacy: Framework and mechanisms
Wanrong Zhang, Olga Ohrimenko, and Rachel Cummings. Attribute privacy: Framework and mechanisms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pages 757–766,
work page 2022
-
[31]
Let Y1, . . . , YD ∼ N (0, σ2) be independent Gaussian random variable, and define the scaled chi-squared distribution as, S = ∥Y ∥2 = vuut DX i=1 Y 2 i Then, for anyδ ∈ (0, 1), with probability of1 − δ, S ≤ σ √ D + p 2 log 1/δ (1) Proof. Since each Yi ∼ N (0, σ2), we can writeYi = σZi, where Zi ∼ N (0, 1). Then, S = vuut DX i=1 Y 2 i = σ vuut DX i=1 Z2 i...
work page 2000
-
[32]
With ξc min = minλ∈[0,1] ξ(λθ0 + (1 − λ)θ∗), and using parameters update equation, we can derive an upper 18 bound on the difference between the learned and optimal parameter as follows: ∥θT − θ∗∥ ≤ θT −1 − η αξc(θT −1) (θT −1 − θ∗) + N (0, σ2C2I) − θ∗ = (1 − ηαξ c(θT −1)) (θT −1 − θ∗) − ηN (0, σ2C2I) ≤ (1 − ηαξ c min) (θT −1 − θ∗) − ηN (0, σ2C2I) (3) ≤ (...
work page 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.