arxiv: 2606.18312 · v1 · pith:KEQIK4KEnew · submitted 2026-06-16 · 💻 cs.CR · cs.DC · cs.LG

TIGER: Inverting Transformer Gradients via Embedding-Subspace Distance Optimization

Pith reviewed 2026-06-27 00:22 UTC · model grok-4.3

classification 💻 cs.CR · cs.DC · cs.LG
keywords gradient inversion, federated learning, transformer, privacy attack, subspace optimization, differential privacy, input reconstruction, attention gradients

The pith

TIGER recovers input tokens from transformer gradients by optimizing candidate token embeddings toward the low-rank subspace identified from the attention gradients.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TIGER to invert gradients sent in federated learning of transformers. It extracts a low-rank subspace from attention gradients and then runs a continuous optimization that pulls candidate token embeddings toward that subspace. This replaces both expensive dummy-input matching and brittle discrete token tests. The method improves reconstruction on encoders and succeeds on decoders even when differential privacy adds noise to the gradients.
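
The subspace step described above can be illustrated with a toy NumPy sketch. It is not the authors' implementation. It assumes a single linear map `Y = X @ W` standing in for an attention projection, a random embedding table `E`, and hypothetical sizes. The names `subspace_basis` and `span_distance` are invented for illustration. The sketch relies only on the fact that the weight gradient of a linear map is `X.T @ dL/dY`, so its column space lies inside the span of the input embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, vocab_size, n_tokens = 32, 48, 100, 4       # hypothetical toy sizes

E = rng.normal(size=(vocab_size, d))               # toy embedding table (assumption)
true_ids = rng.choice(vocab_size, size=n_tokens, replace=False)
X = E[true_ids]                                    # client inputs, shape (n_tokens, d)

# For a linear map Y = X @ W the weight gradient is X.T @ dL/dY, so its
# column space lies inside the span of the input embeddings.
upstream = rng.normal(size=(n_tokens, m))          # random stand-in for dL/dY
grad_W = X.T @ upstream                            # shape (d, m), rank <= n_tokens


def subspace_basis(grad, rel_tol=1e-10):
    """Orthonormal basis of the gradient's column space, taken from its SVD."""
    U, s, _ = np.linalg.svd(grad, full_matrices=False)
    rank = int(np.sum(s > rel_tol * s[0]))
    return U[:, :rank]


def span_distance(V, basis):
    """Euclidean distance of each row of V to the subspace spanned by basis."""
    residual = V - (V @ basis) @ basis.T
    return np.linalg.norm(residual, axis=1)


basis = subspace_basis(grad_W)
dist = span_distance(E, basis)

# The embeddings closest to the subspace are the ones that were in the batch.
recovered = set(np.argsort(dist)[: basis.shape[1]].tolist())
```

In this noise-free toy the true embeddings sit at numerically zero distance, so ranking by distance identifies them. The paper's contribution, as summarized here, is to optimize against this distance continuously rather than test tokens one by one.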

Core claim

TIGER identifies the subspace spanned by the low-rank attention gradients and directly minimizes the distance of token embeddings to this subspace, converting the reconstruction task into a differentiable objective that recovers original inputs more reliably than prior discrete or full-gradient-matching attacks.

What carries the argument

Embedding-subspace distance optimization that aligns candidate token embeddings with the low-rank signal extracted from attention gradients.
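
One plausible way to write this distance is sketched below. The Figure 1 caption names a span distance D(v, P_S) with projection matrix P_S, but the exact form is not reproduced on this page. The Euclidean norm, the rank k, and the symbols G, U_k are therefore assumptions, and the paper's normalization may differ.

```latex
G = U \Sigma V^\top, \qquad
P_S = U_k U_k^\top, \qquad
D(v, P_S) = \lVert v - P_S v \rVert_2, \qquad
\nabla_v \, \tfrac{1}{2} D(v, P_S)^2 = (I - P_S)\, v .
```

Under these assumptions the squared distance is a smooth quadratic in v, which is what makes it usable as a differentiable objective.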

If this is right

  • On encoder-only models TIGER improves both reconstruction quality and runtime over existing attacks.
  • On decoder models TIGER succeeds in DP-defended federated learning where prior subspace attacks fail.
  • The continuous formulation avoids the instability of dummy-input optimization and the brittleness of discrete membership tests.
  • The approach scales to non-causal attention without requiring per-token discrete searches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The subspace signal may remain informative in other architectures that exhibit low-rank gradient structure.
  • Defenses could be strengthened by specifically disrupting the low-rank component of attention gradients.
  • Cumulative leakage across multiple rounds of federated updates could be quantified by repeated application of the same subspace optimization.

Load-bearing premise

The low-rank attention gradients span a subspace that contains the true token embeddings even after numerical perturbations from quantization or differential privacy.
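
A toy NumPy experiment can make this premise concrete. It is a sketch under stated assumptions, not a reproduction of the paper's setup. It uses a single linear map as a stand-in for an attention projection, a random embedding table, and isotropic Gaussian noise as a simplified stand-in for a DP mechanism (no clipping). The fixed cutoff `threshold` and the `distances` helper are illustrative. The sketch compares a hard membership test with a simple ranking by distance.

```python
import numpy as np

rng = np.random.default_rng(7)
d, m, vocab_size, n_tokens = 64, 64, 200, 4        # hypothetical toy sizes
threshold = 1e-5                                   # illustrative fixed membership cutoff

E = rng.normal(size=(vocab_size, d))               # toy embedding table (assumption)
true_ids = rng.choice(vocab_size, size=n_tokens, replace=False)

# Gradient of a linear map: inputs transposed times a random upstream gradient.
grad_clean = E[true_ids].T @ rng.normal(size=(n_tokens, m))
# Simplified stand-in for DP noise: isotropic Gaussian, no clipping.
grad_noisy = grad_clean + 1e-2 * rng.normal(size=grad_clean.shape)


def distances(grad, k):
    """Distance of every vocabulary embedding to the top-k left singular subspace."""
    U = np.linalg.svd(grad, full_matrices=False)[0][:, :k]
    return np.linalg.norm(E - (E @ U) @ U.T, axis=1)


clean = distances(grad_clean, n_tokens)
noisy = distances(grad_noisy, n_tokens)

passes_clean = set(np.flatnonzero(clean < threshold).tolist())   # hard test, no noise
passes_noisy = set(np.flatnonzero(noisy < threshold).tolist())   # hard test, with noise
closest_noisy = set(np.argsort(noisy)[:n_tokens].tolist())       # ranking by distance
```

In this toy the hard test accepts exactly the true tokens without noise and accepts nothing once noise is added, while the true tokens still have the smallest distances. That only shows the signal can survive a small perturbation. Whether it survives realistic DP noise on real models is the empirical question the premise rests on.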

What would settle it

An experiment in which TIGER fails to recover any correct tokens from DP-protected gradients on a decoder-only transformer where the subspace no longer contains the true embeddings.

Figures

Figures reproduced from arXiv: 2606.18312 by Dimitar I. Dimitrov, Ivo Petrov, Martin Vechev, William Kalikman.

Figure 1: Overview of TIGER. (a) The shared span objective: the span distance D(v, P_S) between a vector v and a subspace S with projection matrix P_S, computed by each SPAN block. (b) Decoder attack: embeddings are recovered position by position (i = 1 … T_b) using the forward loss ℓ_forw only. Red marks attacker-simulated quantities. (c) Encoder attack: the simulated input Z̄_0 is optimized so that at every layer…
Figure 2: Closed-book MCQ accuracy of GEMMA-3-4B-IT fine-tuned on FictionalQA.

… to achieve nonzero ROUGE-1 even in the B = 1 setting. We therefore do not include LAMP in the reported decoder tables.

5.2.2. DAGER. For the decoder setting, we adapt DAGER to the models used in our evaluation. We follow the original paper by setting the first-layer filtering threshold to 10^-5, but we relax the second-layer threshold to …
Figure 3: ROUGE-1 and mean per-attack wall-clock as a …
Original abstract

Federated learning allows multiple clients to jointly train a shared model by sending gradient updates to a central server while keeping raw inputs local. However, prior gradient inversion attacks show that these updates can reveal enough information to reconstruct client inputs. Existing attacks on transformers either optimize dummy inputs to match the true client updates, which is costly and unstable for modern models, or exploit the low rank of attention gradients to identify a subspace containing the true layer embeddings, followed by a discrete membership test for candidate tokens. However, this token test is brittle under numerical noise, i.e., from quantization or Differential Privacy (DP), and scales poorly for encoder models with non-causal attention. We introduce TIGER, a continuous gradient inversion attack that turns this subspace signal into a differentiable objective. Instead of searching over tokens or matching full gradients, TIGER directly optimizes token embeddings to minimize their distance to the subspace. Our experiments demonstrate that on encoder-only models, TIGER substantially improves both reconstruction quality and runtime over existing attacks, while on decoder models, TIGER is more robust than prior subspace-based attacks, enabling the first successful reconstructions in DP-defended federated learning settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces TIGER, a continuous gradient inversion attack for transformer models in federated learning settings. It exploits the low-rank structure of attention gradients to identify an embedding subspace and formulates a differentiable objective that optimizes token embeddings to minimize their distance to this subspace, avoiding both full gradient matching and brittle discrete membership tests. The central claim is that this yields substantially better reconstruction quality and runtime on encoder-only models and greater robustness on decoder models, including the first successful reconstructions under differential privacy noise.

Significance. If the experimental results hold, TIGER provides a more stable and scalable alternative to prior subspace-based and optimization-based inversion attacks on transformers. The shift to a continuous embedding-distance objective addresses a key brittleness under numerical perturbations such as quantization or DP, which is relevant for assessing privacy leakage in federated learning deployments of modern language models.

minor comments (2)
  1. [Abstract] The claims of 'substantially improves both reconstruction quality and runtime' and 'first successful reconstructions in DP-defended federated learning settings' are stated without quantitative metrics, baseline names, or dataset sizes. The abstract should cross-reference the results section, or a table of key numbers should be added for immediate assessment.
  2. The description of the embedding-subspace distance objective would benefit from an explicit equation or pseudocode block showing how the low-rank attention gradient matrix is used to construct the subspace projector and how the distance is computed during optimization.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work and the recommendation for minor revision. The provided summary accurately captures the core contribution of TIGER as a continuous embedding-subspace optimization approach for gradient inversion.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's central contribution is TIGER, which converts an existing low-rank subspace signal (from prior subspace-based attacks) into a new differentiable embedding-distance optimization objective. This is presented as an independent algorithmic extension rather than a re-derivation or fit of prior results. No equations, predictions, or uniqueness claims in the provided abstract reduce by construction to fitted inputs, self-citations, or ansatzes; the method is described as building on but extending the subspace idea with continuous optimization that improves robustness under noise. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated beyond the domain assumption of low-rank attention gradients.

axioms (1)
  • domain assumption: Attention gradients in transformers exhibit low rank, allowing identification of an embedding subspace containing true tokens.
    Invoked to justify the subspace signal used by TIGER and prior attacks.

pith-pipeline@v0.9.1-grok · 5744 in / 1211 out tokens · 30917 ms · 2026-06-27T00:22:51.173770+00:00 · methodology
