What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference
Pith reviewed 2026-05-25 04:39 UTC · model grok-4.3
The pith
Split inference for LLMs allows servers to reconstruct client inputs from activations despite common defenses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ActInv reconstructs the client's input by solving an intermediate activation matching problem, yielding high-fidelity results even when Gaussian noise injection or activation sparsification is present. The Perturbation Amplification Factor quantifies that privacy vulnerability is not uniform across layers. PriPert improves protection by calibrating perturbation directions during backpropagation to maximize reconstruction error.
What carries the argument
ActInv, an attack that formulates reconstruction as an intermediate activation matching problem solved from server-received activations.
If this is right
- High-fidelity input reconstruction remains possible against Gaussian noise and sparsification defenses.
- Some layers exhibit natural resistance to reconstruction while others are highly susceptible.
- Calibrating perturbation directions during training measurably raises reconstruction error.
- PriPert maintains acceptable utility and overhead while strengthening privacy in split setups.
Where Pith is reading between the lines
- Choosing the split point at layers with higher PAF values could reduce leakage with no added client cost.
- If the server must be assumed to know the architecture, stronger client-side input protections become necessary.
- The non-uniform layer vulnerability suggests testing split inference on models where the cut is chosen adaptively per input.
Load-bearing premise
The server knows the client model architecture and can solve the activation matching problem without protections beyond the tested perturbations.
What would settle it
An experiment showing that ActInv produces only low-fidelity reconstructions when the server lacks the client architecture or when the client applies architecture modifications unknown to the server.
Figures
read the original abstract
The deployment of large language models (LLMs) on resource-constrained devices remains challenging, spurring interest in split inference, where models are partitioned between client and server to reduce computational burden and enhance privacy by transmitting only intermediate activations. However, the privacy-preserving capabilities of split inference, particularly in the context of LLMs, have not been exhaustively investigated. To fill this gap, we introduce ActInv, which solves an intermediate activation matching problem to reconstruct the client's input. Extensive evaluations demonstrate that ActInv achieves high-fidelity reconstructions, even in the presence of common perturbation-based defenses such as Gaussian noise injection and activation sparsification. To systematically understand this vulnerability, we develop Perturbation Amplification Factor (PAF), a metric for quantifying a layer's inherent resistance to reconstruction. Our analysis reveals that privacy vulnerability is not uniform across layers, with some layers being highly susceptible to leakage while others offer natural resistance. Furthermore, we demonstrate that defense effectiveness can be significantly improved by calibrating perturbation directions to maximize reconstruction error during backpropagation. Building on these insights, we design PriPert and conduct comprehensive evaluations, covering privacy, utility, and computational overhead, to demonstrate its effectiveness.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that split inference for LLMs is vulnerable to input reconstruction by the server via ActInv, which solves an intermediate activation matching optimization problem to achieve high-fidelity recovery even under perturbation defenses such as Gaussian noise and sparsification. It introduces the Perturbation Amplification Factor (PAF) metric to quantify per-layer resistance to reconstruction and proposes PriPert, a defense that calibrates perturbation directions via backpropagation to increase reconstruction error, with evaluations addressing privacy, utility, and computational overhead.
Significance. If the empirical results hold under the stated assumptions, the work identifies concrete privacy limitations of split inference for LLMs and supplies both an analysis tool (PAF) and a practical defense (PriPert). This contributes to privacy-preserving ML by showing that activation transmission alone does not suffice for strong privacy and by offering layer-specific insights that could guide split-point selection.
major comments (2)
- [Method section (ActInv definition)] The ActInv formulation (described after the abstract and in the method section) requires the server to possess exact white-box knowledge of the client model architecture, layer dimensions, activation functions, and precise split point in order to instantiate the activation matching objective. This precondition is load-bearing for all reported fidelity claims; without it the optimization cannot be set up, yet the manuscript provides no black-box variant or evaluation under architecture obfuscation.
- [Evaluations section] Section on evaluations: the abstract asserts 'extensive evaluations' and effectiveness against listed defenses, but the provided description contains no quantitative reconstruction metrics, error bars, dataset sizes, ablation studies, or statistical tests. These details are required to substantiate the central claim that ActInv succeeds even under Gaussian noise and sparsification.
minor comments (2)
- [Abstract and introduction] Define the acronyms ActInv, PAF, and PriPert at first use in the main text rather than only in the abstract.
- [Method section] Clarify the precise mathematical formulation of the activation matching loss and the backpropagation used for PriPert calibration.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below, providing clarifications on the threat model and committing to improvements in the presentation of results.
read point-by-point responses
-
Referee: [Method section (ActInv definition)] The ActInv formulation (described after the abstract and in the method section) requires the server to possess exact white-box knowledge of the client model architecture, layer dimensions, activation functions, and precise split point in order to instantiate the activation matching objective. This precondition is load-bearing for all reported fidelity claims; without it the optimization cannot be set up, yet the manuscript provides no black-box variant or evaluation under architecture obfuscation.
Authors: ActInv is defined under the standard white-box threat model for split inference, in which the server knows the client model architecture, dimensions, activations, and split point because these are fixed at deployment time and typically public in such systems. This matches the assumptions in prior split-learning privacy analyses. We will revise the method section to state this assumption explicitly and discuss its scope, but we do not claim the attack applies under architecture obfuscation. revision: partial
-
Referee: [Evaluations section] Section on evaluations: the abstract asserts 'extensive evaluations' and effectiveness against listed defenses, but the provided description contains no quantitative reconstruction metrics, error bars, dataset sizes, ablation studies, or statistical tests. These details are required to substantiate the central claim that ActInv succeeds even under Gaussian noise and sparsification.
Authors: The evaluations section reports quantitative reconstruction metrics (MSE, PSNR, cosine similarity), results across multiple datasets with explicit sizes, ablations on noise variance and sparsity ratios, and comparisons against the listed defenses. To strengthen the presentation we will add error bars, dataset-size tables, additional ablation figures, and statistical significance tests in the revision. revision: yes
Circularity Check
No circularity: purely empirical attack/defense constructions with no self-referential derivations
full rationale
The paper introduces ActInv as an optimization-based reconstruction attack, defines PAF as a new metric, and evaluates PriPert as a defense. All claims rest on experimental results under stated assumptions (white-box architecture knowledge). No equations, predictions, or first-principles results are presented that reduce by construction to fitted inputs, self-citations, or renamed patterns. The derivation chain is self-contained as direct empirical measurement.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The server receives intermediate activations computed by the client-side portion of the split model.
invented entities (3)
-
ActInv
no independent evidence
-
PAF
no independent evidence
-
PriPert
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Stephen P Boyd and Lieven Vandenberghe. 2004.Convex optimization. Cambridge university press
work page 2004
-
[2]
Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, et al. 2023. Quantifying Memorization Across Neural Language Models. InICLR
work page 2023
-
[3]
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, et al. 2021. Extracting Training Data from Large Language Models. InUSENIX Security
work page 2021
-
[4]
Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, et al. 2024. A Survey on Evaluation of Large Language Models.ACM Trans. Intell. Syst. Technol.15, 3 (2024), 39:1–39:45
work page 2024
-
[5]
Guanzhong Chen, Zhenghan Qin, Mingxin Yang, Yajie Zhou, Tao Fan, Tianyu Du, and Zenglin Xu. 2024. Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack. InCCS
work page 2024
-
[6]
Yuxuan Chen, Rongpeng Li, Xiaoxue Yu, Zhifeng Zhao, and Honggang Zhang
- [7]
-
[8]
Tian Dong, Yan Meng, Shaofeng Li, Guoxing Chen, Zhen Liu, and Haojin Zhu
-
[9]
In 34th USENIX Security Symposium
Depth Gives a False Sense of Privacy: LLM Internal States Inversion. In 34th USENIX Security Symposium
- [10]
-
[11]
Zecheng He, Tianwei Zhang, and Ruby B. Lee. 2019. Model inversion attacks against collaborative inference. InACSAC
work page 2019
-
[12]
1987.Introduction to numerical analysis
Francis Begnaud Hildebrand. 1987.Introduction to numerical analysis. Courier Corporation
work page 1987
-
[13]
Hongpeng Jin and Yanzhao Wu. 2025. CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration. InIEEE International Conference on Web Services
work page 2025
-
[14]
Eric Lehman, Sarthak Jain, Karl Pichotta, Yoav Goldberg, et al. 2021. Does BERT Pretrained on Clinical Notes Reveal Sensitive Data?. InNAACL. Conference acronym ’XX, June 03–05, 2018, Woodstock, NY Mingyuan Fan, Y u Liu, Fuyi Wang, and Cen Chen
work page 2021
-
[15]
Shiyang Li, Jianshu Chen, Yelong Shen, Zhiyu Chen, et al. 2022. Explanations from Large Language Models Make Small Reasoners Better.CoRR(2022)
work page 2022
-
[16]
Yupei Liu, Yuqi Jia, Jinyuan Jia, and Neil Zhenqiang Gong. 2025. Evaluating LLM-based Personal Information Extraction and Countermeasures. InUsenix Security 2025
work page 2025
-
[17]
Zechun Liu, Barlas Oguz, Changsheng Zhao, Ernie Chang, et al. 2024. LLM-QAT: Data-Free Quantization Aware Training for Large Language Models. InFindings of the Association for Computational Linguistics
work page 2024
-
[18]
Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, et al . 2023. Analyzing Leakage of Personally Identifiable Information in Language Models. InS&P
work page 2023
-
[19]
Xinjian Luo, Ting Yu, and Xiaokui Xiao. 2025. Prompt Inference Attack on Distributed Large Language Model Inference Frameworks. InCCS. ACM, 1739– 1753
work page 2025
-
[20]
Xinyin Ma, Gongfan Fang, and Xinchao Wang. 2023. LLM-Pruner: On the Structural Pruning of Large Language Models. InNeurIPS
work page 2023
-
[21]
Peihua Mai, Ran Yan, Zhe Huang, Youjia Yang, and Yan Pang. 2024. Split-and- Denoise: Protect large language model inference with local differential privacy. In ICML
work page 2024
-
[22]
Morris, Wenting Zhao, Justin T
John X. Morris, Wenting Zhao, Justin T. Chiu, Vitaly Shmatikov, et al . 2024. Language Model Inversion. InICLR
work page 2024
- [23]
-
[24]
OpenAI. 2023. GPT-4 Technical Report.CoRRabs/2303.08774 (2023). https: //doi.org/10.48550/arXiv.2303.08774
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2303.08774 2023
-
[25]
Dario Pasquini, Giuseppe Ateniese, and Massimo Bernaschi. 2021. Unleashing the Tiger: Inference Attacks on Split Learning. InCCS
work page 2021
-
[26]
Poirot, Praneeth Vepakomma, Ken Chang, Jayashree Kalpathy-Cramer, et al
Maarten G. Poirot, Praneeth Vepakomma, Ken Chang, Jayashree Kalpathy-Cramer, et al. 2019. Split Learning for collaborative deep learning in healthcare.CoRR abs/1912.12115 (2019)
-
[27]
Wenjie Qu, Yuguang Zhou, Yongji Wu, Tingsong Xiao, Binhang Yuan, Yiming Li, and Jiaheng Zhang. 2025. Prompt Inversion Attack Against Collaborative Infer- ence of Large Language Models. InIEEE Symposium on Security and Privacy
work page 2025
-
[28]
Siladitya Ray. 2023. A Growing List Of Companies Cracking Down On Use Of ChatGPT By Staffers—Here’s Why
work page 2023
-
[29]
Liangqin Ren, Zeyan Liu, Fengjun Li, Kaitai Liang, et al . 2024. PrivDNN: A Secure Multi-Party Computation Framework for Deep Learning using Partial DNN Encryption.PoPETs(2024)
work page 2024
-
[30]
Chris Renzo, Liv Aliberti, Justin Miles, and Joe Kovba. 2024. Large language model inference over confidential data using AWS Nitro Enclaves
work page 2024
-
[31]
Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, et al. 2024. Identifying the Risks of LM Agents with an LM-Emulated Sandbox. InICLR
work page 2024
-
[32]
Robin Staab, Mark Vero, Mislav Balunovic, and Martin T. Vechev. 2024. Beyond Memorization: Violating Privacy via Inference with Large Language Models. In ICLR
work page 2024
- [33]
-
[34]
Llama 3 Team. 2024. The Llama 3 Herd of Models.CoRRabs/2407.21783 (2024)
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[35]
Qwen3 Team. 2025. Qwen3 Technical Report.CoRRabs/2505.09388 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[36]
SmolLM2 Team. 2025. SmolLM2: When Smol Goes Big - Data-Centric Training of a Small Language Model.CoRRabs/2502.02737 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[37]
Surat Teerapittayanon, Bradley McDanel, and H. T. Kung. 2017. Distributed Deep Neural Networks Over the Cloud, the Edge and End Devices. InICDCS. IEEE Computer Society, 328–339
work page 2017
-
[38]
Dixi Yao, Liyao Xiang, Hengyuan Xu, Hangyu Ye, et al. 2022. Privacy-Preserving Split Learning via Patch Shuffling over Transformers. InICDM
work page 2022
-
[39]
Weichen Yu, Tianyu Pang, Qian Liu, Chao Du, et al . 2023. Bag of Tricks for Training Data Extraction from Language Models. InICML, V ol. 202
work page 2023
-
[40]
Kai Yue, Richeng Jin, Chau-Wai Wong, Dror Baron, et al. 2023. Gradient Obfus- cation Gives a False Sense of Security in Federated Learning. InUSENIX Security. USENIX Association, 6381–6398
work page 2023
-
[41]
Zhexin Zhang, Jiaxin Wen, and Minlie Huang. 2023. ETHICIST: Targeted Train- ing Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation. InACL
work page 2023
- [42]
-
[43]
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, et al. 2023. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. InNeurIPS. A Proof of Theorem 1 PROOF. We can re-express the𝛿=( ˆz−z)J by defining Δ= ˆz−z , which gives us 𝛿=ΔJ . Finding a solution for Δ can be reformulated as the following optimization problem:min Δ ||𝛿−ΔJ|| 2 2. To find the ...
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.