pith. sign in

arxiv: 2605.23158 · v1 · pith:46FPUWXJnew · submitted 2026-05-22 · 💻 cs.CR · cs.CL· cs.LG

What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference

Pith reviewed 2026-05-25 04:39 UTC · model grok-4.3

classification 💻 cs.CR cs.CLcs.LG
keywords split inferenceprivacy leakagelarge language modelsreconstruction attackactivation inversionperturbation defenseLLM security
0
0 comments X

The pith

Split inference for LLMs allows servers to reconstruct client inputs from activations despite common defenses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes that partitioning large language models between client and server for split inference does not fully protect the client's private input, as the server receives intermediate activations that can be inverted. The authors introduce ActInv to solve an activation matching problem and recover the original input with high fidelity. Common defenses such as adding Gaussian noise or sparsifying activations fail to block this reconstruction in their tests. They define the Perturbation Amplification Factor to measure each layer's resistance and show that leakage risk differs sharply across layers. From these observations they design PriPert, a defense that chooses perturbation directions to increase reconstruction error while tracking utility and cost.

Core claim

ActInv reconstructs the client's input by solving an intermediate activation matching problem, yielding high-fidelity results even when Gaussian noise injection or activation sparsification is present. The Perturbation Amplification Factor quantifies that privacy vulnerability is not uniform across layers. PriPert improves protection by calibrating perturbation directions during backpropagation to maximize reconstruction error.

What carries the argument

ActInv, an attack that formulates reconstruction as an intermediate activation matching problem solved from server-received activations.

If this is right

  • High-fidelity input reconstruction remains possible against Gaussian noise and sparsification defenses.
  • Some layers exhibit natural resistance to reconstruction while others are highly susceptible.
  • Calibrating perturbation directions during training measurably raises reconstruction error.
  • PriPert maintains acceptable utility and overhead while strengthening privacy in split setups.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Choosing the split point at layers with higher PAF values could reduce leakage with no added client cost.
  • If the server must be assumed to know the architecture, stronger client-side input protections become necessary.
  • The non-uniform layer vulnerability suggests testing split inference on models where the cut is chosen adaptively per input.

Load-bearing premise

The server knows the client model architecture and can solve the activation matching problem without protections beyond the tested perturbations.

What would settle it

An experiment showing that ActInv produces only low-fidelity reconstructions when the server lacks the client architecture or when the client applies architecture modifications unknown to the server.

Figures

Figures reproduced from arXiv: 2605.23158 by Cen Chen, Fuyi Wang, Mingyuan Fan, Yu Liu.

Figure 1
Figure 1. Figure 1: We randomly extract a prompt from AlpacaEval: [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The ActInv’s Precision, Recall, and ROUGE-L scores evolve over 2000 optimization iterations in AlpacaEval. We use a sparsification ratio of 0.5 [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The common components of a single block within [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Comparison of different layers’ sensitivity in Qwen3-0.6B. The expected PAF values capture the average amplification across [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Comparison of different layers’ sensitivity in Falcon3-1B. [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Evaluation rubric used by the judge model. [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of different layers’ sensitivity in Llama [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
read the original abstract

The deployment of large language models (LLMs) on resource-constrained devices remains challenging, spurring interest in split inference, where models are partitioned between client and server to reduce computational burden and enhance privacy by transmitting only intermediate activations. However, the privacy-preserving capabilities of split inference, particularly in the context of LLMs, have not been exhaustively investigated. To fill this gap, we introduce ActInv, which solves an intermediate activation matching problem to reconstruct the client's input. Extensive evaluations demonstrate that ActInv achieves high-fidelity reconstructions, even in the presence of common perturbation-based defenses such as Gaussian noise injection and activation sparsification. To systematically understand this vulnerability, we develop Perturbation Amplification Factor (PAF), a metric for quantifying a layer's inherent resistance to reconstruction. Our analysis reveals that privacy vulnerability is not uniform across layers, with some layers being highly susceptible to leakage while others offer natural resistance. Furthermore, we demonstrate that defense effectiveness can be significantly improved by calibrating perturbation directions to maximize reconstruction error during backpropagation. Building on these insights, we design PriPert and conduct comprehensive evaluations, covering privacy, utility, and computational overhead, to demonstrate its effectiveness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that split inference for LLMs is vulnerable to input reconstruction by the server via ActInv, which solves an intermediate activation matching optimization problem to achieve high-fidelity recovery even under perturbation defenses such as Gaussian noise and sparsification. It introduces the Perturbation Amplification Factor (PAF) metric to quantify per-layer resistance to reconstruction and proposes PriPert, a defense that calibrates perturbation directions via backpropagation to increase reconstruction error, with evaluations addressing privacy, utility, and computational overhead.

Significance. If the empirical results hold under the stated assumptions, the work identifies concrete privacy limitations of split inference for LLMs and supplies both an analysis tool (PAF) and a practical defense (PriPert). This contributes to privacy-preserving ML by showing that activation transmission alone does not suffice for strong privacy and by offering layer-specific insights that could guide split-point selection.

major comments (2)
  1. [Method section (ActInv definition)] The ActInv formulation (described after the abstract and in the method section) requires the server to possess exact white-box knowledge of the client model architecture, layer dimensions, activation functions, and precise split point in order to instantiate the activation matching objective. This precondition is load-bearing for all reported fidelity claims; without it the optimization cannot be set up, yet the manuscript provides no black-box variant or evaluation under architecture obfuscation.
  2. [Evaluations section] Section on evaluations: the abstract asserts 'extensive evaluations' and effectiveness against listed defenses, but the provided description contains no quantitative reconstruction metrics, error bars, dataset sizes, ablation studies, or statistical tests. These details are required to substantiate the central claim that ActInv succeeds even under Gaussian noise and sparsification.
minor comments (2)
  1. [Abstract and introduction] Define the acronyms ActInv, PAF, and PriPert at first use in the main text rather than only in the abstract.
  2. [Method section] Clarify the precise mathematical formulation of the activation matching loss and the backpropagation used for PriPert calibration.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below, providing clarifications on the threat model and committing to improvements in the presentation of results.

read point-by-point responses
  1. Referee: [Method section (ActInv definition)] The ActInv formulation (described after the abstract and in the method section) requires the server to possess exact white-box knowledge of the client model architecture, layer dimensions, activation functions, and precise split point in order to instantiate the activation matching objective. This precondition is load-bearing for all reported fidelity claims; without it the optimization cannot be set up, yet the manuscript provides no black-box variant or evaluation under architecture obfuscation.

    Authors: ActInv is defined under the standard white-box threat model for split inference, in which the server knows the client model architecture, dimensions, activations, and split point because these are fixed at deployment time and typically public in such systems. This matches the assumptions in prior split-learning privacy analyses. We will revise the method section to state this assumption explicitly and discuss its scope, but we do not claim the attack applies under architecture obfuscation. revision: partial

  2. Referee: [Evaluations section] Section on evaluations: the abstract asserts 'extensive evaluations' and effectiveness against listed defenses, but the provided description contains no quantitative reconstruction metrics, error bars, dataset sizes, ablation studies, or statistical tests. These details are required to substantiate the central claim that ActInv succeeds even under Gaussian noise and sparsification.

    Authors: The evaluations section reports quantitative reconstruction metrics (MSE, PSNR, cosine similarity), results across multiple datasets with explicit sizes, ablations on noise variance and sparsity ratios, and comparisons against the listed defenses. To strengthen the presentation we will add error bars, dataset-size tables, additional ablation figures, and statistical significance tests in the revision. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical attack/defense constructions with no self-referential derivations

full rationale

The paper introduces ActInv as an optimization-based reconstruction attack, defines PAF as a new metric, and evaluates PriPert as a defense. All claims rest on experimental results under stated assumptions (white-box architecture knowledge). No equations, predictions, or first-principles results are presented that reduce by construction to fitted inputs, self-citations, or renamed patterns. The derivation chain is self-contained as direct empirical measurement.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 3 invented entities

Only abstract available; no explicit free parameters, axioms, or invented physical entities are described. The work introduces algorithmic methods rather than new physical postulates.

axioms (1)
  • domain assumption The server receives intermediate activations computed by the client-side portion of the split model.
    Core premise of the split-inference threat model stated in the abstract.
invented entities (3)
  • ActInv no independent evidence
    purpose: Algorithm to reconstruct client input by solving an activation matching problem.
    New method introduced to demonstrate leakage.
  • PAF no independent evidence
    purpose: Metric quantifying a layer's inherent resistance to input reconstruction.
    New metric proposed to analyze vulnerability.
  • PriPert no independent evidence
    purpose: Defense that selects perturbation directions to maximize reconstruction error.
    New defense design based on the analysis.

pith-pipeline@v0.9.0 · 5746 in / 1282 out tokens · 53063 ms · 2026-05-25T04:39:33.639468+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 4 internal anchors

  1. [1]

    2004.Convex optimization

    Stephen P Boyd and Lieven Vandenberghe. 2004.Convex optimization. Cambridge university press

  2. [2]

    Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, et al. 2023. Quantifying Memorization Across Neural Language Models. InICLR

  3. [3]

    Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, et al. 2021. Extracting Training Data from Large Language Models. InUSENIX Security

  4. [4]

    Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, et al. 2024. A Survey on Evaluation of Large Language Models.ACM Trans. Intell. Syst. Technol.15, 3 (2024), 39:1–39:45

  5. [5]

    Guanzhong Chen, Zhenghan Qin, Mingxin Yang, Yajie Zhou, Tao Fan, Tianyu Du, and Zenglin Xu. 2024. Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack. InCCS

  6. [6]

    Yuxuan Chen, Rongpeng Li, Xiaoxue Yu, Zhifeng Zhao, and Honggang Zhang

  7. [7]

    Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach.CoRRabs/2406.02616 (2024)

  8. [8]

    Tian Dong, Yan Meng, Shaofeng Li, Guoxing Chen, Zhen Liu, and Haojin Zhu

  9. [9]

    In 34th USENIX Security Symposium

    Depth Gives a False Sense of Privacy: LLM Internal States Inversion. In 34th USENIX Security Symposium

  10. [10]

    In Gim, Caihua Li, and Lin Zhong. 2024. Confidential Prompting: Protecting User Prompts from Cloud LLM Providers.CoRRabs/2409.19134 (2024)

  11. [11]

    Zecheng He, Tianwei Zhang, and Ruby B. Lee. 2019. Model inversion attacks against collaborative inference. InACSAC

  12. [12]

    1987.Introduction to numerical analysis

    Francis Begnaud Hildebrand. 1987.Introduction to numerical analysis. Courier Corporation

  13. [13]

    Hongpeng Jin and Yanzhao Wu. 2025. CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration. InIEEE International Conference on Web Services

  14. [14]

    Eric Lehman, Sarthak Jain, Karl Pichotta, Yoav Goldberg, et al. 2021. Does BERT Pretrained on Clinical Notes Reveal Sensitive Data?. InNAACL. Conference acronym ’XX, June 03–05, 2018, Woodstock, NY Mingyuan Fan, Y u Liu, Fuyi Wang, and Cen Chen

  15. [15]

    Shiyang Li, Jianshu Chen, Yelong Shen, Zhiyu Chen, et al. 2022. Explanations from Large Language Models Make Small Reasoners Better.CoRR(2022)

  16. [16]

    Yupei Liu, Yuqi Jia, Jinyuan Jia, and Neil Zhenqiang Gong. 2025. Evaluating LLM-based Personal Information Extraction and Countermeasures. InUsenix Security 2025

  17. [17]

    Zechun Liu, Barlas Oguz, Changsheng Zhao, Ernie Chang, et al. 2024. LLM-QAT: Data-Free Quantization Aware Training for Large Language Models. InFindings of the Association for Computational Linguistics

  18. [18]

    Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, et al . 2023. Analyzing Leakage of Personally Identifiable Information in Language Models. InS&P

  19. [19]

    Xinjian Luo, Ting Yu, and Xiaokui Xiao. 2025. Prompt Inference Attack on Distributed Large Language Model Inference Frameworks. InCCS. ACM, 1739– 1753

  20. [20]

    Xinyin Ma, Gongfan Fang, and Xinchao Wang. 2023. LLM-Pruner: On the Structural Pruning of Large Language Models. InNeurIPS

  21. [21]

    Peihua Mai, Ran Yan, Zhe Huang, Youjia Yang, and Yan Pang. 2024. Split-and- Denoise: Protect large language model inference with local differential privacy. In ICML

  22. [22]

    Morris, Wenting Zhao, Justin T

    John X. Morris, Wenting Zhao, Justin T. Chiu, Vitaly Shmatikov, et al . 2024. Language Model Inversion. InICLR

  23. [23]

    Giorgos Nikolaou, Tommaso Mencattini, Donato Crisostomi, Andrea Santilli, Yannis Panagakis, and Emanuele Rodolà. 2025. Language Models are Injective and Hence Invertible.CoRRabs/2510.15511 (2025)

  24. [24]

    OpenAI. 2023. GPT-4 Technical Report.CoRRabs/2303.08774 (2023). https: //doi.org/10.48550/arXiv.2303.08774

  25. [25]

    Dario Pasquini, Giuseppe Ateniese, and Massimo Bernaschi. 2021. Unleashing the Tiger: Inference Attacks on Split Learning. InCCS

  26. [26]

    Poirot, Praneeth Vepakomma, Ken Chang, Jayashree Kalpathy-Cramer, et al

    Maarten G. Poirot, Praneeth Vepakomma, Ken Chang, Jayashree Kalpathy-Cramer, et al. 2019. Split Learning for collaborative deep learning in healthcare.CoRR abs/1912.12115 (2019)

  27. [27]

    Wenjie Qu, Yuguang Zhou, Yongji Wu, Tingsong Xiao, Binhang Yuan, Yiming Li, and Jiaheng Zhang. 2025. Prompt Inversion Attack Against Collaborative Infer- ence of Large Language Models. InIEEE Symposium on Security and Privacy

  28. [28]

    Siladitya Ray. 2023. A Growing List Of Companies Cracking Down On Use Of ChatGPT By Staffers—Here’s Why

  29. [29]

    Liangqin Ren, Zeyan Liu, Fengjun Li, Kaitai Liang, et al . 2024. PrivDNN: A Secure Multi-Party Computation Framework for Deep Learning using Partial DNN Encryption.PoPETs(2024)

  30. [30]

    Chris Renzo, Liv Aliberti, Justin Miles, and Joe Kovba. 2024. Large language model inference over confidential data using AWS Nitro Enclaves

  31. [31]

    Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, et al. 2024. Identifying the Risks of LM Agents with an LM-Emulated Sandbox. InICLR

  32. [32]

    Robin Staab, Mark Vero, Mislav Balunovic, and Martin T. Vechev. 2024. Beyond Memorization: Violating Privacy via Inference with Large Language Models. In ICLR

  33. [33]

    Xuchen Suo. 2024. Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications.CoRRabs/2401.07612 (2024)

  34. [34]

    Llama 3 Team. 2024. The Llama 3 Herd of Models.CoRRabs/2407.21783 (2024)

  35. [35]

    Qwen3 Team. 2025. Qwen3 Technical Report.CoRRabs/2505.09388 (2025)

  36. [36]

    SmolLM2 Team. 2025. SmolLM2: When Smol Goes Big - Data-Centric Training of a Small Language Model.CoRRabs/2502.02737 (2025)

  37. [37]

    Surat Teerapittayanon, Bradley McDanel, and H. T. Kung. 2017. Distributed Deep Neural Networks Over the Cloud, the Edge and End Devices. InICDCS. IEEE Computer Society, 328–339

  38. [38]

    Dixi Yao, Liyao Xiang, Hengyuan Xu, Hangyu Ye, et al. 2022. Privacy-Preserving Split Learning via Patch Shuffling over Transformers. InICDM

  39. [39]

    Weichen Yu, Tianyu Pang, Qian Liu, Chao Du, et al . 2023. Bag of Tricks for Training Data Extraction from Language Models. InICML, V ol. 202

  40. [40]

    Kai Yue, Richeng Jin, Chau-Wai Wong, Dror Baron, et al. 2023. Gradient Obfus- cation Gives a False Sense of Security in Federated Learning. InUSENIX Security. USENIX Association, 6381–6398

  41. [41]

    Zhexin Zhang, Jiaxin Wen, and Minlie Huang. 2023. ETHICIST: Targeted Train- ing Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation. InACL

  42. [42]

    Zishuai Zhang, Hainan Zhang, Jiaying Zheng, Ziwei Wang, Yongxin Tong, Jin Dong, and Zhiming Zheng. 2025. A Federated Splitting Framework for LLMs: Security, Efficiency, and Adaptability.CoRRabs/2505.15683 (2025)

  43. [43]

    Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, et al. 2023. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. InNeurIPS. A Proof of Theorem 1 PROOF. We can re-express the𝛿=( ˆz−z)J by defining Δ= ˆz−z , which gives us 𝛿=ΔJ . Finding a solution for Δ can be reformulated as the following optimization problem:min Δ ||𝛿−ΔJ|| 2 2. To find the ...