LightSplit: Practical Privacy-Preserving Split Learning via Orthogonal Projections
Pith reviewed 2026-05-14 19:42 UTC · model grok-4.3
The pith
LightSplit applies a fixed orthogonal random projection at the cut layer to reduce transmitted dimensionality by up to 32× while retaining more than 95% of baseline accuracy in split learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LightSplit limits information exposure and communication overhead in split learning by applying a lightweight fixed orthogonal random projection at the cut layer. This projection acts as an information bottleneck based on Shannon's information theory, restricting instance-specific information and suppressing exploitable per-sample signals. Transmitting the low-dimensional projections allows the server to operate on lifted representations without architectural modifications. The projection is non-invertible, so part of the original representation is irreversibly discarded at the client, reducing the information available for reconstruction. The method preserves end-to-end differentiability via exact gradient propagation.
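A minimal sketch of this mechanism, assuming PyTorch and illustrative sizes (d = 2048, k = 64, i.e. the 32× ratio); `make_projection` is a hypothetical helper, not from the paper, and the projection z̃ = Rᵀz from the appendix excerpts is applied batch-wise as `z @ R`:

```python
# Sketch of the fixed orthogonal cut-layer bottleneck (names are illustrative).
import torch

def make_projection(d: int, k: int, seed: int = 0) -> torch.Tensor:
    """Draw a fixed d x k matrix with orthonormal columns (R^T R = I_k)
    via QR of a Gaussian; drawn once and reused across all rounds."""
    g = torch.Generator().manual_seed(seed)
    a = torch.randn(d, k, generator=g)
    q, _ = torch.linalg.qr(a)            # q: d x k, orthonormal columns
    return q

d, k = 2048, 64                          # 32x reduction in transmitted width
R = make_projection(d, k)

z = torch.randn(8, d)                    # cut-layer activations (batch of 8)
z_tilde = z @ R                          # client transmits only k values/sample

# Non-invertibility: R R^T is a rank-k projector, so the component of z
# orthogonal to range(R) is discarded and cannot be recovered by any decoder
# that only sees z_tilde.
z_best_linear = z_tilde @ R.T            # best linear reconstruction
print((z - z_best_linear).pow(2).mean()) # strictly positive residual
```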
What carries the argument
The fixed orthogonal random projection applied at the cut layer, which functions as a non-invertible information bottleneck to reduce dimensionality and limit instance-specific information in the transmitted data.
Where Pith is reading between the lines
- The method may enable split learning deployments on bandwidth-limited edge devices by allowing flexible reduction ratios.
- Future work could test whether varying the projection matrix per round adds more privacy without losing the fixed lightweight property.
- Similar projection bottlenecks might apply to other collaborative learning methods facing communication and privacy trade-offs.
- Real-world validation on hardware with actual network constraints would show if the 32x reduction translates to practical speedups.
Load-bearing premise
A fixed orthogonal random projection alone is sufficient to restrict instance-specific information enough to prevent reconstruction attacks, without needing sparsification, noise, or other additional mechanisms.
What would settle it
A reconstruction attack succeeding in recovering a significant portion of the original activations or input data from the low-dimensional projected representations at the server.
Original abstract
Split learning (SL) enables collaborative training by partitioning a neural network across clients and a central server, but the cut-layer interface introduces a key challenge: high-dimensional activations incur substantial communication overhead while exposing representations vulnerable to reconstruction attacks. Existing approaches typically address efficiency or privacy in isolation, relying on additional mechanisms such as sparsification, quantization, or noise injection. We propose LightSplit, which limits information exposure and reduces communication overhead by applying a lightweight fixed orthogonal random projection at the cut layer. Based on Shannon's information theory, this projection acts as an information bottleneck that restricts instance-specific information and suppresses exploitable per-sample signals. By transmitting low-dimensional projections instead of raw activations, the server operates on lifted representations without requiring architectural modifications, ensuring compatibility with existing SL architectures. By avoiding additional trainable components on the client, the method remains lightweight and suitable for edge devices while preserving end-to-end differentiability via exact gradient propagation. Because the projection is non-invertible, part of the original representation is irreversibly discarded at the client; LightSplit thereby reduces the information available for reconstruction and limits information exposure. We extensively evaluate LightSplit on state-of-the-art benchmarks in both IID and non-IID settings across varying projection dimensions and client scales. Our results show that the method retains more than 95% of the baseline accuracy at up to 32× reduction in transmitted dimensionality while maintaining stable training dynamics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes LightSplit, a split-learning method that inserts a fixed orthogonal random projection at the cut layer. This projection is claimed to serve as a lightweight information bottleneck that simultaneously reduces transmitted dimensionality by up to 32× and limits instance-specific information leakage, while preserving end-to-end differentiability and retaining more than 95% of baseline accuracy on standard benchmarks under both IID and non-IID partitions.
Significance. If the performance and privacy claims are substantiated, the approach would supply a parameter-free, client-lightweight defense that avoids noise, sparsification, or extra trainable modules, thereby improving the practicality of split learning on edge devices. The explicit appeal to Shannon-theoretic irreversibility and the reported stability across client scales constitute the main potential contributions.
major comments (2)
- [§4] §4 (Experimental Evaluation): the central claim that >95% baseline accuracy is retained at 32× dimensionality reduction is presented without reported baselines, number of random seeds, error bars, or precise non-IID partitioning protocol, rendering the quantitative result unverifiable.
- [§3.2] §3.2 (Privacy Argument): the assertion that a single fixed orthogonal projection suffices to block reconstruction attacks rests solely on non-invertibility and an information-theoretic motivation; no mutual-information bounds, reconstruction-error metrics, or adversarial attack results are supplied, even though the accuracy claim implies that task-relevant features remain recoverable by the server.
minor comments (2)
- [Abstract] The abstract states that experiments cover 'varying projection dimensions and client scales' yet supplies no table or figure reference for these dimensions; a summary table would improve clarity.
- [§3.1] Notation for the projection matrix (e.g., whether it is drawn once and fixed or re-sampled per round) is introduced without an explicit equation number, complicating reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to improve verifiability and substantiation of the claims.
Point-by-point responses
- Referee: [§4] §4 (Experimental Evaluation): the central claim that >95% baseline accuracy is retained at 32× dimensionality reduction is presented without reported baselines, number of random seeds, error bars, or precise non-IID partitioning protocol, rendering the quantitative result unverifiable.
  Authors: We agree that the experimental section should include explicit baseline accuracies, the number of random seeds, error bars, and a precise non-IID partitioning protocol to make the >95% retention claim verifiable. In the revised manuscript we will report the exact baseline accuracies, average results over at least five random seeds with standard-deviation error bars, and specify the non-IID partitioning method (e.g., Dirichlet concentration parameter). revision: yes
- Referee: [§3.2] §3.2 (Privacy Argument): the assertion that a single fixed orthogonal projection suffices to block reconstruction attacks rests solely on non-invertibility and an information-theoretic motivation; no mutual-information bounds, reconstruction-error metrics, or adversarial attack results are supplied, even though the accuracy claim implies that task-relevant features remain recoverable by the server.
  Authors: The core privacy argument relies on the non-invertibility of the fixed orthogonal projection, which irreversibly discards instance-specific information in the sense of Shannon's theory. We acknowledge that empirical support is currently limited. In the revision we will add reconstruction-error metrics (MSE between original and reconstructed activations) and results from reconstruction attacks to quantify leakage. Full mutual-information bounds are computationally intractable for high-dimensional activations, but we will include empirical estimates; the preservation of task-relevant features is intentional and does not contradict the reduction in instance-specific information. revision: partial
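To make the promised metric concrete, here is a minimal sketch of one reconstruction-error measure, assuming the weakest (linear least-squares) adversary; `reconstruction_mse` is a hypothetical helper, not the paper's evaluation code:

```python
# Linear-decoder reconstruction error from the transmitted projections.
import torch

def reconstruction_mse(z: torch.Tensor, R: torch.Tensor) -> torch.Tensor:
    """MSE between activations z (b x d) and the best *linear* reconstruction
    from z_tilde = z @ R; for orthonormal R the least-squares decoder is
    z_hat = z_tilde @ R.T (projection of z onto range(R))."""
    z_hat = (z @ R) @ R.T
    return (z - z_hat).pow(2).mean()

d, k = 2048, 64
R = torch.linalg.qr(torch.randn(d, k))[0]   # fixed orthonormal columns
z = torch.randn(8, d)
print(reconstruction_mse(z, R))             # ~ (d - k)/d of unit energy, ~0.97
```

A learned nonlinear decoder can beat this linear floor on structured activations, which is exactly why the referee's request for empirical attack results matters.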
Circularity Check
No circularity; proposal is empirical with information-theoretic motivation
full rationale
The paper introduces LightSplit by applying a fixed orthogonal random projection at the cut layer, motivated by Shannon's information theory as creating an irreversible information bottleneck. No derivation chain reduces any claim to fitted parameters, self-citations, or self-definitional loops. The accuracy retention result (>95% at 32× reduction) is presented as an empirical outcome from benchmarks in IID/non-IID settings, not as a prediction forced by construction from inputs. Privacy is argued heuristically from non-invertibility without quantitative mutual-information bounds or self-referential uniqueness theorems. The method is evaluated against external benchmarks and does not rename known results or smuggle ansatzes via citations. This is a standard empirical proposal without circular structure.
Axiom & Free-Parameter Ledger
free parameters (1)
- projection dimension
axioms (1)
- domain assumption: a fixed orthogonal random projection at the cut layer restricts instance-specific information and suppresses per-sample signals.
Appendix excerpts
Our appendix completes the per-attack and per-configuration sweeps that the main eval cross-references, and gives the step-by-step training derivation that Sec. IV-D def…
[Fig. 9, caption recovered from figure residue] Server-input cosine signal cos(z̄_i, z̄): malicious-client |MAD z-score| vs. each ablation axis, panels (a) α, (b) n, (c) µ, (d) p, comparing Raw, Learned 1×1, LightSplit-F, and LightSplit-L; in every panel the swept parameter is varied while the rest are held at the defaults reported in Fig. 7.
…applied identically to the target path and to the simulator path so that RAW reduces …
Computational complexity. Let b denote the mini-batch size, let the cut activation have shape C×H×W with flattened dimension d = CHW, let k be the transmitted width (CR = d/k), and let m be the hidden width of the learned lift-back. We count only the bottleneck and lift-back overhead beyond the common split model h∘g∘f. a) Cost of LightSplit: both LIGHTSPLIT variants sen…
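A back-of-envelope instance of this accounting, with assumed values (the excerpt does not fix C, H, W, or k):

```python
# Illustrative values only; the paper's defaults are not visible in the excerpt.
C, H, W, k = 32, 8, 8, 64
d = C * H * W                 # flattened cut width: 2048
CR = d // k                   # compression ratio: 32x
proj_flops = 2 * d * k        # one d x k matrix-vector product per sample
bytes_raw, bytes_proj = 4 * d, 4 * k   # fp32 payload per sample on the wire
print(d, CR, proj_flops, bytes_raw, bytes_proj)  # 2048 32 262144 8192 256
```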
Forward pass. a) Step 1 (client, head): on a mini-batch $\{(x_i, y_i)\}_{i=1}^{b}$, the client computes the cut-layer activation $z_i = f(x_i) \in \mathbb{R}^d$. b) Step 2 (client, projection): the client applies the fixed projection $\tilde{z}_i = R^{\top} z_i \in \mathbb{R}^k$, following Eq. (2). The resulting tensor is recorded in the client's autograd graph as a function of f's parameters. c) Step 3 (network, c…
Backward pass, client (Phase 1). a) Step 11: the client zeros its optimizer state with client_optimizer.zero_grad(). b) Step 12: the client invokes loss.backward() on its local graph. Autograd traverses the two branches simultaneously: the CE branch reaches h and accumulates $\partial L_{\mathrm{CE}}/\partial \theta_h$ into h's .grad, then propagates back to u, leaving $\partial L_{\mathrm{CE}}/\partial u$ on u's .grad. Because…
Backward pass, server. a) Step 14: the server zeros its optimizer state with server_optimizer.zero_grad(). b) Step 15: the server uses the incoming $\partial L_{\mathrm{CE}}/\partial u$ as the upstream seed and calls u.backward(grad_output) on its server-side graph. Autograd traverses $u \to$ …
[Fig. 14: Level-4 VSL, CR = 16; Fig. 15: Level-4 VSL, CR = 32; same layout as Fig. 10.]
Backward pass, client (Phase 2). a) Step 18: the client receives $\partial L_{\mathrm{CE}}/\partial \tilde{z}$ and adds it to the contribution already sitting on $\tilde{z}$'s .grad from Step 12, yielding

$$\frac{\partial L}{\partial \tilde{z}} = \frac{\partial L_{\mathrm{CE}}}{\partial \tilde{z}} + \lambda_{\mathrm{WCC}} \frac{\partial L_{\mathrm{WCC}}}{\partial \tilde{z}}. \tag{12}$$

This client-local addition is the only point in the enti…
[Fig. 16: Level-7 USL, CR = 8; Fig. 17: Level-7 USL, CR = 16; same layout as Fig. 10.]
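Putting Steps 11–18 together, a self-contained sketch of the forward/backward relay, assuming a U-shaped split (client holds f and h, server holds g), stand-in modules, and omitting the WCC branch for brevity:

```python
# End-to-end relay for one mini-batch step; modules and shapes are assumed.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, k, b = 256, 32, 4
f, g, h = nn.Linear(10, d), nn.Linear(k, 64), nn.Linear(64, 3)
R = torch.linalg.qr(torch.randn(d, k))[0]           # fixed projection
x, y = torch.randn(b, 10), torch.randint(0, 3, (b,))
client_opt = torch.optim.SGD(list(f.parameters()) + list(h.parameters()), lr=0.1)
server_opt = torch.optim.SGD(g.parameters(), lr=0.1)

# --- forward relay ---
z_tilde = f(x) @ R                                  # client: Steps 1-2
z_srv = z_tilde.detach().requires_grad_(True)       # wire: server-side leaf
u_srv = g(z_srv)                                    # server: lifted path
u_cli = u_srv.detach().requires_grad_(True)         # wire: client-side leaf
loss = nn.functional.cross_entropy(h(u_cli), y)     # client tail + CE loss

# --- backward relay (Steps 11-18, WCC branch omitted) ---
client_opt.zero_grad(); server_opt.zero_grad()      # Steps 11 and 14
loss.backward()                                     # Step 12: fills h's grads and u_cli.grad
u_srv.backward(u_cli.grad)                          # Step 15: server seed, fills z_srv.grad
z_tilde.backward(z_srv.grad)                        # Step 18: continues through R into f
client_opt.step(); server_opt.step()                # parameter-wise updates
```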
Wire-level and optimizer-level summary. The four messages exchanged at the cut interface during a single mini-batch step are summarized in Table XI: the forward payload $\tilde{z}$, the server's return signal $u$, and the two backward gradient signals $\partial L_{\mathrm{CE}}/\partial u$ and $\partial L_{\mathrm{CE}}/\partial \tilde{z}$. Each direction transmits exactly one tensor per sample. The server's update of $(\theta_g, \theta_\phi)$ depends …
Equivalence between single- and two-optimizer implementations. For any first-order optimizer (SGD, Adam, AdamW, etc.) applied parameter-wise, splitting a single loss.backward()/optimizer.step() call into two independent (zero_grad, backward, step) se…
[Fig. 22: FORA white-box reference-decoder reconstructions on CIFAR-10 at CR = 8×; columns: white-box target, ground truth, Raw, Learned 1×1, LS-F with λ_wcc ∈ {0, 0.01, 0.1}.]
Compact derivation in equations. For a reader who prefers the gradient bookkeeping in equation form, this subsection restates the per-step computation as a sequence of derivatives. Let $\theta_f, \theta_h, \theta_g, \theta_\phi$ denote the trainable parameters of $f, h, g, \phi$, respectively, and let $J_F$ denote the Jacobian of a map $F$ at the current point. …
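The excerpt truncates before the derivatives themselves; a hedged completion of the bookkeeping, assuming the pipeline $\hat{y} = h(u)$, $u = (g \circ \phi)(\tilde{z})$, $\tilde{z} = R^{\top} z$, $z = f(x)$ implied by the steps above:

```latex
\begin{align}
  \frac{\partial L}{\partial u}
    &= J_h^{\top}\,\nabla_{\hat{y}} L_{\mathrm{CE}}
    && \text{(client, Step 12; sent to server)} \\
  \frac{\partial L}{\partial \tilde{z}}
    &= J_{g\circ\phi}^{\top}\,\frac{\partial L}{\partial u}
       + \lambda_{\mathrm{WCC}}\,\frac{\partial L_{\mathrm{WCC}}}{\partial \tilde{z}}
    && \text{(server, Step 15; client adds the WCC term, Step 18 / Eq.~(12))} \\
  \frac{\partial L}{\partial z}
    = R\,\frac{\partial L}{\partial \tilde{z}},
  \quad
  \frac{\partial L}{\partial \theta_f}
    &= J_{f,\theta_f}^{\top}\,\frac{\partial L}{\partial z}
    && \text{(client, through the fixed } R \text{)}
\end{align}
```

The third line makes the bottleneck's role in training explicit: because $\tilde{z} = R^{\top} z$, the gradient re-enters the client head through the same fixed $R$, so exact gradient propagation needs no trainable components at the cut.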